This review covers the structures and genetics of the 46 O antigens of Salmonella, a major pathogen of humans and domestic animals. The variation in structures underpins the serological specificity of the 46 recognized serogroups. The O antigen is important for the full function and virulence of many bacteria, and the considerable diversity of O antigens can confer selective advantage. Salmonella O antigens can be divided into two major groups: those which have N-acetylglucosamine (GlcNAc) or N-acetylgalactosamine (GalNAc) and those which have galactose (Gal) as the first sugar in the O unit. In recent years, we have determined 21 chemical structures and sequenced 28 gene clusters for GlcNAc-/GalNAc-initiated O antigens, thus completing the structure and DNA sequence data for the 46 Salmonella O antigens. The structures and gene clusters of the GlcNAc-/GalNAc-initiated O antigens were found to be highly diverse, and 24 of them were found to be identical or closely related to Escherichia coli O antigens. Sequence comparisons indicate that all or most of the shared gene clusters were probably present in the common ancestor, although alternative explanations are also possible. In contrast, the better-known eight Gal-initiated O antigens are closely related both in structures and gene cluster sequences.
O antigen (O polysaccharide) is a part of the lipopolysaccharide (LPS) component of the outer membrane of Gram-negative bacteria and is one of the most variable cell constituents. It consists of oligosaccharide repeats (O units), normally containing two to eight sugar residues. The variation is mostly in the types of sugars present, their order in the structure, and the linkages between them. The O antigen is subject to intense selection by the host immune system, bacteriophages, and other environmental factors (Reeves & Wang, 2002), which may account for the maintenance of diverse O-antigen forms within a species. O-antigen diversity is a common basis for bacterial serotyping and also important for the bacteria, as it allows each of the various clones to present a surface that offers selective advantage in its specific niche (Reeves, 1992). The presence of O antigen is also essential for survival of bacteria in their natural environment and plays a role in bacterial virulence. There is direct evidence that the loss of O antigen makes many pathogens, such as Escherichia coli, Shigella flexneri, Francisella tularensis, and Yersinia enterocolitica, serum sensitive or otherwise seriously impaired in virulence (Pluschke et al., 1983; Bengoechea et al., 2004; West et al., 2005; Plainvert et al., 2007; Raynaud et al., 2007).
Salmonella is recognized as a major pathogen of both animals and humans and is the cause of typhoid fever, paratyphoid fever, and the foodborne illness salmonellosis. Salmonella infections arise from contamination of poultry, eggs, beef, and other foods, sometimes including unwashed fruits and vegetables. In many countries, Salmonella is the leading cause of foodborne outbreaks and infections. It is estimated that there are 1.3 million cases of salmonellosis, 15 000 hospitalizations, and 400 deaths annually in the United States (Hardnett et al., 2004). The genus Salmonella includes two species, S. enterica and S. bongori. S. enterica is divided into six subspecies: S. enterica enterica, S. enterica salamae, S. enterica arizonae, S. enterica diarizonae, S. enterica houtenae, and S. enterica indica, also referred to as subspecies I, II, IIIa, IIIb, IV, and VI, respectively. S. bongori was originally designated S. enterica subspecies V, but it has since been determined to be a separate species. This classification has been confirmed by multilocus enzyme electrophoresis and sequence analysis of housekeeping genes (Nelson et al., 1991; Nelson & Selander, 1992; Boyd et al., 1994; McQuiston et al., 2008).
Serotyping is highly useful for identifying strains that vary in host range and disease spectrum, including pathogens such as Salmonella, and is invaluable for epidemiological investigations. The Kauffmann–White–Le Minor serotyping scheme for designation of Salmonella serotypes, maintained by the WHO Collaborating Centre for Reference and Research on Salmonella, is used by most laboratories for the characterization of Salmonella isolates. A serotype of Salmonella is determined on the basis of O and flagellar (H) antigens. The O antigen determines the serogroup, while the H antigen completes the definition of the serovar or serotype of a Salmonella isolate.
There are 46 O serogroups described in the Kauffmann–White–Le Minor scheme. These were originally designated by letters of the alphabet, but later, it was necessary to continue with numbers 51–67. The genes specific for O-antigen synthesis are normally present as a gene cluster in the chromosome, which maps between galF and gnd in Salmonella, E. coli, and Shigella, but sometimes, one or more such genes map outside the gene cluster. There are 114 H antigens in Salmonella (McQuiston et al., 2004), and 2557 serovars in total have been recognized (Grimont & Weill, 2007). Approximately 60% of the serovars belong to subspecies I, while subspecies VI and S. bongori are rare. O-antigen gene clusters appear to have been transferred among subspecies, as the majority of Salmonella O antigens are found in at least two subspecies with a mean of 3.5 subspecies per O antigen (Reeves, 1995; Popoff & Le Minor, 1997).
Genetic variation in the O-antigen gene cluster is the major determinant of differences among the diverse O-antigen forms. O-antigen synthesis genes fall into three main classes: (1) nucleotide sugar precursor synthesis genes for sugars specific to the O antigen (the common sugars in the O antigen that are also found in other polysaccharide structures or are involved in metabolism, such as glucose (Glc), galactose (Gal), and N-acetylglucosamine (GlcNAc), are usually synthesized by genes outside the O-antigen gene cluster); (2) sugar transferase genes associated with the O-unit assembly that are specific for the donor and acceptor sugars and generate a specific linkage between them; and (3) genes for O-unit processing and the conversion of the O unit to O antigen (wzx and wzy in the Wzx/Wzy pathway and wzm and wzt in the ABC transporter pathway). However, genes on bacteriophages or other chromosomally encoded genes, which are not located in the O-antigen gene cluster, are often involved in modification of the structure and particularly in the addition of side-chain residues to the O units.
The synthesis and translocation of the O antigen can occur through three distinct pathways: the Wzx/Wzy pathway, the ATP-binding cassette (ABC) transporter pathway, and the synthase pathway (Bronner et al., 1994; Keenleyside & Whitfield, 1996; Daniels et al., 1998; Linton & Higgins, 1998; Samuel & Reeves, 2003). In the Wzx/Wzy pathway, the O unit is synthesized by sequential transfer of a sugar phosphate and one or more sugars from the respective nucleotide sugars to the carrier lipid, namely undecaprenyl phosphate (UndP). O units are flipped across the cytoplasmic membrane and then polymerized to form polysaccharide chains, which are transferred to the independently synthesized core-lipid A component to form LPS (Mulford & Osborn, 1983; McGrath & Osborn, 1991; Reeves & Wang, 2002). In the ABC transporter pathway, the glycosyltransferases mediate the sequential addition of sugar residues to the nonreducing end of the growing polymer to form the complete O-antigen polymer that is attached to UndPP. The polysaccharide is then translocated across the cytoplasmic membrane by an ABC transporter and ligated to the core-lipid A to form the complete LPS (Bronner et al., 1994; Linton & Higgins, 1998). In the synthase pathway used for synthesis of the Salmonella O54 antigen, a synthase catalyzes the extension of the polysaccharide chain with simultaneous extrusion of the nascent polymer across the cytoplasmic membrane (Keenleyside & Whitfield, 1996).
Most Salmonella O antigens (39 in total, including Salmonella O54 and O67 and taking into account that Salmonella O28 is divided into O28ab and O28ac) have either GlcNAc or N-acetylgalactosamine (GalNAc) as the first sugar of the O unit. As in most E. coli and Shigella strains, WecA, which is encoded by a gene in the enterobacterial common antigen (ECA) gene cluster, is responsible for initiating the synthesis of GlcNAc- and GalNAc-initiated O antigens by transferring GlcNAc-1-phosphate to the UndP carrier. When GalNAc is the initiating sugar, UndPP–GlcNAc is then converted to UndPP–GalNAc by an epimerase, which is encoded by a gene that has been called gne (Rush et al., 2010). However, we suggest that this gene be renamed gnu, as its product is specific for UndPP–GlcNAc, whereas epimerases encoded by gne are specific for UDP–GlcNAc (Cunneen et al., 2013).
Salmonella O67 occurs rarely and has been suggested to be a variant of serogroup O4 (B) O antigen (Li & Reeves, 2000). However, in this study, we found that the O-antigen structure of Salmonella O67 is similar to that of d-galactan I O antigen in Klebsiella pneumoniae and that its gene cluster is not located between galF and gnd.
Salmonella O54 has a disaccharide O unit composed of two ManNAc residues. The O54 antigen gene cluster is on a plasmid, and the O antigen expressed from the main O-antigen gene cluster is present together with the O54 antigen (Keenleyside & Whitfield, 1996). The O54 serogroup is currently retained, but if the plasmid is lost, factor O54 is no longer expressed.
The Salmonella O antigens belonging to serogroups O2 (A), O4 (B), O8 (C2–C3), O9 (D1), O9,46 (D2), O9,46,27 (D3), O3,10 (E1–E3), and O1,3,19 (E4) form a distinct set that is characterized by having a Gal residue as the first sugar of the O unit and a wbaP gene in the O-antigen gene cluster, which encodes the glycosyltransferase that catalyzes the addition of the Gal-1-phosphate residue to UndP to initiate O-unit synthesis. These serogroups have related O-antigen structures and gene clusters (Reeves et al., 2013). Details of their relationships show that they have a complex evolutionary history that will be reviewed separately (Reeves et al., 2013).
Although GlcNAc-/GalNAc-initiated O antigens outnumber Gal-initiated O antigens in Salmonella (39 vs. 8), the latter were found to be more prevalent in Salmonella isolates. Among Salmonella isolates from human sources reported between 1999 and 2009 by the Centers for Disease Control and Prevention in the United States, 84.23% of isolates belonged to serogroups with a Gal-initiated O antigen, and only 5.35% belonged to serogroups with a GlcNAc-/GalNAc-initiated O antigen (the remaining isolates could not be serotyped) (CDC, 2009).
Systematically analyzing the chemical structures and gene clusters of different O-antigen forms in a genus or species will improve our understanding of the generation of the O-antigen diversity. It will also open the way for experimental studies on the relationship between this diversity and pathogenicity. Many laboratories in the world have worked on the structure, genetics, and function of O antigens. However, most of these studies have focused on relatively few O-antigen forms.
In a previous review, we summarized the structures and gene clusters of all Shigella O antigens (Liu et al., 2008) and found many genetic anomalies in the gene clusters. It was suggested that the Shigella set of O antigens has been assembled relatively recently or undergone adaptive modifications in a newly occupied niche. Salmonella, Shigella, and E. coli are known to be evolutionarily related (Ochman & Wilson, 1987), and we also presented evidence in support of the close relationship between Shigella and E. coli, as 21 of 34 Shigella O antigens are either identical or closely related to an E. coli O antigen. Homologous recombination was shown to be an essential mechanism in the diversification of Shigella O antigens.
Shigella is a pathogenic form that was estimated to have developed within E. coli several times over the last 35 000–270 000 years (Pupo et al., 2000), but these events were probably more recent as mutation rates in bacterial clones observed in recent studies are much higher than earlier estimates, and this affects the date estimates (Feng et al., 2008; Ho et al., 2011; Morelli et al., 2011; Reeves et al., 2011). In contrast, Salmonella is a distinct genus with a much longer history (Ochman & Wilson, 1987; Doolittle et al., 1996); it is thought that E. coli and Salmonella diverged from a common ancestor about 140 million years ago. The evolutionary mechanisms for generation of the O-antigen diversity in Salmonella are expected to be different from those in Shigella. When we started a systematic study of Salmonella and E. coli O antigens, there were only four cases in which the O-antigen structure had been shown to be identical in the two species (Rundlof et al., 1998; Samuel et al., 2004), although more serological cross-reactions had been observed (Orskov et al., 1977). It was suggested that there had been extensive replacement of O antigens, presumably by lateral gene transfer, since divergence of the two species.
In the past 5 years, we have sequenced 28 Salmonella O-antigen gene clusters, 14 of which are reported here for the first time, and determined 21 Salmonella O-antigen chemical structures, five of which are reported here for the first time. We have also revised the chemical structures of another three Salmonella O antigens. In this study, we present a compilation of the published and new chemical structures and DNA sequence data for the 46 known Salmonella O antigens. Together with the summary of Shigella O antigens, it gives an improved insight into the evolution of O-antigen diversity in bacteria. The structures and gene clusters of GlcNAc-/GalNAc-initiated O antigens were found to be highly diverse. However, the proportion of genetic anomalies in these gene clusters is clearly lower than that in Shigella, indicating that these O antigens are more stable. We also sequenced 18 E. coli O-antigen gene clusters and determined 9 and revised 2 E. coli O-antigen chemical structures to obtain sufficient data for a comparison of all O antigens shared by Salmonella and E. coli (the others were retrieved from databases). We found that 24 Salmonella O-antigen forms are either identical or closely related to E. coli O antigens, as indicated by both genetic and structural data. Therefore, the relationship between E. coli and Salmonella O antigens is much closer than previously thought. The genetic data imply that almost all O antigens shared by Salmonella and E. coli originated from an O antigen in their common ancestor, although alternative explanations (such as a recent lateral transfer of a gene cluster from one species to the other) are also possible. In contrast to Salmonella GlcNAc-/GalNAc-initiated O antigens, Salmonella Gal-initiated O antigens exhibit a high level of relatedness in structure and genetic aspects, implying a distinct evolutionary history.
Chemical composition and structures for Salmonella GlcNAc-/GalNAc-initiated O antigens
The structures of all GlcNAc-/GalNAc-initiated Salmonella O antigens are now known (Table 1). Some structures that we elucidated only recently have not been reported earlier and are presented here for the first time. They were established using one- and two-dimensional 1H- and 13C-NMR spectroscopy essentially as described (Duus et al., 2000). Three new Salmonella structures, those of O42, O52, and O65 antigens, are identical to the known structures of E. coli O1B (Gupta et al., 1992), O153 (Ratnayake et al., 1994), and O78 (Jansson et al., 1987), respectively.
Table 1. Structures of Salmonella GlcNAc-/GalNAc-initiated O antigens, including O54 and O67, and related Escherichia coli O antigens
The O-antigen structure of Salmonella O67 was found to be highly similar to that of d-galactan I (→3)-d-Galf-(β1→3)-α-d-Galp-(α1→) in K. pneumoniae (Whitfield et al., 1991). The only difference between the two O antigens is the presence of an O-acetyl group in Salmonella O67. Its position was determined by a comparison of the NMR spectra of the initial and O-deacetylated polysaccharides, which revealed characteristic displacements of 1H- and 13C NMR signals caused by a deshielding effect of the O-acetyl group.
Using 13C-NMR spectroscopy and the ‘fingerprint’ method, it was found that the O antigen of Salmonella O21 has the same structure as that reported erroneously for S. enterica arizonae O64 and Citrobacter freundii O32 (Kocharova et al., 1988). Previously, an incorrect structure had been assigned to the S. enterica arizonae O21 O antigen (Vinogradov et al., 1994); that structure may in fact belong to Citrobacter braakii O37 (A. Gamian, pers. commun.).
In addition, structures of two Salmonella O antigens were revised in this work. Using known regularities in the 13C-NMR chemical shifts of the Quip3NAc-(α1→3)-d-Manp disaccharide (Shashkov et al., 1988), the absolute configuration of Qui3NAc in the O39 antigen was revised from l (Gajdus et al., 2009) to d. In the O62 antigen, d-GalNAcA is present in the amide form (d-GalNAcAN) rather than as the free acid (Vinogradov et al., 1992). This was demonstrated by the 1H-NMR spectrum of a polysaccharide sample measured in a 9:1 H2O/D2O mixture, which showed two signals for NH2 protons at 7.40 and 7.65 ppm [compare published data (Rundlof et al., 1998)]. We have also revised the N-acyl group on l-FucN in the O48 antigen from N-acetyl to N-acetimidoyl (Feng et al., 2005b).
Except for the O54 and O67 antigens, all Salmonella GlcNAc-/GalNAc-initiated O antigens are heteropolysaccharides. Some are linear and have tri- to pentasaccharide O units. Others are branched with tetra- to hexasaccharide O units usually including one or two monosaccharide side chains or, less often, a disaccharide side chain. Most of the sugars are in the pyranose form, whereas d-Gal occurs in the furanose form in two O antigens, and d-Rib exists in the furanose form in all cases.
In addition to d-GlcNAc and d-GalNAc, hexoses d-Glc, d-Man, d-Gal, l-Rha, and l-Fuc occur in six or more O antigens each (Supporting Information, Table S1). When d-Glc is present as a side chain, its content is often less than stoichiometric, and there is no putative glycosyltransferase for its transfer in the gene clusters, both indicating that this sugar is incorporated into the O antigen after assembling and processing of the O unit. Exceptionally, a side-chain Glc was proposed to be transferred by a glycosyltransferase encoded in the Salmonella O66 gene cluster (Liu et al., 2010a).
Other monosaccharides are components in 1–4 O antigens each. These include neutral sugars (d-Rib, Col) and various uncommon amino sugars (d-ManN, l-QuiN, l-FucN, d-Qui3N, d-Fuc3N, d-Qui4N, d-Rha4N). In most cases, the amino sugars are N-acetylated, but in some O antigens, they carry rarely occurring N-acyl groups, such as N-acetimidoyl on l-FucN, N-formyl on d-Fuc3N, (R)-3-hydroxybutanoyl on 8-epilegionaminic acid (8eLeg), and N-acetyl-l-seryl or N-[(S)-3-hydroxybutanoyl]-d-alanyl on Qui4N (Table S1). O-Acetylation is not uncommon in Salmonella O antigens, and one or two O-acetyl groups are present in nonstoichiometric quantities in the O units of 7 O serogroups (Table 1).
In contrast to the O antigens of Shigella (Liu et al., 2008), the O antigens of Salmonella are typically neutral polysaccharides. Only a few of them contain acidic components, such as hexuronic acids (d-GlcA and d-GalNAcA in the O45 and O62 antigens, respectively), nonulosonic acids (derivatives of Neu and 8eLeg in the O48 and O61 antigens, respectively), and ribitol phosphate (in the O47 antigen). However, GalNAcA exists in the neutral amide form, and the negative charge of both nonulosonic acids and phosphate group is neutralized by a basic N-acetimidoyl group on a l-FucN residue.
8eLeg5RHb7Ac (7-acetamido-3,5,7,9-tetradeoxy-5-[(R)-3-hydroxybutanoylamino]-l-glycero-d-galacto-non-2-ulosonic acid, a derivative of 8-epilegionaminic acid) is a higher sugar rarely occurring in nature and is similar to isomeric nonulosonic acids found in some other bacterial carbohydrates, di-N-acyl derivatives of Pse (5,7-diamino-3,5,7,9-tetradeoxy-l-glycero-l-manno-non-2-ulosonic or pseudaminic acid), and Leg (5,7-diamino-3,5,7,9-tetradeoxy-d-glycero-d-galacto-non-2-ulosonic or legionaminic acid) (Knirel et al., 2003, 2012).
There are only two pairs of Salmonella O serogroups with closely related GlcNAc-/GalNAc-initiated O antigens. The O13 and O43 antigens differ only in (1) the configuration (α vs. β) and the position (1→2 vs. 1→4) of the polymerization linkage between the O units; and (2) the presence of a Gal side chain in the O43 antigen. The O6,14 and O18 antigens, which share O factors 6 and 14, differ only in (1) the initiating amino sugar (GlcNAc vs. GalNAc); and (2) the polymerization linkage (1→6 vs. 1→4). Within serogroups, nonglucosylated and glucosylated structural variants are known; compare, for example, the Salmonella O6,7, O6,14, and O30 antigens (Table 1). Finally, two structures have been reported for the Salmonella O50 antigen, which differ only in the initiating amino sugar (GlcNAc vs. GalNAc). In contrast, the similarity of the O antigens of Salmonella O28ab and O28ac is limited to the presence of the common d-Galp-(β1→3)-d-GalpNAc-(α1→4)-d-Quip3NAc trisaccharide fragment in the main chain, and classification of the two bacteria in the same serogroup requires reconsideration.
Remarkably, many Salmonella GlcNAc-/GalNAc-initiated O antigens are closely related or even identical to E. coli O antigens (Tables 1 and 2). Most O-antigen structures shared by these bacteria have been reported by us or others earlier, and some of them are discussed below.
Table 2. Summary of Salmonella and Escherichia coli O antigens with identical or closely related structures and gene clusters
Columns: Salmonella O antigen; E. coli O antigen; Reference for sequences (Salmonella/E. coli)
Abbreviations: S, Salmonella; E, E. coli; =, identical; Cr, closely related.
The backbone of the Salmonella O18 antigen is identical to that of E. coli strain 73-1 (Weintraub et al., 1993). However, the O-serogroup and O-antigen gene cluster of E. coli 73-1 are unknown.
Salmonella O52 shares the same O-antigen structure with E. coli O153 (Ratnayake et al., 1994). However, their gene clusters are unrelated.
General features of gene clusters for Salmonella GlcNAc-/GalNAc-initiated O antigens
The gene clusters of all GlcNAc-/GalNAc-initiated Salmonella O antigens have been sequenced (Fig. 1, Table S2). Except for Salmonella O54 and O67, these gene clusters are localized in the genomes between the galF and gnd genes. Their general characteristics, such as having low GC content (about 30%), using wzx/wzy as O-unit processing genes, exhibiting great diversity, etc., are similar to those of E. coli and Shigella.
Almost all Salmonella O antigens use the Wzx-/Wzy-dependent process for the synthesis and translocation of O antigens, the only exceptions being O54 and O67, which use the synthase-dependent pathway and the ABC transporter pathway, respectively. The wzx and wzy genes are usually located within the O-antigen gene cluster, but for O66, there is no wzy gene in the gene cluster, and it must be located elsewhere in the genome. This resembles the situation in serogroups A, B, and D1, which have the wzy gene at a locus far from the main gene cluster (Naide et al., 1965; Curd et al., 1998). In E. coli, the ABC transporter pathway has been reported for 10 of the 148 O antigens with sequenced gene clusters (O8, O9, O20, O52, O89, O95, O97, O99, O101, O162) (Liu et al., 2008), but is not found to mediate the synthesis of any Salmonella or Shigella O antigens except for Salmonella O67.
Several studies have shown that most gene clusters for Salmonella Gal-initiated O antigens have a cassette structure with a central set of variable serogroup-specific genes flanked by highly homologous sugar pathway genes or other shared genes. A similar situation has been found in several groups of Streptococcus pneumoniae gene clusters for capsules with related structures (Bentley et al., 2006; Mavroidi et al., 2007) and in Yersinia pseudotuberculosis O-antigen gene clusters (Cunneen et al., 2009; De Castro et al., 2009, 2010). In contrast, the gene clusters for Salmonella GlcNAc-/GalNAc-initiated O antigens are highly diverse and possess no cassette structure. There are only three sets of related O-antigen gene clusters.
Salmonella O11 and C1. The last three genes (manC, manB, and wzx) at the 3′ end of their gene clusters are in the same order and share obvious DNA identity (63% for manC, 93% for manB, and 97% for wzx; Fig. 2a). The O-antigen structures of Salmonella O11 and C1 are not related except for having mannose as a constituent sugar, and the other genes of their gene clusters are quite different. It is likely that a recombination event has occurred between the O-antigen gene clusters of Salmonella O11 and C1. The DNA identity level of manC is much lower than that of the manB and wzx genes, and we propose that one of the recombination sites is located in the manB gene. It is surprising that an almost identical Wzx protein is responsible for translocation of O antigens with such different structures.
Salmonella O13 and O43. The first seven genes and last two genes of the Salmonella O13 and O43 antigen gene clusters have the same order and significant DNA identity, and the structures are also related (Fig. 2b). Both structures are also found in E. coli, and the relationships between the four structures and gene clusters are discussed below.
Salmonella O47, O48, and O61. The last eight genes in the O-antigen gene clusters of the three O serogroups have the same order and share 63–100% DNA identity (Fig. 2c). All three structures contain the l-FucpNAm-(α1→3)-d-GlcpNAc disaccharide fragment. Four of the eight genes, fnlA, fnlB, fnlC, and wbuX, are involved in the synthesis of l-FucpNAm, and wbuB is proposed to be the l-FucNAm transferase gene. The role of the other three genes is not clear as there are no other shared structural elements.
Although the O-antigen structures of Salmonella O6,14 and O18 are identical apart from the Wzy polymerization linkage (Table 1), the genes in their gene clusters share no similarity, except for manC (59% identity) and wzy (49% identity). It is interesting that the wzy genes are among those with higher levels of identity given the different polymerization linkage.
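The DNA identity values quoted above for shared genes can be read as the percentage of matching positions in a pairwise alignment. The source does not state how identity was calculated, so the following is only a minimal sketch under one common convention (columns with a gap in either sequence count as mismatches); the function name and the toy sequences are hypothetical and are not taken from any Salmonella gene.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns in which both sequences carry the same residue.

    Assumed convention: a column with a gap in one sequence counts as a
    mismatch; columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("alignment contains no residues")
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


# Toy aligned fragments for illustration only (not real gene sequences).
seq_a = "ATGGC-TTAA"
seq_b = "ATGACGTTAA"
print(percent_identity(seq_a, seq_b))  # 8 matching columns out of 10
```

Other conventions (for example, excluding gapped columns from the denominator) give somewhat different values, which matters when comparing identity levels reported by different studies.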
The sugar synthesis genes in O-antigen gene clusters, such as those for l-Rha and d-Man, are often highly conserved and easily identified.
Among the Salmonella GlcNAc-/GalNAc-initiated O antigens discussed in this section, l-Rha is present in 7 O antigens. RmlB, RmlD, RmlA, and RmlC catalyze the four-step synthesis of dTDP-l-Rha, and the genes are usually located at the 5′ end of the O-antigen gene clusters of Salmonella and E. coli with the above conserved gene order. The sequence comparisons show that in Salmonella, the 5′ end of the rml gene set, comprising rmlB, rmlD, and most of rmlA, has many characteristics of housekeeping genes and is in general subspecies specific (data not shown). In contrast, the 3′ end, including part of rmlA and all of rmlC, is much more variable, and the variation at this end is clearly O antigen- and not subspecies-related. This is consistent with a previous report (Li & Reeves, 2000) based on a much smaller number of serotypes. It was suggested in that study that this was because rmlC and the 3′ end of rmlA are commonly transferred between subspecies with the glycosyltransferase and O-antigen processing genes that determine O-antigen specificity and are generally in the central region of the gene cluster. The 5′ end of the rml gene set was proposed to gain its subspecies-specific sequence in this process, as these genes remain in the species as new gene clusters arrive and others die out. The additional data fully support those conclusions.
Where the rmlB and rmlA genes are involved in the synthesis of sugars other than Rha, the full gene set may also be located at the 5′ end with rmlB and rmlA as the first two genes, as in the O-antigen gene clusters of Salmonella O28ab, O39, O55, O58, and O60. Only in Salmonella O56 and O63 are rmlB and rmlA found elsewhere in the gene cluster (Fig. 1).
d-Man is present in 10 GlcNAc-/GalNAc-initiated Salmonella O antigens. GDP-d-Man is synthesized from fructose-6-phosphate by ManA, ManB, and ManC, but only the manB and manC genes are generally present in the gene cluster as ManA is also involved in use of exogenous mannose as a carbon source, and the gene is not associated with the O-antigen gene clusters (Neidhardt et al., 1987). ManB and ManC are also involved in the synthesis of GDP-Col, GDP-l-Fuc, and GDP-d-Rha4NAc (GDP-PerNAc), so a total of 16 gene clusters for GlcNAc-/GalNAc-initiated Salmonella O antigens contain manB and manC genes. Colanic acid (CA), which is widely present in Salmonella, contains l-Fuc, and the manB and manC genes required for production of GDP-Fuc are located within the CA gene cluster (Aoyama et al., 1994). The CA gene cluster is unusual in generally having a high GC content, and remarkably, most manB genes for GlcNAc-/GalNAc-initiated Salmonella O antigens (including those in O60 and O65 that are reported in this review) share a high level of identity (93–99%) with the CA manB gene (Jensen & Reeves, 2001). The only exceptions are the manB genes of O6,14 and O35. Furthermore, those CA-like manB genes display obvious subspecies specificity, and the CA manB genes and the CA-like manB genes in each strain appear to be evolving in concert via gene conversion events (Jensen & Reeves, 2001). These events appear to be unidirectional, as no manB gene with low GC content has been found in a CA gene cluster. It should be noted that the manB genes from the O-antigen gene clusters of the 8 Gal-initiated serogroups are closely related and not CA-like (Jensen & Reeves, 2001). In contrast to manB, with the exception of O11 and O41, the Salmonella manC genes are not CA-like even in gene clusters with the whole of the l-Fuc pathway.
To assess the diversity of Wzx, Wzy, and the glycosyltransferases involved in the synthesis of the 37 Wzx/Wzy pathway GlcNAc-/GalNAc-initiated O antigens, we used the TribeMCL program (Enright et al., 2002) with a cutoff of 1e−50 to assemble each group of proteins into homology groups (HG). 36 Wzy proteins (the Salmonella O66 gene cluster contains no wzy gene) and 37 Wzx proteins were assembled into 35 and 23 HG, respectively. There is enormous diversity as the average amino acid identity levels between the Wzy or Wzx HG are under 15%.
In contrast, Wzy and Wzx proteins for the 8 Gal-initiated Salmonella O antigens were assembled into 4 and 3 HG, respectively, with mostly similar low levels of identity between HG as found for Salmonella GlcNAc-/GalNAc-initiated O antigens. However, the higher proportion of gene clusters with a shared HG for Wzx or Wzy reflects a higher level of relatedness among gene clusters for Gal-initiated O antigens. The data also further demonstrate the different patterns of diversity in the gene clusters for Salmonella GlcNAc-/GalNAc- and Gal-initiated O-antigen gene clusters.
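The idea of assembling proteins into homology groups at an E-value cutoff can be illustrated with a much simpler procedure than the one used in the study. The sketch below is not TribeMCL (which applies Markov clustering to a similarity graph); it swaps in plain single-linkage grouping, where any chain of pairwise hits at or below the cutoff places proteins in the same group. The protein labels and E-values are hypothetical.

```python
from collections import defaultdict


def homology_groups(proteins, hits, cutoff=1e-50):
    """Single-linkage grouping (an assumed stand-in for Markov clustering).

    proteins: iterable of protein labels.
    hits: iterable of (protein_a, protein_b, evalue) pairwise comparisons.
    Two proteins end up in one group when a chain of hits with
    E-value <= cutoff connects them.
    """
    parent = {p: p for p in proteins}

    def find(p):
        # Follow parent links to the group representative (path halving).
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b, evalue in hits:
        if evalue <= cutoff:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for p in proteins:
        groups[find(p)].add(p)
    return sorted(groups.values(), key=sorted)


# Hypothetical labels and E-values, for illustration only.
proteins = ["Wzx_A", "Wzx_B", "Wzx_C", "Wzx_D"]
hits = [
    ("Wzx_A", "Wzx_B", 1e-120),  # passes the cutoff: same group
    ("Wzx_B", "Wzx_C", 1e-20),   # too weak at 1e-50: not linked
    ("Wzx_C", "Wzx_D", 1e-80),   # passes the cutoff: same group
]
groups = homology_groups(proteins, hits)
print(len(groups), groups)
```

With a looser cutoff the weak middle hit would merge everything into a single group, which shows why the number of homology groups depends on the threshold chosen.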
The 127 glycosyltransferases from the 37 Wzx/Wzy pathway O antigens were assembled into 91 HG (Table S3), of which 20 contain 2–6 members. The functions of 64 of these glycosyltransferases can be predicted based on correlations between the presence of a glycosyltransferase with a specific protein sequence and a shared or similar structural element in the corresponding O antigens (Fig. S1).
In some cases, glycosyltransferases belonging to the same HG were proposed to have the same function. For instance, the 6 glycosyltransferases in HG-GT-1 share 41–99% identity in pairwise comparisons. Among these, WfbG in Salmonella O43 was proposed to be responsible for the synthesis of a d-GalNAc-(α1→3)-d-GlcNAc linkage. When structural data were taken into consideration, 5 of the 6 HG-GT-1 glycosyltransferases were proposed to have the same function and named WfbG accordingly. The only exception is WbdH in Salmonella O35, which is proposed to be responsible for the formation of a d-Gal-(α1→3)-d-GlcNAc linkage.
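The glycosyltransferase counts given above imply how many proteins fall into single-member groups and how many share a group. The short calculation below uses only the figures quoted in the text (127 glycosyltransferases, 91 HG, 20 HG with 2–6 members); the variable names are illustrative, and the derived numbers are a consistency check rather than values reported by the authors.

```python
# Figures quoted in the text for the 37 Wzx/Wzy pathway gene clusters.
total_glycosyltransferases = 127
total_groups = 91
shared_groups = 20  # homology groups with 2-6 members

# Every remaining group holds exactly one glycosyltransferase.
singleton_groups = total_groups - shared_groups
# Proteins not in singleton groups must sit in the shared groups.
in_shared_groups = total_glycosyltransferases - singleton_groups
mean_shared_size = in_shared_groups / shared_groups

print(singleton_groups, in_shared_groups, mean_shared_size)
```

The mean size of the shared groups falls inside the stated range of 2–6 members, so the quoted totals are mutually consistent.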
Low proportion of anomalies in gene clusters for Salmonella GlcNAc-/GalNAc-initiated O antigens
Anomalies in the O-antigen gene clusters usually indicate a recent genetic event that may have been involved in the formation of the O-antigen form, perhaps related to adaptive modifications of bacteria in newly occupied niches (Liu et al., 2008). Twelve such anomalies belonging to five categories (mobile elements, noncoding region, gene(s) in the reverse orientation or unusual location, and gene remnant) are found in the 37 Salmonella GlcNAc-/GalNAc-initiated O-antigen gene clusters. Previous studies found 17 such anomalies in the 33 Shigella O-antigen gene clusters, and 49 anomalies present in 148 E. coli O-antigen gene clusters. The proportion of anomalies in Salmonella O-antigen gene clusters is very similar to that in E. coli O-antigen gene clusters and much lower than that in Shigella. This suggests that it is Shigella that is atypical, which is consistent with it having diverged relatively recently and adopting a new niche.
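The comparison above rests on simple ratios, which can be checked with a short calculation. The sketch below uses only the counts quoted in the text; note that it computes anomalies per gene cluster, which is an assumption about what "proportion" means here, since a single gene cluster could carry more than one anomaly.

```python
# Counts quoted in the text: (anomalies, gene clusters).
counts = {
    "Salmonella (GlcNAc-/GalNAc-initiated)": (12, 37),
    "Shigella": (17, 33),
    "E. coli": (49, 148),
}


def anomalies_per_cluster(anomalies, clusters):
    """Average number of anomalies per gene cluster."""
    return anomalies / clusters


rates = {name: anomalies_per_cluster(a, c) for name, (a, c) in counts.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2f} anomalies per gene cluster")
```

This gives roughly 0.32 for Salmonella, 0.52 for Shigella, and 0.33 for E. coli, consistent with the statement that Salmonella resembles E. coli and that Shigella is the outlier.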
Several insertion sequences and H-repeat elements were found in Shigella strains and were often associated with inferred gene cluster rearrangements. However, for the gene clusters of Salmonella GlcNAc-/GalNAc-initiated O antigens, only one mobile element is found, an H-repeat insertion that is 78% identical to the RhsB H-repeat of E. coli K12 (Zhao et al., 1993), and is located between gne and gnd in the O-antigen gene cluster of Salmonella O51. This is the only major difference between the O-antigen gene clusters of Salmonella O51 and E. coli O23 that encode the same O antigen (Perepelov et al., 2011c), indicating that this H-repeat unit inserted into the O-antigen gene cluster of Salmonella O51 after the divergence of Salmonella and E. coli. Because the H-repeat unit in Salmonella O51 is intact, it is likely that the insertion occurred recently.
The gaps between genes in O-antigen gene clusters are often very short, suggesting that translational coupling is occurring, but larger gaps can arise during restructuring of a gene cluster (for instance, the incorporation or deletion of genes). In Salmonella serogroups A, B, and D1, for example, the functional wzy genes responsible for the polymerization of O units are found outside the O-antigen gene cluster (Naide et al., 1965; Curd et al., 1998), and a remnant wzy gene is present in the large gap upstream of the wbaO gene where the wzy gene is found in groups E and D2. Noncoding regions are also found in gene clusters for four of the Salmonella GlcNAc-/GalNAc-initiated O antigens.
In the O-antigen gene cluster of Salmonella O66, there is no wzy gene (Fig. 1), and there is also an 874-bp noncoding region between weiA and weiB in the gene cluster (Liu et al., 2010a). However, no remnant of a wzy gene can be found in this region by sequence homology search. A wzy remnant can be difficult to find by blast search because of the high divergence levels in wzy genes and the degradation of remnant sequences by deletions, which can fragment an open reading frame and/or change the reading frame. In Salmonella serogroups A, B, and D1 discussed above, the wzy remnants were not found until the ancestral wzy gene of group D3 was sequenced, which provided a closely related homologue. The Salmonella wzy genes are highly divergent, and if none are in the same HG as the lost O66 wzy gene, then a remnant may well not be detectable by blast and may have to await sequencing of a near relative. Because the Salmonella O66 type strain can produce normal LPS, it is highly likely that it also has a functional wzy gene for its β 1→2 linkage outside the O-antigen gene cluster.
In the O-antigen gene cluster of Salmonella O40, there is a remnant gnu gene between gne and wzx (Fig. 1). Gnu is responsible for the formation of UndPP–GalNAc from UndPP–GlcNAc for GalNAc-initiated O antigens (Rush et al., 2010), and the remnant suggests that the ancestral gene cluster coded for a GalNAc-initiated O antigen, although in E. coli and Salmonella, there is often a gnu gene upstream of galF rather than in the gene cluster. There is a 671-bp noncoding region between the gnu remnant and wzx, with no good hits in a blast search. Salmonella O40 has two GalNAc residues and no main-chain GlcNAc residue, indicating that both gne and gnu are required for the O-antigen synthesis. We suggest that there is a gnu gene upstream of galF in Salmonella O40, which is responsible for the synthesis of UndPP–GalNAc and replaces the function of the now degraded gnu gene in the gene cluster.
A 570-bp noncoding region with no good hits in a blast search is located between wekM and wzy in the Salmonella O42 antigen gene cluster. Salmonella O42 has the same O-antigen structure as E. coli O1 (Table 1), and their gene clusters also have the same organization. However, the 570-bp noncoding region is not found in the O-antigen gene cluster of E. coli O1. This noncoding region marks a boundary between different levels of DNA identity between the two gene clusters (Fig. 3a), being 55–81% for the seven upstream genes (rmlB-wekA), but only about 40% identity for the three downstream genes (wzy, wekN, wbdH) to the corresponding genes in E. coli O1, with no obvious protein identity for the gene products. The first seven genes in the two gene clusters presumably have a common ancestor, while the other three may have different origins. It is likely that the presence of the 570-bp noncoding region is related to the incorporation of wzy, wekN, and wbdH in Salmonella O42, suggesting that the ancestral gene cluster was like that in E. coli.
Two noncoding regions are found in the O-antigen gene cluster of Salmonella O53. One is located upstream (positions 9867–10961) of gne, and the other (positions 11991–13054), downstream of gne (upstream of gnd). It is likely that these two noncoding regions are related to the incorporation of the gne gene into the O-antigen gene cluster, implying an ancestor without the GalNAc residue currently present.
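The sizes of the two noncoding regions follow directly from the coordinates given. The short sketch below assumes the positions are 1-based and inclusive, which the text does not state explicitly.

```python
def region_length(start, end):
    """Length in bp of a region, assuming 1-based inclusive coordinates."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


# Coordinates quoted in the text for the Salmonella O53 gene cluster.
regions = {
    "upstream of gne": (9867, 10961),
    "downstream of gne": (11991, 13054),
}
lengths = {name: region_length(s, e) for name, (s, e) in regions.items()}
for name, bp in lengths.items():
    print(f"{name}: {bp} bp")
```

Under that assumption the two regions are 1095 bp and 1064 bp long, each comparable in size to a typical gene.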
Genes in the reverse orientation
All but two of the genes in the Salmonella O-antigen gene clusters are transcribed from galF to gnd, the exceptions being qdtC in Salmonella O28ac and gne in Salmonella O21, which are transcribed in the opposite direction.
qdtC is located at the 3′ end of the Salmonella O28ac antigen gene cluster. QdtC is involved in the biosynthesis of dTDP-Qui3NAc, together with RmlA, RmlB, QdtA, and QdtB (Pfostl et al., 2008). QdtC is an acetyltransferase for the final step in synthesis of dTDP-Qui3NAc, and it is likely that qdtC was added to the O28ac gene cluster relatively recently and that the ancestor had Qui3N in place of Qui3NAc. The E. coli O71 antigen gene cluster has the same organization as that of Salmonella O28ac, including the orientation of the qdtC gene, and the main chains of the two polysaccharides are the same (Hu et al., 2010). Thus, unlike other anomalies discussed in this section, it is likely that the qdtC gene in Salmonella O28ac and E. coli O71 was present in the common ancestor and is not an indication of recent change in our definition.
The gne gene in the O-antigen gene cluster of Salmonella O21 is located between wdaK and wdaL and transcribed in the opposite direction. This is an unusual location for a gene with this orientation as all previously described genes transcribed in the opposite direction in the O-antigen gene clusters of Salmonella and Shigella are located after the 3′ end of the normally transcribed genes. The O21 gne gene is needed to synthesize the GalNAc residue, which is the last sugar in the structure, and would perhaps be replaced by a GlcNAc residue in its absence. The orientation of this gne gene creates a need for two additional promoters (one for wdaL), but there is no evidence to indicate that the gne gene is a recent addition, especially because a promoter upstream of gne could not be identified based on an in silico search.
rml genes in unusual location
As discussed above, the rmlB, rmlD, rmlA, and rmlC genes for the four-step synthesis of dTDP-l-Rha are usually located at the 5′ end of the E. coli and Salmonella O-antigen gene clusters, with the conserved gene order as above. In some cases, the rmlC gene is separated from other rml genes. Seven Salmonella GlcNAc-/GalNAc-initiated O antigens contain l-Rha, and two of the rmlC genes are found in unusual locations.
(1) and (2) In Salmonella O28ac and O53, rmlC is located 1 gene and 5 genes, respectively, downstream of the rmlBDA genes.
(3) In Salmonella O56, the rmlA and rmlB genes are involved in the synthesis of dTDP-Qui4N, and as there is no l-Rha moiety, the rmlC and rmlD genes are not required. However, while the O56 rmlB gene is at the 5′ end of the gene cluster, the rmlA gene is located at the other end of the gene cluster, 10 genes downstream of rmlB.
(4) In the O-antigen gene cluster of Salmonella O63, the rmlB and rmlA genes, which are involved in the synthesis of dTDP-d-Fuc3NAc, are not located at the 5′ end of the O-antigen gene cluster, but downstream of weiD.
The O antigen of Salmonella O50 contains d-Gal, d-GalNAc, d-GlcNAc, and Col. The synthesis of GDP-Col requires the products of 5 nonhousekeeping sugar synthesis genes: manB, manC, gmd, colA, and colB (Fig. 4). The manB, manC, and gmd genes are in the O-antigen gene cluster, while colA and colB are downstream of gnd. There is a remnant of an fcl gene between gmd and gmm in the Salmonella O50 gene cluster, indicating that before the acquisition of colA and colB, the ancestral gene cluster coded for synthesis of GDP-l-Fuc. The colA and colB genes in E. coli O55, which has the same O antigen as Salmonella O50, are also located downstream of gnd. However, there is no fcl gene remnant in the O-antigen gene cluster of E. coli O55, presumably due to more extensive deletions than in Salmonella O50, which have occurred since the acquisition of the colA and colB genes. The presence of the colA and colB genes downstream of gnd and a remnant fcl gene in the gene cluster could suggest that colA and colB are a recent addition not fully incorporated into the gene cluster. However, the presence of the genes in the same location in E. coli suggests that in this case, the arrangement has survived for a long time and that consolidation of genes into the gene cluster can be a very slow process.
Biosynthetic pathways of monosaccharides
Twenty-one different sugars were found in the Salmonella GlcNAc-/GalNAc-initiated O antigens (Table S1). Fourteen of them are also present in Shigella O antigens, and their proposed or characterized biosynthetic pathways have been reviewed (Liu et al., 2008). The pathways for the other 7 sugars (Col, l-FucNAm, d-Qui3NAc, d-Fuc3NAc/d-FucNFo, d-Rha4NAc (d-PerNAc), Neu5Ac, and 8eLeg5RHb7Ac) and ribitol are shown in Fig. 4.
The biosynthetic pathway for 8eLeg5RHb7Ac, a component of the Salmonella O61 O antigen, is proposed for the first time in this review (Fig. 4) and requires biochemical confirmation. A similar derivative of 8-epilegionaminic acid, 8eLeg5Ac7Ac, has been found in the O antigens of E. coli O61 and O108, and a biosynthetic pathway, including 7 enzymes (Elg1–Elg7), also was proposed (Perepelov et al., 2010c). Orf1–Orf6 and Orf8 in the Salmonella O61 antigen gene cluster share 51–84% identity to Elg1–Elg7, respectively, and may have the corresponding functions. Therefore, it is likely that the pathway for 8eLeg5RHb7Ac is similar to that proposed for 8eLeg5Ac7Ac, and orf1–orf6 and orf8 are responsible for the synthesis of 8eLeg5RHb7Ac in Salmonella O61 (Fig. 4). Based on the structural difference, we propose that the substrates of the corresponding enzymes in the two pathways carry different acyl groups at N5. The biosynthesis of 8eLeg5Ac7Ac is initiated from UDP-GlcNAc, and we propose that 8eLeg5RHb7Ac is initiated from UDP-GlcNRHb, which is probably synthesized from UDP-GlcNAc. However, the expected genes for UDP-GlcNRHb are not found in the O-antigen gene cluster of Salmonella O61 and may be located elsewhere in the genome. orf1–orf6 and orf8 in Salmonella O61 were named elb1–elb7, respectively.
A close relationship between the O antigens of Salmonella and E. coli
Until recently, there were only four confirmed cases in which the O antigens are identical in the two species: Salmonella O35 and E. coli O111, Salmonella O50 and E. coli O55, Salmonella O30 and E. coli O157, and Salmonella O62 and E. coli O35 (Rundlof et al., 1998; Samuel et al., 2004) (Table 1). We now find that there are 24 O antigens present in both Salmonella and E. coli being either identical or near identical between the two species, which is a much higher number than previously thought. All of the shared O antigens are GlcNAc-/GalNAc-initiated. The data are summarized in Table 2, and some interesting examples are described in detail below.
It is worth noting that in addition to Salmonella O30, O35, O50, and O62, there are 11 Salmonella O antigens that cross-react serologically with one or more E. coli O antigens (Orskov et al., 1977), and 7 of them (Salmonella O6,14, O11, O17, O38, O42, O43, and O51) were shown in this study to have structures and gene clusters that are identical or closely related to an E. coli O antigen. However, the remaining four Salmonella O antigens are not obviously related structurally or genetically to the respective E. coli O antigen.
Salmonella O52 was found to have the same O-antigen structure as that of E. coli O153 (Ratnayake et al., 1994). However, there are no genes shared by the two gene clusters, which is obviously different from other pairs of Salmonella and E. coli O antigens that are identical or closely related. The sources for the sequences of the 15 Salmonella GlcNAc-/GalNAc-initiated O antigens that are not related to E. coli O antigens are summarized in Table S4.
Salmonella O6,14 and E. coli O77 group
It is well known that most S. flexneri serotypes share a common O-antigen backbone and differ only in the distribution of four possible Glc side-branch residues and an O-acetyl moiety, which are all attached by enzymes encoded by prophage genes (Allison & Verma, 2000), or differ in the presence of a plasmid-encoded phosphoethanolamine modification (Sun et al., 2012; Knirel et al., 2013). There is a similar group of O-antigen structures in E. coli, comprising E. coli O77, O17, O44, O73, and O106, which have been given serogroup status, and Salmonella O6,14 is a single representative in Salmonella with a related structure (Wang et al., 2007). These strains also share a common four-sugar backbone O-unit structure and differ by the addition of one or two Glc side branches at various positions of the backbone (the only exception is the E. coli O77 O antigen that does not have any side-chain modification). Their O-antigen gene clusters contain the same genes in the same order and express proteins required for the biosynthesis of the common four-sugar backbone. The O-antigen gene clusters of the E. coli O77 group share > 99% identity to each other and 70–76% identity to that of Salmonella O6,14, suggesting that this O-antigen backbone was in the common ancestor. In S. flexneri, the side-branch Glc residues are added from UDP-Glc in a three-step process involving GtrA and GtrB common to all such residues and a side-branch-specific transferase. The three genes are always present as a set on a prophage genome in the chromosome, and most probably the E. coli O77-related strains and Salmonella O6,14 gained their specific side-branch modifications by acquiring similar prophages carrying different gtr gene sets.
Salmonella O55 and E. coli O103
The O antigens of Salmonella O55 and E. coli O103 have similar pentasaccharide O units that differ in only one sugar (Glc vs. GlcNAc) and in the acyl group on Fuc3N (Ac group vs. Hb group; Table 1).
The DNA sequence identity in corresponding genes ranges from 53% to 76% (Fig. 3b), the only exception being the two acyltransferase genes (fdtC encoding an acetyltransferase in Salmonella O55 and fdhC encoding a 3-hydroxybutanoyltransferase in E. coli O103), which share no similarity and are responsible for the structural difference between dTDP-d-Fuc3NHb and dTDP-d-Fuc3NAc. We suggest that one of the two gene clusters acquired a new gene (acetyltransferase or 3-hydroxybutanoyltransferase gene) after species divergence (Liu et al., 2010c), but there is no indication as to which was the original gene in the ancestor. There must also be a difference in the specificity of the second glycosyltransferase to ensure the difference in the third sugar, as precursors for Glc and GlcNAc are generally available.
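Identity values such as those quoted above depend on how identity is defined. The following minimal sketch shows one common convention: the percentage of matching positions among the ungapped columns of a pairwise alignment. The review does not state which convention or software was used, and the two short sequences are invented purely for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip gapped columns
        compared += 1
        if x.upper() == y.upper():
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments, not real gene sequences.
seq_a = "ATGGCTAAAC-TGA"
seq_b = "ATGGCAAAGCTTGA"
identity = percent_identity(seq_a, seq_b)
print(f"{identity:.1f}% identity")
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, which lowers the value when gaps are present, so figures from different tools are not always directly comparable.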
Salmonella O66 and E. coli O166
The O-antigen structures of Salmonella O66 and E. coli O166 differ only in the linkage between O units and the presence of an O-acetyl moiety in the former. The O-antigen gene clusters of Salmonella O66 and E. coli O166 have nearly identical organizations, the only exception being that the wzy gene in E. coli O166 is replaced by a noncoding region in Salmonella O66 (Fig. 3c) (Liu et al., 2010a). It is proposed that a functional wzy gene outside the O-antigen gene cluster is involved in the synthesis of the O antigen of Salmonella O66, similar to what is found in Salmonella serogroups A, B, and D1 (Naide et al., 1965; Curd et al., 1998). The ancestral gene cluster of Salmonella O66 presumably had the wzy gene found in E. coli O166 between weiA and weiB, which would be no longer required after the bacteria gained the new wzy gene. The noncoding region in Salmonella O66 could be a remnant of a gene, but we found no region of similarity with the E. coli O166 wzy gene, probably owing to the substantial degradation observed between weiA and weiB.
Salmonella O43-E. coli O86 and Salmonella O13-E. coli O127
The four O antigens have similar four-sugar main chains varying mainly in the first sugar, which is GlcNAc in Salmonella and GalNAc in E. coli. Also, Salmonella O43 and E. coli O86 have a side-branch Gal that is lacking in the others. So for our purposes, there is a pair of related Salmonella O antigens, and both Salmonella O antigens have a related E. coli O antigen, all of which are treated together here.
Four of the genes (gmd-manC) of the Salmonella O13 and O43 antigen gene clusters have the same order and are 93–99% identical, as are the same genes in E. coli O127 and O86 (Figs 2b and 3d–f). In comparisons between the species, these genes are 74–84% identical. This is as expected for genes that were present in the common ancestor and diverged as the species diverged, with the genes in the two gene clusters undergoing frequent recombination within each species so that they evolved in concert (Samuel et al., 2004). The manB gene immediately downstream of manC is similar, but has rather more divergence than the gmd-manC genes due to having a CA gene cluster form of manB in the Salmonella and E. coli strains.
The other genes show quite complex patterns including high levels of divergence as discussed below.
The choice of first sugar is determined when the second sugar, a GalNAc residue, is added to either UndPP–GalNAc or UndPP–GlcNAc by glycosyltransferases WcmA or WfbG, respectively (Yi et al., 2005). The wcmA and wfbG genes are second genes in the gene cluster, after the gne gene that is required for synthesis of the UDP-GalNAc substrate. The E. coli strains will also need a gnu gene for synthesis of the UndPP–GalNAc. The two genes in Salmonella are again highly similar (98–99% identity) as are the two in E. coli (99–100% identity). However, the gne genes are only 60–63% identical in comparisons between the species, and the two glycosyltransferase genes, wcmA and wfbG, are not related at all (no more than 30% identity). It appears that this end of the gene cluster was replaced in one of the species causing the first sugar to be replaced.
At the 3′ end are genes related to the addition of the side-branch Gal in Salmonella O43 and E. coli O86, and the corresponding glycosyltransferase wcmB gene is found only in those strains, where it is located between the wzx and wzy genes. The wdbR gene, found in the same location only in E. coli O127, is proposed to be an acetyltransferase gene based on sequence homology and may be responsible for addition of one of the O-acetyl groups to the Fuc residue in E. coli O127. The genes for addition of the main-chain Gal residue and addition of the Fuc residue to it are very different in the two structural forms. The main-chain Gal residue carries the Gal side branch in Salmonella O43 and E. coli O86, so this may account for the difference between wfbI and wcmD, as if the side-branch Gal is added first, it would affect the target sugar for the Fuc transferases. However, the explanation for the difference between wfbI and wcmD genes is not so simple, as they are responsible for the same linkage, although the first sugars of the molecule at this stage are different. All these genes, including wzx and wzy, are highly divergent, and only for wcmB does it seem likely that the various forms have diverged from the gene cluster in the common ancestor of the two species. Perhaps there have been gene replacements since species divergence, or perhaps the situation in the common ancestor was more complex than just having the two forms seen today and included the sequence diversity now observed.
Salmonella O30 and E. coli O157
Salmonella O30 and E. coli O157 have the same O-antigen structure that contains one residue each of d-Rha4NAc (N-acetyl-d-perosamine, d-PerNAc), d-Glc, l-Fuc, and d-GalNAc. The O-antigen gene cluster of Salmonella O30 is nearly identical to that of E. coli O157, the only difference being that Salmonella O30 lacks the acetyltransferase gene perB, which is located at the 3′ end of the E. coli O157 antigen gene cluster and is involved in the synthesis of d-Rha4NAc (Albermann & Beuttler, 2008) (Fig. 3g). An H-repeat remnant is located upstream of the perB gene in E. coli O157. It is likely that the acquisition of the E. coli O157 perB gene was mediated by the H-repeat element and occurred more recently. An acetyltransferase gene that converts GDP-d-Rha4N to GDP-d-Rha4NAc may be located elsewhere in the Salmonella O30 genome.
Two special Salmonella GlcNAc-/GalNAc-initiated O-antigen forms (O54 and O67)
The O antigen of Salmonella O54 is different from all other reported bacterial O antigens in being synthesized by the synthase pathway. Salmonella O54 has a disaccharide O unit, →4)-d-ManpNAc-(β1→3)-d-ManpNAc-(β1→, and is thus a homopolymer. The gene cluster responsible for the synthesis of the Salmonella O54 O antigen resides on a small mobilizable plasmid (Keenleyside & Whitfield, 1996), and mobilization of this plasmid into strains with a functional chromosomal O-antigen gene cluster can lead to the simultaneous expression of two distinct O antigens. However, in the Salmonella O54 type strain, this is not the case due to inactivation of the chromosomal O-antigen locus, but the strains for O54 serovars include some with group B, C1, C2, E, or 21(L) epitopes (Fitzgerald et al., 2007), suggesting that the plasmid is quite mobile in nature. The Salmonella O54 antigen gene cluster contains mnaA, wbbE, and wbbF. MnaA is a C2 epimerase that converts UDP-GlcNAc to UDP-ManNAc (Campbell et al., 2000). WbbE transfers the first ManNAc residue from UDP-ManNAc to UndPP–GlcNAc, which is synthesized by WecA, to complete an adapter. WbbF, an integral membrane protein, is responsible for both sequential addition of ManNAc and the concurrent extrusion of the nascent polymer across the cytoplasmic membrane (Keenleyside & Whitfield, 1996).
Salmonella O67 has previously been suggested to be a variant of serogroup B (O4) (Li & Reeves, 2000). Indeed, a molecular typing study based on O-antigen gene cluster probes found that the serogroup O4 antigen-specific gene does not distinguish strains of serogroups O4 and O67 (Fitzgerald et al., 2007). In this study, we sequenced the region between galF and gnd in Salmonella O67 and found it to be the same as for the serogroup O4 antigen gene cluster (with identity ranging from 99% to 100% for corresponding genes). However, the structural analysis revealed that the O67 antigen structure is similar to that of d-galactan I O antigen of K. pneumoniae (Table 1), the only difference being the presence of an O-acetyl group in Salmonella O67, which is consistent with the fact that there is no cross-reaction between serogroup O4 and O67 antigens and their respective antisera. The data show that the O67 gene cluster is not located between galF and gnd, but elsewhere in the genome.
The gene cluster responsible for the synthesis of d-galactan I in K. pneumoniae O1 has been identified downstream of gnd and consists of six genes, comprising wzm, wzt, wbbM, glf, wbbN, and wbbO (Clarke & Whitfield, 1992). wzm and wzt encode components of an ABC transporter for export of the O polysaccharide, and glf encodes a UDP-galactopyranose mutase, for conversion of UDP-Galp to UDP-Galf. In the ABC transporter pathway, O-antigen synthesis begins with the formation in the cytoplasm of a chain of O units on an acceptor UndPP–GlcNAc, which is synthesized by WecA. Galactan I synthesis has been studied by the Whitfield group and summarized in a recent review (Greenfield & Whitfield, 2012). WbbO was shown to be a bifunctional glycosyltransferase adding the first two-sugar repeat unit (Galp and Galf) to the UndPP–GlcNAc acceptor, forming the adaptor region of the O polysaccharide. Further extension of galactan I requires WbbM, which encodes a Galp transferase (Guan et al., 2001). WbbN is thought to be the Galf transferase for galactan I extension, although WbbO can replace WbbN as the Galf transferase in vitro. However, no genes with the potential for the synthesis of d-galactan I can be found downstream of gnd in Salmonella O67.
To identify the O-antigen gene cluster, we obtained a draft genome of the Salmonella O67 type strain using Solexa sequencing. A contig was found containing eight genes related to the synthesis of the O67 antigen. orf1–orf6 of that contig are identified as wzm, wzt, wbbM, glf, wbbN, and wbbO by homology with the genes of K. pneumoniae O1 (92%, 95%, 76%, 85%, 64%, and 79% identity, respectively) and account for the synthesis of d-galactan I. orf7 and orf8 were named wejU and wejV, respectively. wejU appears to be a glycosyltransferase gene, but its exact function is unclear. WejV shares similarity to many acyltransferases, and we propose that it is responsible for transfer of the O-acetyl group to the Salmonella O67 antigen.
We found that K. oxytoca 10–5246 has the same gene cluster (downstream of gnd) as that of Salmonella O67 including wejU and wejV. Some of the Klebsiella strains also have galactan O antigens, and we suggest that the O67 gene cluster was derived from a Klebsiella strain with this gene cluster. Currently, the genomic locus of the Salmonella O67 antigen gene cluster is unclear. It should be noted that the wzm-wejU set of genes is also present in the genome of E. coli SMS-3-5, with many insertion elements present upstream and downstream of that region.
It is likely that the Salmonella O67 strain under study arose from a serogroup B strain by gaining a new gene cluster for d-galactan I that originally came from Klebsiella and has been incorporated into the chromosome at an unidentified locus and that this was followed by repression of the function of its original O-antigen gene cluster by means also not identified. It remains to be seen whether all O67 isolates have similar genetics, but serogroup O67 has only one serovar, named ‘Cresswell’, and isolates are extremely rare.
Structure and genetics of Salmonella Gal-initiated O antigens with close relatedness
There is a set of O antigens in Salmonella that have Gal as first sugar of the O unit, comprising serogroups O2 (A), O4 (B), O8 (C2-C3), O9 (D1), O9,46 (D2), O9,46,27 (D3), O3,10 (E1-E3), and O1,3,19 (E4). These O antigens also have many other similarities (Table 3). Except for serogroup C2-C3, they possess a main chain having a d-Manp-(1→4)-l-Rhap-(α1→3)-d-Galp trisaccharide repeat unit and may differ in (1) the configuration (α vs. β) and the position of the polymerization linkage (α 1→2 vs. α 1→6); and (2) the configuration (α vs. β) of the d-Manp-(1→4)-l-Rhap linkage. In serogroup C2-C3, the main chain is built up of l-Rhap-(β1→2)-d-Manp-(α1→2)-d-Manp-(α1→3)-d-Galp tetrasaccharide repeats. The major differences between serogroups are defined by the presence or absence and the identity (Abe, Tyv, or Par) of the side-branch 3,6-dideoxyhexose residue, and additional structural diversity is achieved by lateral glucosylation and/or O-acetylation, which in most cases are nonstoichiometric.
Table 3. Structures of Salmonella Gal-initiated O antigens [adapted from the recent review (Knirel, 2011)]
The other defining feature of Gal-initiated Salmonella O antigens is that they have the wbaP gene in the gene cluster for the initial transferase that transfers Gal-P from UDP–Gal to UndP to generate UndPP–Gal.
It is near universal in the Enterobacteriaceae for the first sugar to be GlcNAc or GalNAc with WecA as the initial transferase. The Gal-initiated O antigens are a major exception, and it seems that the use of Gal as the initial sugar arose in Salmonella since its divergence from E. coli. However, although there are only 8 Gal-initiated O antigens and they are found almost exclusively in subspecies I and II, they have nonetheless been very successful and dominate the isolation lists. The only exception to the presence in subspecies I and II for the serovar type strains is the C2-C3 O antigen strain, which is in subspecies IIIb.
There is a modular structure for this set of O-antigen gene clusters, with several genes, especially sugar synthesis genes, being shared by different gene clusters in conserved locations as shown in Fig. 5. The rml genes are at the 5′ end of each gene cluster as for many GlcNAc-/GalNAc-initiated O antigens, followed by four ddh genes (ddhD, ddhA, ddhB, and ddhC), which are responsible for the synthesis of CDP-4-keto-3,6-dideoxy-d-glucose (Samuel & Reeves, 2003), the precursor of CDP-Abe, CDP-Par, and CDP-Tyv (Fig. 4). The abe gene or prt plus tyv genes for completing the synthesis of CDP-Abe and CDP-Tyv, respectively (Fig. 4), are located just downstream of the ddh genes. However, the serogroup E O antigen does not contain a dideoxyhexose residue, and the gene cluster does not have the relevant genes. The manBC and wbaP genes are located at the 3′ end of all of these Gal-initiated O-antigen gene clusters.
The major differences between the gene clusters are found in their central regions, which contain the diverse glycosyltransferase genes and the O-unit processing genes (wzx and wzy). In addition, the gene orf17.4 with unclear function was found downstream of wbaP in some group E strains.
Group B was the first O-antigen gene cluster to be described (Nikaido et al., 1967; Jiang et al., 1991) as it is present in the strain LT2 that was used in many early studies in bacterial genetics. It has Abe as its dideoxyhexose and the four ddh genes plus the abe gene. The difference between the O-antigen structures of the D1 and B O antigens is the presence of a Tyv side-branch sugar in D1 in place of Abe. The D1 gene cluster has prt and tyv genes in place of abe, which accounts for the structural difference. The only difference between the O-antigen structures of serogroups A and D1 is the presence of a Par or Tyv side branch, respectively, and the gene clusters are near identical, with prt and tyv genes both present. However, the tyv gene is not functional in group A, and in serovar Paratyphi A at least, this has been shown to be due to a frameshift mutation near the start of the gene, which would prevent conversion of CDP-Par to CDP-Tyv (Verma et al., 1988). As mentioned above, the wzy genes of groups A, B, and D1 are not located within the O-antigen gene cluster, but at a locus named rfc. However, there are wzy remnants in their gene clusters.
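The relationship between gene content and the side-branch dideoxyhexose described above can be summarized as a simple decision rule. The sketch below encodes only what the text states (ddh genes supply the common precursor, abe yields Abe, prt plus a functional tyv yields Tyv, prt alone yields Par); the function name and the gene-set representation are illustrative assumptions, and real gene clusters contain many additional genes not modeled here.

```python
DDH_GENES = {"ddhD", "ddhA", "ddhB", "ddhC"}  # make the common CDP-sugar precursor


def side_branch_sugar(functional_genes):
    """Predict the 3,6-dideoxyhexose from the set of functional genes (toy rule)."""
    genes = set(functional_genes)
    if not DDH_GENES <= genes:
        return None  # no precursor, so no dideoxyhexose side branch
    if "abe" in genes:
        return "Abe"
    if "prt" in genes:
        # A functional tyv converts CDP-Par to CDP-Tyv; without it Par remains.
        return "Tyv" if "tyv" in genes else "Par"
    return None


# Functional gene sets as described in the text (other genes omitted).
clusters = {
    "B": DDH_GENES | {"abe"},
    "D1": DDH_GENES | {"prt", "tyv"},
    "A": DDH_GENES | {"prt"},  # tyv is present but frameshifted, so not functional
    "E": set(),                # no dideoxyhexose genes
}
predicted = {group: side_branch_sugar(genes) for group, genes in clusters.items()}
print(predicted)
```

Treating the frameshifted tyv of group A as absent reproduces the Par side branch, which is the point made in the text: groups A and D1 differ in gene function rather than gene content.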
The group D2 structure differs from the group D1 structure in the polymerization linkage and the configuration (α vs. β) of the d-Manp-(1→4)-l-Rhap linkage. It has been suggested that the O-antigen gene cluster of serogroup D2 has arisen by reassortment of the serogroup D1 and E gene clusters by recombination mediated by an H-repeat element (Xiang et al., 1994).
For serogroup D3, there are two forms of the O unit that differ only in the configuration of the linkage between Man and Rha (α-1→4 and β-1→4). wbaU is responsible for the formation of the d-Man-(α1→4)-l-Rha linkage (Curd et al., 1998), but there is no glycosyltransferase gene in the O-antigen gene cluster for the d-Man-(β1→4)-l-Rha linkage; the gene for this linkage may be located elsewhere in the genome. The O-antigen gene cluster of serogroup D1 is also thought to have arisen from that of D3 by loss of the original wzy gene (Curd et al., 1998).
Group E was initially subdivided into groups E1, E2, E3, and E4, based on serology. Serogroups E1, E2, and E3 have been amalgamated as serogroup E1 (the 2007 Weill summary) on the basis that they have almost the same O-antigen gene clusters between galF and gnd. Recently, we found that serogroup E4 also has the same O-antigen gene cluster as E1 and so is not really a separate serogroup. The variation in the Glc side chain among serogroup E O antigens is presumably due to the presence of different bacteriophages with side-chain modification genes (Table 3).
Serogroup C2-C3 has a different order for the sugars and one more Man residue in the O unit. All of the genes in the central region of the gene cluster are unique to serogroup C2-C3. The C3 form was originally put in a separate serogroup C3, but differs only in having a Glc side branch that is due to a bacteriophage-encoded set of genes as is common in Salmonella.
This review covers the chemical structure and DNA sequence data for all Salmonella O antigens, including more recent work on the GlcNAc-/GalNAc-initiated Salmonella O antigens that are more directly comparable with those of E. coli and Shigella. Together with the previously published survey of Shigella O antigens, it provides insights into the evolution of O-antigen diversity in bacteria. It also documents the relationships between the O antigens of Salmonella and E. coli, which were underestimated before.
In our previous review (Liu et al., 2008), we had observed that Shigella has a higher than usual proportion of anomalies in its O-antigen gene clusters (17 anomalies in 33 O-antigen gene clusters), many of which are thought to be indicators of events that mediated the formation of new O-antigen forms, such as remnants of genes no longer required or elements that mediated gain of new genes. In contrast, only 12 anomalies are found in the 37 gene clusters of the Salmonella GlcNAc-/GalNAc-initiated O antigens with the Wzx/Wzy pathway (excluding Salmonella O54 and O67), a much lower proportion than in Shigella. The smaller number of anomalies indicates that for this major group of Salmonella O antigens, the structure of the gene clusters has generally been stable, which suggests that the set of O antigens is well adapted to the Salmonella niche.
In the previous review (Liu et al., 2008), we also found that 21 of 34 Shigella O antigens are either identical or closely related to an O antigen in E. coli which is easily explained as all Shigella serotypes except for S. boydii type 13 are in fact part of the species E. coli. Homologous recombination occurs readily and was shown to be an essential mechanism in the diversification of Shigella O antigens.
Previous structural analysis of Salmonella and E. coli O antigens had revealed only a few shared structures, although early serological data had shown extensive cross-reactions (Orskov et al., 1977). However, in our recent studies, we found many more cases, giving a total of 24 O antigens that are identical or closely related in Salmonella and E. coli. The most likely explanation for the observed similarities in the two species is that each pair of gene clusters originated from a gene cluster that was present in the most recent common ancestor. In that case, the two gene clusters would have a similar organization. E. coli and Salmonella diverged about 140 million years ago, and 93% of E. coli and Salmonella housekeeping genes have levels of identity between 76.3% and 100% (Sharp, 1991). For 23 of the 24 O-antigen gene clusters that encode identical or closely related O antigens in E. coli and Salmonella, the average identity for corresponding genes is 73.5%, and the average identity for corresponding proteins is 73.7%. This is close to the lower end of the range for housekeeping genes, and the pattern is generally similar for all of the gene clusters. However, genes in these O-antigen gene clusters do diverge at a higher rate than housekeeping genes, suggesting that the O-antigen genes are under consistent selection pressure from the environment or hosts for better adaptation. This is not unexpected as they have an atypical GC content, suggesting an origin in another species, and they may still be adjusting to the new intracellular environment.
In these pairs of O-antigen gene clusters in Salmonella and E. coli, the identity levels for sugar synthesis genes, glycosyltransferase genes, and O-unit processing genes are different. The average level of identity is 77.1%, 70.3%, and 69.4%, respectively, for the three classes of genes and 81.4%, 67.1%, and 65.6%, respectively, for the proteins encoded by these genes. The divergence levels for glycosyltransferase genes and O-unit processing genes are consistently higher than those for sugar synthesis genes, being observed in almost all pairs of gene clusters. It appears that each pair has indeed diverged from a gene cluster that was in the common ancestor, but that the three classes of genes were subject to different selection pressures, although there is no experimental evidence for that.
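The per-class averages quoted above can be illustrated with a minimal sketch. It assumes that each pair of corresponding genes has already been aligned to equal length and that identity is counted only over columns without gaps; the review does not state how gaps were treated, so that choice, the function names, and the toy sequences are all illustrative assumptions rather than the method or data behind the reported figures.

```python
from collections import defaultdict


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Assumption: inputs are pre-aligned strings of equal length, with '-' for gaps.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def mean_identity_by_class(pairs):
    """Average percent identity per gene class.

    pairs: iterable of (gene_class, aligned_seq_a, aligned_seq_b).
    """
    by_class = defaultdict(list)
    for gene_class, a, b in pairs:
        by_class[gene_class].append(percent_identity(a, b))
    return {cls: sum(vals) / len(vals) for cls, vals in by_class.items()}


# Toy, made-up alignments (not real Salmonella or E. coli sequences).
toy_pairs = [
    ("sugar synthesis", "ATGGCTAAAG", "ATGGCTAAGG"),
    ("glycosyltransferase", "ATGAC-TTGA", "ATGTCATTCA"),
    ("O-unit processing", "ATGAAACTGA", "ATGCAATTGT"),
]

if __name__ == "__main__":
    for cls, value in mean_identity_by_class(toy_pairs).items():
        print(f"{cls}: {value:.1f}%")
```

Grouping by gene class before averaging mirrors the comparison made in the text, where sugar synthesis genes, glycosyltransferase genes, and O-unit processing genes are reported separately.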
Alternative explanations for the shared gene clusters are the following: (1) The gene clusters were recently transferred from one species to the other after species divergence. In this case, the two gene clusters should have a higher level of sequence identity, not related to the level for housekeeping genes. (2) They have a common origin but were acquired independently. In this case, we expect similar gene order, as is observed, but can make no predictions on level of divergence as it will depend on the time since the divergence of the donor species, which could be earlier or later than divergence of the E. coli and Salmonella species. (3) The two gene clusters were assembled independently either after species divergence or before being acquired by the E. coli and Salmonella lineages. In this case, the gene organization of the individual gene clusters should be different, so is not supported at all for the 23 pairs being discussed.
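The predictions attached to these scenarios can be laid out as a small decision sketch. It only encodes the qualitative reasoning in the text: different gene order points to independent assembly, similar order is compatible with independent acquisition at any divergence level, identity near the housekeeping range is compatible with inheritance from the common ancestor, and much higher identity is compatible with recent transfer. The numeric cut-offs `slack` and `recent_transfer_min` are hypothetical parameters introduced here for illustration; the review gives no such thresholds.

```python
def scenarios_not_excluded(
    same_gene_order: bool,
    mean_identity: float,
    housekeeping_min: float = 76.3,    # lower end of the housekeeping range cited in the text
    slack: float = 5.0,                # hypothetical tolerance below that range
    recent_transfer_min: float = 95.0, # hypothetical cut-off for "much higher" identity
):
    """Return the scenarios that the two observations do not exclude."""
    if not same_gene_order:
        # Different organization is the prediction of explanation (3).
        return ["independent assembly (3)"]

    # Similar gene order: explanation (2) makes no prediction about divergence.
    result = ["independent acquisition from a common source (2)"]
    if mean_identity >= housekeeping_min - slack:
        # Divergence comparable to housekeeping genes.
        result.insert(0, "inheritance from the common ancestor")
    if mean_identity >= recent_transfer_min:
        # Identity well above that expected since species divergence.
        result.append("recent transfer between species (1)")
    return result


if __name__ == "__main__":
    # 73.5% is the average gene identity reported for the 23 shared pairs.
    print(scenarios_not_excluded(True, 73.5))
    # A pair with unrelated gene order.
    print(scenarios_not_excluded(False, 0.0))
```

As in the text, a single pair cannot distinguish inheritance from independent acquisition; the argument rests on the pattern being similar across many pairs.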
None of these alternative explanations fits the data as a sole explanation, although options (1) and (2) remain possible for individual pairs; if either were at all common, one would expect some pairs with a much higher or lower level of divergence. The conclusion that most of the shared gene clusters derive from the common ancestor is in agreement with one proposed earlier with data for just three structures (Samuel et al., 2004), but now has much stronger support. It is of course possible that some of the 23 gene clusters did arrive independently but happen to be close to the others in divergence, but if so, we suggest that this was a minority of them.
The exception is the case of Salmonella O52 and E. coli O153, in which gene clusters that are not related generate the same O-antigen structure. Each gene cluster has the expected number of glycosyltransferase genes and a wzx and a wzy gene, but the gene order is different, and none of the corresponding genes shows a significant level of identity. This is presumably a case of two gene clusters for a given structure that were assembled independently.
Some of the gene clusters shared by Salmonella and E. coli did evolve to generate new O-antigen forms by acquisition of new genes after species divergence as described above. Salmonella O66 is thought to have obtained a new wzy gene, located outside the O-antigen gene cluster, that is responsible for the β-1→2 linkage between the O units. The original wzy gene for the β-1→3 linkage in the O-antigen gene cluster must have degraded over time, as proposed for Salmonella serogroup B (Wang et al., 2002a), as it was no longer required in O-antigen synthesis. For the Salmonella O55-E. coli O103 pair, one of the two gene clusters must have acquired a new gene (an acetyltransferase gene or a 3-hydroxybutanoyltransferase gene) after species divergence to synthesize a different sugar derivative. For the related Salmonella O43–E. coli O86 and Salmonella O13–E. coli O127 pairs, there were significant evolutionary changes, but it is not yet possible to unravel what happened. The Salmonella O6,14 and the E. coli O77 groups are interesting as, like the group of related S. flexneri serogroups, diversity arises by acquisition (or loss) of different prophage genes for side-chain modification: only one form has been observed in Salmonella, but five in E. coli.
It should be noted that within some serogroups, there are also variant strains with O-antigen structures and gene clusters different from those of the type strains. For example, the O-antigen structure of one Salmonella O50 strain was reported to differ from that of the type strain in having a GlcNAc in place of a GalNAc residue (Senchenkova et al., 1997). In addition, the O-antigen-based molecular typing method devised for Salmonella O13 cannot detect O13 strains belonging to subspecies IIIb or S. bongori (Fitzgerald et al., 2007). The genetic basis for this difference remains to be determined. There is also more than one O-antigen structure for some other Salmonella O serogroups, usually obtained to determine the basis of serological variation, and most variations are in the side branches, as in serogroups O6,7, O6,14, and O30 (Table 1). These variations are probably due to the presence in the chromosome of different bacteriophage genomes that include O-antigen side-branch modification genes.
As a genus with a long evolutionary history, the mechanism for the generation and maintenance of O-antigen diversity in Salmonella is obviously different from that in Shigella (Liu et al., 2008), which is essentially a relatively small group of strains in another species (E. coli) that are distinguished by a capacity for host cell invasion that may have only recently been adopted in the species (Maurelli et al., 1998; Pallen & Wren, 2007). One of the major characteristics of Salmonella is that the O antigens can be divided into two different classes (Gal-initiated class and GlcNAc-/GalNAc-initiated class). The GlcNAc-/GalNAc-initiated Salmonella O antigens that we have just been discussing are similar to those in other members of Enterobacteriaceae in using the WecA initial sugar transferase encoded in the ECA gene cluster that is widely distributed in the family. Over half of the GlcNAc-/GalNAc-initiated O antigens are also found in the closest relative E. coli and all but one of these are thought to have been present in their common ancestor.
The Gal-initiated O antigens have a quite different evolutionary history and are thought to have entered the species quite recently, but, although only eight in number, they are now dominant. We do not know the reasons for this enormous difference between E. coli and Salmonella: strains with Gal-initiated O antigens greatly outnumber those with GlcNAc-/GalNAc-initiated O antigens in Salmonella, whereas Gal-initiated O antigens have, to our knowledge, not been reported in E. coli.
Most Salmonella strains that cause serious infection in humans and animals have Gal-initiated O antigens. However, it is worth noting that the E. coli members of several Salmonella–E. coli serogroup pairs with identical or related O antigens, including E. coli O157, O55, O111, O145, O103, O118, and O78, are associated with important pathogenic E. coli strains. The long history of these O antigens in E. coli and Salmonella indicates that they are possibly adaptive in both species, but most Salmonella members are not recognized to be particularly pathogenic.
O-antigen diversity has been thought to be important in offering the various clones selective advantages in their specific niches. It has been estimated that a selective advantage of only 0.1% for one O antigen over another in a given niche is sufficient to maintain different alleles in different clones (Reeves, 1992), although it is difficult to demonstrate this in a laboratory assay. The O antigen is a target of the host innate immune system. It is recognized by the Toll-like receptor 4 (Royle et al., 2003), and it has been suggested that the pressures from the immune system may contribute to O-antigen diversity. Novel O antigens, especially those containing rare sugars, would not be recognized by the immune system. An example of the effect of a change in O antigen is given by Vibrio cholerae O139, a variant of the 7th pandemic O1 clone with a new O unit containing two Col residues and a QuiNAc residue (Knirel, 2011). It was first identified in 1992 in southern India and quickly spread in India and Bengal and some other Asian countries, totally displacing the O1 serogroup (Ramamurthy et al., 2003). It had the capacity to infect persons previously immune to the ancestral V. cholerae O1 form of the pandemic strain (Blokesch & Schoolnik, 2007), and this was thought to be the cause of its success (Ramamurthy et al., 2003). After a few years, the O139 form virtually disappeared, but there has been periodic switching between V. cholerae O1 and O139 strains as agents of cholera in some areas (Faruque et al., 2003; Chatterjee et al., 2007). The strains also diversified in other factors, which affect the balance of the two antigenic forms; however, the original rise of the O139 form showed how powerful the selective pressure of O-antigen variation can be.
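To give a feel for what a selective advantage of 0.1% means, the following minimal sketch iterates a standard textbook recursion for two competing haploid clones, p' = p(1 + s) / (1 + ps), where p is the frequency of the favoured form and s its advantage. This is an illustrative assumption and not the calculation made by Reeves (1992); the starting and target frequencies are arbitrary. Under this simple model, an advantage of s = 0.001 takes a clone from 1% to 99% of the population in roughly nine thousand generations.

```python
def next_frequency(p: float, s: float) -> float:
    """One generation of haploid selection: the favoured form has relative fitness 1 + s."""
    return p * (1.0 + s) / (1.0 + p * s)


def generations_to_reach(p0: float, target: float, s: float) -> int:
    """Number of generations for the favoured form to rise from p0 to at least target."""
    if not (0.0 < p0 < target < 1.0):
        raise ValueError("require 0 < p0 < target < 1")
    if s <= 0.0:
        raise ValueError("selective advantage s must be positive")
    p = p0
    generations = 0
    while p < target:
        p = next_frequency(p, s)
        generations += 1
    return generations


if __name__ == "__main__":
    # Arbitrary illustrative values: 0.1% advantage, from 1% to 99% frequency.
    print(generations_to_reach(0.01, 0.99, 0.001))
```

The point of the sketch is only that a difference far too small to measure in a laboratory assay can still be decisive over evolutionary time within a given niche.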
It should be noted that a relationship between O-antigen form and host has been observed in several bacteria, including Salmonella, for which a host is commonly most easily infected by strains bearing a specific O antigen (Makela et al., 1973; Rabsch et al., 2002; Butela & Lawrence, 2010). In addition, most bacteria cannot evade an immune response by switching their O antigens in the timescale of an infection, as they can with H antigens through phase variation. These data raise the possibility that the different O antigens expressed by different strains may confer advantages in different ecological niches, such as different host intestinal environments, which may be a major selection pressure for the generation and maintenance of O-antigen diversity (Butela & Lawrence, 2010). It has for instance been shown that diversifying selection mediated by predation from intestinal amoebae can contribute to O-antigen variation in Salmonella (Wildschutte et al., 2004; Wildschutte & Lawrence, 2007). Intestinal amoebae recognize antigenically diverse Salmonella strains with different efficiency, giving the various serotypes different abilities to escape predators in particular environments. O-antigen variation is also helpful for bacteria in avoiding bacteriophage predation (Blokesch & Schoolnik, 2007). In addition, O-antigen diversity may provide selective advantage in other respects; for example, different O antigens may mediate more effective adhesion to different intestinal mucins.
Serotyping has been very important for our understanding of diversity in Salmonella and is used to define the serovars that are referred to in most discussions of the genus. However, in recent years, several aspects of traditional serotyping methods have limited the utility of serotyping, especially in large-scale epidemiology studies. The techniques can be laborious and time-consuming, and the full range of sera needed is kept only in major typing centers. In addition, based on the O-antigen structure data obtained in this study, considerable serological cross-reaction is expected between E. coli and Salmonella. There has often been discussion of developing a molecular typing system for Salmonella based on the current serology scheme, and the relevant concepts have also been discussed and applied in other bacteria (Raymond et al., 2002; Li et al., 2009). The completion of the sequencing of the Salmonella O-antigen gene clusters provides the data for a comprehensive typing scheme for Salmonella using sequence diversity, but based on the serotyping scheme, to give in effect molecular serotyping. To facilitate this, we have included data on the specific genes that could be useful for this and also a comprehensive set of primers that we have developed for a microarray targeting the O-antigen-specific genes that can differentiate most Salmonella serogroups (Table S5) (Guo et al., 2013). The only exceptions are groups A and D1 that need to be further distinguished from each other using conventional serotyping methods, due to having near-identical O-antigen gene clusters. The mutations in the tyv gene of group A strains are not enough to easily distinguish groups A and D1, but the specific frameshift in serovar Paratyphi A could probably be developed into a specific test for this serovar.
For most serogroups, the O-unit processing genes (wzx and wzy) were selected as target genes, the exceptions being serogroups A/D1, O54, and O67, for which the sugar synthesis gene prt, the glycosyltransferase gene wbbE, and acetyltransferase gene wejV, respectively, were selected. For most serogroups, only primer pairs based on their own specific genes can generate the specific PCR products. However, due to the close relationship among the Gal-initiated Salmonella O-antigen gene clusters, combinations of primer pairs targeting more than one gene were necessary for detecting some of these serogroups (Table S5). For instance, as the prt genes of groups A, D1, D2, and D3 are highly similar, but not found in other O-antigen gene clusters, prt was used in the identification of all these groups, with D2 and D3, for example, further distinguished by their specific wzy genes. Our molecular typing system can also accurately differentiate Salmonella and E. coli strains with related O-antigen structures.
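The combination logic described for groups A, D1, D2, and D3 can be sketched as a simple decision function. It assumes that each PCR or microarray result has already been reduced to a set of detected marker labels; the labels "wzy_D2" and "wzy_D3" are hypothetical names for the group-specific wzy targets, and the actual primers are those listed in Table S5, not reproduced here.

```python
from typing import Iterable


def call_prt_positive_group(detected_markers: Iterable[str]) -> str:
    """Assign a serogroup call from detected markers for the prt-carrying groups.

    Assumption: marker labels are 'prt', 'wzy_D2', and 'wzy_D3' (hypothetical names).
    """
    detected = set(detected_markers)
    if "prt" not in detected:
        # prt is described as shared by A, D1, D2, and D3 and absent elsewhere.
        return "not A, D1, D2, or D3"
    has_d2 = "wzy_D2" in detected
    has_d3 = "wzy_D3" in detected
    if has_d2 and has_d3:
        return "ambiguous: both D2 and D3 wzy markers detected"
    if has_d2:
        return "D2"
    if has_d3:
        return "D3"
    # A and D1 have near-identical gene clusters and need conventional serotyping.
    return "A or D1 (resolve by conventional serotyping)"


if __name__ == "__main__":
    print(call_prt_positive_group({"prt", "wzy_D2"}))
    print(call_prt_positive_group({"prt"}))
```

A shared marker followed by group-specific markers keeps the number of targets small while still separating closely related gene clusters, which is the rationale given in the text for combining primer pairs.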
This work was supported by the National Key Programs for Infectious Diseases of China (2013ZX10004-216-001); the National 973 Program of China Grant (2012CB721001, 2009CB522603); the National Natural Science Foundation of China (NSFC) Key Program Grant 31030002; the NSFC General Program Grant (81171524, 31270003); and the Russian Foundation for Basic Research (projects 11-04-91173_NNSF-a and 11-04-01020-a). The authors have no conflict of interest to declare.