Analysis of putative nonulosonic acid biosynthesis pathways in Archaea reveals a complex evolutionary history

Correspondence: Jerry Eichler, Department of Life Sciences, Ben Gurion University, PO Box 653, Beersheva 84105, Israel. Tel.: 972 8646 1343; fax: 972 8647 9175; e-mail: jeichler@bgu.ac.il

Abstract

Sialic acids and the other nonulosonic acid sugars, legionaminic acid and pseudaminic acid, are nine-carbon sugars found as components of the glycans decorating proteins and other molecules in Eukarya and Bacteria. Yet, despite the prevalence of N-glycosylation in Archaea and the variety of sugars recruited for the archaeal version of this post-translational modification, only a single report of a nonulosonic acid sugar in an archaeal N-linked glycan has appeared. Hence, to obtain a clearer picture of the capacity of Archaea to synthesize nonulosonic acid sugars, 122 sequenced genomes were scanned for genes involved in the biogenesis of these sugars. The results reveal that while Archaea and Bacteria share a common route of sialic acid biosynthesis, numerous archaeal nonulosonic acid sugar biosynthesis pathway components were acquired from elsewhere via various routes. Still, the limited number of Archaea encoding components involved in the synthesis of nonulosonic acid sugars implies that such saccharides are not major components of glycans in this domain.

Introduction

Long held to be a property restricted to Eukarya, N-glycosylation, namely the covalent linkage of oligosaccharides to selected Asn residues of a target protein, is now known to be performed by Bacteria and Archaea as well (Calo et al., 2010; Jarrell et al., 2010; Nothaft & Szymanski, 2010). In eukaryotes, diversity in N-linked glycan content is largely the result of Golgi-based addition of different sialic acids onto the common glycan core originally transferred to a target protein in the endoplasmic reticulum (Chen & Varki, 2010; Cohen & Varki, 2010). Sialic acids comprise a diverse family of more than 50 structurally distinct molecules, most of which are derivatives of the common sialic acid N-acetylneuraminic acid (Neu5Ac). Sialic acids belong to the larger group of nine-carbon monosaccharides termed ‘nonulosonic acid (NulO)’ sugars (Lewis et al., 2009). In Bacteria, oligosaccharides bound to glycoproteins and other glycosylated molecules can include, in addition to sialic acids, members of two other NulO sugar families, namely legionaminic acid and pseudaminic acid (Knirel et al., 2003). The structures of Neu5Ac, legionaminic acid, and pseudaminic acid are depicted in Fig. 1.

Figure 1.

Structures of a sialic acid (N-acetylneuraminic acid), legionaminic acid, and pseudaminic acid.

Although the N-linked oligosaccharides decorating archaeal glycoproteins display a diversity of sugar content not seen in their eukaryal or bacterial counterparts (Eichler, 2013), the contribution of NulO sugars to this diversity remains largely unaddressed. Indeed, to date the only study demonstrating a glycoprotein-linked NulO sugar in Archaea identified a legionaminic acid derivative on VP4, a component of the haloarchaeal virus HRPV-1 (Kandiba et al., 2012). In addition, Methanobrevibacter smithii was shown to generate pseudaminic acid, although it is not known whether this sugar is used in protein glycosylation (Lewis et al., 2009).

The pathways used to generate the nine-carbon NulO sugars all involve condensation of a six-carbon N-acetylhexosamine, derived from a nucleotide diphosphate-activated precursor, with three-carbon phosphoenolpyruvate, followed by CMP-based activation (Angata & Varki, 2002; Schoenhofen et al., 2006; Glaze et al., 2008). These sugars may then be further modified via the attachment of additional chemical groups (Angata & Varki, 2002). In Eukarya and Bacteria, biosynthesis of the common sialic acid Neu5Ac (hereafter NeuAc) begins with the conversion of UDP-N-acetyl-α-D-glucosamine (UDP-GlcNAc) to N-acetyl-β-D-mannosamine (ManNAc) in a reaction catalyzed by UDP-GlcNAc-2-epimerase (NeuC) (Angata & Varki, 2002) (Supporting Information, Fig. S1). In deuterostome animals, ManNAc is phosphorylated to its 6-phosphate derivative, which in turn condenses with phosphoenolpyruvate to form N-acetylneuraminate-9-phosphate (NeuAc-9-phosphate), in reactions catalyzed by ManNAc kinase and NeuAc-9-phosphate synthase, respectively. After dephosphorylation to NeuAc by NeuAc-9-phosphate phosphatase, the activated form of sialic acid, CMP-NeuAc, is generated by CMP-NeuAc synthase. The bacterial pathway for NeuAc biosynthesis differs from its eukaryotic counterpart in that ManNAc is not phosphorylated but is instead converted to NeuAc in a single step via direct condensation with phosphoenolpyruvate, in a reaction mediated by NeuAc synthase (NeuB). As in eukaryotes, the NeuAc formed in this manner is subsequently activated by CMP-NeuAc synthase (NeuA) (Angata & Varki, 2002; Tanner, 2005).
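To make the contrast between the two NeuAc routes concrete, the sketch below encodes each route as an ordered list of (substrate, product, enzyme) steps taken from the description above and lists the enzymes unique to each route. These route-specific enzymes are the ones whose presence or absence in a genome would indicate which route it uses. The `Step` type and the function names are illustrative choices, not part of any published tool; phosphoenolpyruvate appears only in comments because the sketch tracks the main carbon chain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    substrate: str
    product: str
    enzyme: str


# Eukaryotic (deuterostome) route: ManNAc is phosphorylated before
# condensation with phosphoenolpyruvate, then dephosphorylated.
EUKARYOTIC = [
    Step("UDP-GlcNAc", "ManNAc", "UDP-GlcNAc-2-epimerase (NeuC)"),
    Step("ManNAc", "ManNAc-6-phosphate", "ManNAc kinase"),
    Step("ManNAc-6-phosphate", "NeuAc-9-phosphate", "NeuAc-9-phosphate synthase"),  # + PEP
    Step("NeuAc-9-phosphate", "NeuAc", "NeuAc-9-phosphate phosphatase"),
    Step("NeuAc", "CMP-NeuAc", "CMP-NeuAc synthase (NeuA)"),
]

# Bacterial route: ManNAc condenses directly with phosphoenolpyruvate.
BACTERIAL = [
    Step("UDP-GlcNAc", "ManNAc", "UDP-GlcNAc-2-epimerase (NeuC)"),
    Step("ManNAc", "NeuAc", "NeuAc synthase (NeuB)"),  # + PEP
    Step("NeuAc", "CMP-NeuAc", "CMP-NeuAc synthase (NeuA)"),
]


def is_connected(route):
    """True if every step consumes the product of the step before it."""
    return all(a.product == b.substrate for a, b in zip(route, route[1:]))


def route_specific_enzymes(route, other):
    """Enzymes used in `route` that do not appear in `other`."""
    return {s.enzyme for s in route} - {s.enzyme for s in other}


if __name__ == "__main__":
    print(sorted(route_specific_enzymes(EUKARYOTIC, BACTERIAL)))
    print(sorted(route_specific_enzymes(BACTERIAL, EUKARYOTIC)))
```

Under this encoding, failing to find any of the three eukaryote-specific enzymes while finding NeuB points to the bacterial-type route.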

In addition to sialic acids, Bacteria can generate other NulO sugars using a core set of steps catalyzed by homologues of sialic acid biosynthesis enzymes, together with pathway-specific enzymes. Two pathways of legionaminic acid biosynthesis have been described (Fig. S2). The first, elucidated by expressing Legionella pneumophila genes in Escherichia coli (Glaze et al., 2008), begins with UDP-GlcNAc-4,6 dehydratase generating a 4-keto-6-deoxy derivative of UDP-GlcNAc. The next steps, catalyzed by UDP-GlcNAc aminotransferase and UDP-GlcNAc N-acetyltransferase, yield the main intermediate in legionaminic acid biosynthesis, UDP-N,N-diNAc-6-deoxy-Glc. The last steps require UDP-N,N’-diacetylbacillosamine-2-epimerase, N,N-diacetyllegionaminate synthase, and CMP-legionaminic acid synthase, homologues of NeuC, NeuB, and NeuA, respectively. In a second pathway, discovered in Campylobacter jejuni (Schoenhofen et al., 2009), GDP-GlcNAc-4,6 dehydratase, GDP-GlcNAc aminotransferase, and GDP-GlcNAc N-acetyltransferase serve the same functions as their counterparts in the first pathway, although some distinctions can be made. Specifically, the GDP-GlcNAc-4,6 dehydratase is an NAD+-dependent enzyme, while the GDP-GlcNAc aminotransferase is specific for the GDP-linked keto intermediate. Moreover, although the UDP-GlcNAc N-acetyltransferase is able to process the GDP-linked compound, it is far more efficient when presented with the UDP-linked intermediate. In this second pathway, GDP-N,N-diNAc-6-deoxy-Glc is converted to 2,4-diNAc-6-deoxy-Man by the NeuC homologue GDP-N,N’-diacetylbacillosamine-2-epimerase. Finally, 2,4-diNAc-6-deoxy-Man is transformed into the activated form of legionaminic acid, CMP-N,N-diacetyllegionaminic acid, by the same NeuB and NeuA homologues as employed in the first pathway.

The pathway for biosynthesis of the NulO sugar pseudaminic acid has been described in Helicobacter pylori and C. jejuni (Schoenhofen et al., 2006). It begins with the conversion of UDP-GlcNAc to UDP-4-keto-6-deoxy-GlcNAc by a UDP-GlcNAc-4,6 dehydratase (Fig. S3), much as occurs in legionaminic acid biosynthesis pathway 1. UDP-4-keto-6-deoxy-GlcNAc is then converted to 2,4-diacetamido-2,4,6-trideoxy-L-altropyranose (6-deoxy-AltdiNAc) by the sequential actions of UDP-4-amino-4,6-dideoxy-AltNAc transaminase, UDP-4-amino-4,6-dideoxy-AltNAc N-acetyltransferase, and UDP-6-deoxy-AltdiNAc hydrolase. Next, 6-deoxy-AltdiNAc is converted to pseudaminic acid by the NeuB homologue pseudaminic acid synthase, and the activated form of the sugar, CMP-pseudaminic acid, is formed by the NeuA homologue CMP-pseudaminic acid synthase.
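All of the bacterial NulO pathways described above end with a NeuB-like condensation and a NeuA-like CMP activation, which is why homology searches seeded with enzymes from different pathways can return the same genes. The sketch below tags each enzyme named in the text with the Neu family it is homologous to (None where the text names no such homologue) and intersects these families across pathways. The `PATHWAYS` table and helper names are illustrative; the enzyme names and homologue assignments are those given in the text.

```python
# Each pathway: ordered (enzyme, homologous Neu family or None) pairs, as described in the text.
PATHWAYS = {
    "sialic acid (bacterial)": [
        ("UDP-GlcNAc-2-epimerase", "NeuC"),
        ("NeuAc synthase", "NeuB"),
        ("CMP-NeuAc synthase", "NeuA"),
    ],
    "legionaminic acid, pathway 1": [
        ("UDP-GlcNAc-4,6 dehydratase", None),
        ("UDP-GlcNAc aminotransferase", None),
        ("UDP-GlcNAc N-acetyltransferase", None),
        ("UDP-N,N'-diacetylbacillosamine-2-epimerase", "NeuC"),
        ("N,N-diacetyllegionaminate synthase", "NeuB"),
        ("CMP-legionaminic acid synthase", "NeuA"),
    ],
    "legionaminic acid, pathway 2": [
        ("GDP-GlcNAc-4,6 dehydratase", None),
        ("GDP-GlcNAc aminotransferase", None),
        ("GDP-GlcNAc N-acetyltransferase", None),
        ("GDP-N,N'-diacetylbacillosamine-2-epimerase", "NeuC"),
        ("N,N-diacetyllegionaminate synthase", "NeuB"),
        ("CMP-legionaminic acid synthase", "NeuA"),
    ],
    "pseudaminic acid": [
        ("UDP-GlcNAc-4,6 dehydratase", None),
        ("UDP-4-amino-4,6-dideoxy-AltNAc transaminase", None),
        ("UDP-4-amino-4,6-dideoxy-AltNAc N-acetyltransferase", None),
        ("UDP-6-deoxy-AltdiNAc hydrolase", None),
        ("pseudaminic acid synthase", "NeuB"),
        ("CMP-pseudaminic acid synthase", "NeuA"),
    ],
}


def homologue_families(steps):
    """Neu families represented among a pathway's enzymes."""
    return {family for _, family in steps if family is not None}


def shared_families(pathways):
    """Neu families present in every pathway."""
    return set.intersection(*(homologue_families(s) for s in pathways.values()))


if __name__ == "__main__":
    print(sorted(shared_families(PATHWAYS)))
```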

As a first step in defining the extent to which Archaea recruit NulO sugars for protein glycosylation, the genome sequences of 122 archaeal species were scanned for genes encoding enzymes involved in the synthesis of sialic, legionaminic, and pseudaminic acids. Such analysis not only identified genes annotated as encoding enzymes belonging to NulO sugar biosynthetic pathways but also offered insight into the evolutionary history of these pathways in Archaea. This study thus expands upon earlier work addressing NulO sugar biosynthesis across evolution (Lewis et al., 2009).

Materials and methods

Identification of archaeal enzymes involved in NulO biosynthesis

To identify putative archaeal enzymes involved in sialic acid biosynthesis, blastp searches of 122 archaeal genomes listed at the NCBI BLAST with microbial genomes site (www.ncbi.nlm.nih.gov/sutils/genom_table.cgi?; January, 2013) were performed using the appropriate Drosophila or Campylobacter jejuni sequences as queries. Archaeal enzymes putatively involved in legionaminic and pseudaminic acid biogenesis were similarly identified using the relevant C. jejuni sequences as queries. Sequences with significant similarity (E-value ≤ 1e-5), as well as sequences currently annotated as the enzymes of interest but with higher E-values, were selected as homologues of NulO sugar biosynthesis pathway components. Sequences selected in this manner were examined using the Integrated Microbial Genomes database (img.jgi.doe.gov/cgi-bin/w/main.cgi) to review the annotations of the genes encoding the identified homologues, as well as of neighboring genes, in the different archaeal genomes considered.
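The hit-selection rule can be sketched as follows, assuming BLAST+ tabular output (`-outfmt 6`, whose default twelve columns are listed in `OUTFMT6`) and a hypothetical `annotations` mapping from subject ID to annotation text that stands in for the IMG lookup. The case-insensitive substring match against annotation terms is an assumption about how "currently annotated as the enzymes of interest" might be put into practice; the gene IDs in the example are made up.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6 = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def select_homologues(blast_tsv, annotations, enzyme_terms, max_evalue=1e-5):
    """Return subject IDs kept as putative homologues.

    A hit is kept if its E-value is <= max_evalue, or if the subject is already
    annotated with one of `enzyme_terms` even though its E-value is higher.
    `annotations` maps subject ID -> annotation text (hypothetical input).
    """
    terms = [t.lower() for t in enzyme_terms]
    kept = set()
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(OUTFMT6, row))
        subject = hit["sseqid"]
        annotation = annotations.get(subject, "").lower()
        if float(hit["evalue"]) <= max_evalue or any(t in annotation for t in terms):
            kept.add(subject)
    return kept


if __name__ == "__main__":
    tsv = (
        "neuB_query\tgene_A\t45.2\t300\t150\t5\t1\t300\t1\t300\t1e-40\t250\n"
        "neuB_query\tgene_B\t22.0\t120\t90\t3\t10\t130\t5\t125\t0.01\t40\n"
        "neuB_query\tgene_C\t20.0\t100\t80\t2\t10\t110\t5\t105\t0.5\t30\n"
    )
    notes = {"gene_B": "N-acetylneuraminate synthase", "gene_C": "hypothetical protein"}
    print(sorted(select_homologues(tsv, notes, ["N-acetylneuraminate synthase"])))
```

Here gene_B is retained despite its weak E-value because its existing annotation matches, mirroring the second selection criterion.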

Calculation of the codon adaptation index (CAI)

CAI and expected CAI values were calculated using the CAIcal (genomes.urv.es/CAIcal) and E-CAI (genomes.urv.es/CAIcal/E-CAI) servers, respectively. Archaeal codon usage tables were taken from the Codon Usage Database (www.kazusa.or.jp/codon/).
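CAIcal reports the codon adaptation index of Sharp & Li (1987): each codon's relative adaptiveness w is its count in a reference gene set divided by the count of the most frequent synonymous codon, and a gene's CAI is the geometric mean of w over its codons, exp((1/L) Σ ln w_i). The sketch below implements that definition under stated assumptions: the standard genetic code, exclusion of stop codons and of the single-codon amino acids Met and Trp, and a pseudocount of 0.5 for codons absent from the reference set. It is not the CAIcal server's own code.

```python
import math
from collections import Counter, defaultdict

# Standard genetic code, bases ordered TCAG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def codons(seq):
    """Split a coding sequence into in-frame triplets."""
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def relative_adaptiveness(reference_seqs, pseudocount=0.5):
    """w for each informative codon, from codon counts in a reference set."""
    counts = Counter(c for s in reference_seqs for c in codons(s) if c in CODON_TABLE)
    by_aa = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        by_aa[aa].append(codon)
    weights = {}
    for aa, synonymous in by_aa.items():
        if aa in "*MW":  # stops and single-codon amino acids carry no information
            continue
        freqs = {c: counts[c] or pseudocount for c in synonymous}
        top = max(freqs.values())
        for c in synonymous:
            weights[c] = freqs[c] / top
    return weights


def cai(gene, weights):
    """Geometric mean of w over the gene's informative codons."""
    ws = [weights[c] for c in codons(gene) if c in weights]
    if not ws:
        raise ValueError("gene has no informative codons")
    return math.exp(sum(math.log(w) for w in ws) / len(ws))
```

In practice the reference set would be a collection of genes from the same genome (for example, highly expressed genes), as summarized in a codon usage table.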

Phylogenetic trees

Phylogenetic trees were constructed with the neighbor-joining (NJ) algorithm implemented in the MEGA 4 (Molecular Evolutionary Genetics Analysis, version 4.0) package (Tamura et al., 2007).
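For readers unfamiliar with the method, the sketch below implements the textbook neighbor-joining procedure on a precomputed distance matrix. At each step it joins the pair minimizing Q(i, j) = (n − 2) d(i, j) − r(i) − r(j), where r(i) is the sum of distances from i, assigns the two branch lengths, and replaces the pair with a new node. MEGA additionally derives distances from sequence alignments and has its own handling of ties and negative branch lengths; none of that is reproduced here, and the node-naming scheme is arbitrary.

```python
def neighbor_joining(names, dist):
    """Neighbor-joining on a symmetric distance matrix (at least 2 taxa).

    Returns (joins, final_edge): each join is (new_node, a, len_a, b, len_b);
    final_edge is (x, y, length) linking the last two remaining nodes.
    """
    d = {}
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            d[a, b] = float(dist[i][j])
    nodes = list(names)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a, b] for b in nodes) for a in nodes}
        # Pick the pair minimizing the Q-criterion (first pair wins ties).
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a, b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from each joined node to the new internal node.
        len_a = 0.5 * d[a, b] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a, b] - len_a
        u = f"node{len(joins) + 1}"
        d[u, u] = 0.0
        for c in nodes:
            if c not in (a, b):
                d[u, c] = d[c, u] = 0.5 * (d[a, c] + d[b, c] - d[a, b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
        joins.append((u, a, len_a, b, len_b))
    x, y = nodes
    return joins, (x, y, d[x, y])


if __name__ == "__main__":
    taxa = ["a", "b", "c", "d", "e"]
    matrix = [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7],
              [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]
    joins, final_edge = neighbor_joining(taxa, matrix)
    for join in joins:
        print(join)
    print(final_edge)
```

Because the toy matrix is additive (tree-like), NJ recovers its branch lengths exactly; real sequence-derived distances are only approximately additive.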

Results

Identification of NulO biosynthesis pathway components in Archaea

To identify archaeal homologues of NeuAc biosynthesis enzymes, a series of blastp searches was performed on 122 archaeal genome sequences. This analysis did not identify archaeal homologues of ManNAc kinase, NeuAc-9-phosphate synthase, or NeuAc-9-phosphate phosphatase, the enzymes central to NeuAc biosynthesis in eukaryotes. Instead, genes encoding archaeal homologues of the bacterial pathway enzymes UDP-GlcNAc-2-epimerase (NeuC) and NeuAc synthase (NeuB), as well as of CMP-NeuAc synthase (NeuA), which catalyzes the final step of NeuAc biogenesis in both domains, were detected in 21 methanogenic or halophilic species of the phylum Euryarchaeota, in two species of the phylum Thaumarchaeota, and in an unclassified halophilic strain (Table 1). In some of these species, genes encoding homologues of all three enzymes of the bacterial pathway were identified, while in others only two of the three were detected. In addition, NeuAc biosynthesis genes were in some cases detected in only one of a group of closely related sequenced strains. For instance, whereas Methanococcus maripaludis C5 contains genes encoding NeuC, NeuB, and NeuA, no such genes were detected in M. maripaludis C6, C7, or S2.

Table 1. Homologues of NeuA, NeuB and NeuC in Archaea
Source | CMP-NeuAc synthase (NeuA) | NeuAc synthase (NeuB) | UDP-GlcNAc-2-epimerase (NeuC) | Species
Euryarchaeota     
Halobacteria     
Halobacteriales     
Halobacteriaceae     
Halomicrobium Hmuk_1467Hmuk_1456Hmuk_1461 Hmc. mukohataei
Halopiger Halxa_2362Halxa_2366Halxa_0889 Hpg. xanaduensis
Haloquadratum Hqrw_3374Hqrw_3369 Hqr. walsbyi C23
 HQ3518AHQ3519A Hqr. walsbyi HBSQ001
Halorhabdus HLRT1_02468HLRT1_02473HLRT1_09265 Hrd. tiamatea
Natrialba Nmag_0148Nmag_0151Nmag_0149 Nab. magadii
Methanobacteria     
Methanobacteriales     
Methanobacteriaceae     
Methanobacterium MSWAN_1359MSWAN_1357MSWAN_1356Methanobacterium sp. SWAN-1
 A994_06101A994_06106A994_11792 M. formicicum
Methanobrevibacter Msm_1538Msm_1539Msm_0853M. smithii PS, ATCC 35061
 mru_1876mru_1878mru_1697 M. ruminantium
Methanosphaera Msp_0326 Msp_0217 M. stadtmanae
Methanococci     
Methanococcales     
Methanocaldococcaceae     
Methanocaldococcus MJ_1063MJ_1065MJ_1504 M. jannaschii
Methanotorris Metfo_0299Metfo_0298Metfo_0295 M. formicicus
Methanococcaceae     
Methanococcus Maeo_0417Maeo_0420Maeo_0423 M. aeolicus
 MmarC5_0507MmarC5_0506MmarC5_0504M. maripaludis C5
Methanomicrobia     
Methanomicrobiales     
Methanomicrobiaceae     
Methanoplanus Metlim_2817Metlim_2819Metlim_0237 M. limicola
Methanoregulaceae     
Methanoregula Metfor_1312Metfor_1307Metfor_1306 M. formicicum
Methanospirillaceae     
Methanospirillum Mhun_3094Mhun_3098Mhun_0396 M. hungatei
Methanosarcinales     
Methanosarcinaceae     
Methanococcoides Mbur_1585Mbur_1586Mbur_1587 M. burtonii
Methanosarcina MA3766MA3767  M. acetivorans
Methanolobus Mpsy_2364Mpsy_2359Mpsy_0546 M. psychrophilus
Thaumarchaeota     
Nitrosopumilales     
Nitrosopumilaceae     
Candidatus Nitrosoarchaeumbg20_01561bg20_01528bg20_01555Candidatus N. limnia BG20
Nitrosopumilus Nmar_0135Nmar_0131  N. maritimus
UnclassifiedHalar_2551Halar_2545Halar_2543Halophilic archaeon sp. DL31

In seeking components of archaeal legionaminic acid biosynthesis pathways, efforts were first directed at identifying homologues of N,N-diacetyllegionaminate synthase and CMP-legionaminic acid synthase, the final enzymes common to both legionaminic acid biosynthesis pathways identified in Bacteria. These two enzymes, which act sequentially to generate CMP-N,N-diacetyllegionaminic acid from 2,4-diNAc-6-deoxy-mannose (Fig. S2), are homologous to NeuB and NeuA, respectively. Specifically, N,N-diacetyllegionaminate synthase and NeuB both mediate the condensation of phosphoenolpyruvate with an N-acetylhexosamine, while CMP-legionaminic acid synthase and NeuA generate the CMP-activated forms of the corresponding NulO sugars. Accordingly, genes encoding N,N-diacetyllegionaminate synthase and CMP-legionaminic acid synthase were detected in the same set of archaeal species containing genes encoding NeuA and NeuB (Table 1).

When these genomes were scanned for the upstream components of the two bacterial legionaminic acid biosynthesis pathways, distinct genes encoding UDP-GlcNAc-4,6 dehydratase and GDP-GlcNAc-4,6 dehydratase were identified (Table 2). While most genomes encoded both enzymes, some encoded only one of the two. In all but a few cases, searches for homologues of the two aminotransferases catalyzing the second step of the bacterial pathways returned the same gene for both enzymes. Finally, only a few genes encoding the third enzyme of each legionaminic acid biosynthesis pathway, namely the N-acetyltransferases, were detected. As such, the full set of enzymes of the UDP-GlcNAc-based pathway (termed ‘pathway 1’) was seen only in Halomicrobium mukohataei, Methanospirillum hungatei, and Methanococcoides burtonii, while only Natrialba magadii encodes the full set of enzymes of the GDP-GlcNAc-based pathway (termed ‘pathway 2’).
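The "complete pathway" call can be expressed as a simple check over the Table 2 columns. The sketch below transcribes the rows for four species from Table 2 (None marks an empty cell), treats a pathway as complete only if all three of its enzymes have an assigned gene, and reports genes assigned to both pathways, as happens with the shared aminotransferase hits. The data layout and function names are illustrative.

```python
PATHWAY_STEPS = {
    "pathway 1": ["UDP-GlcNAc-4,6 dehydratase", "UDP-GlcNAc aminotransferase",
                  "UDP-GlcNAc N-acetyltransferase"],
    "pathway 2": ["GDP-GlcNAc-4,6 dehydratase", "GDP-GlcNAc aminotransferase",
                  "GDP-GlcNAc N-acetyltransferase"],
}
COLUMNS = PATHWAY_STEPS["pathway 1"] + PATHWAY_STEPS["pathway 2"]

# Four rows transcribed from Table 2, in column order; None = not detected.
TABLE2 = {
    "Hmc. mukohataei": ["Hmuk_1471", "Hmuk_1470", "Hmuk_1458", "Hmuk_1432", "Hmuk_1470", None],
    "Nab. magadii": ["Nmag_0147", "Nmag_0146", None, "Nmag_0167", "Nmag_0146", "Nmag_3405"],
    "M. hungatei": ["Mhun_2118", "Mhun_2126", "Mhun_2125", None, "Mhun_2126", None],
    "M. burtonii": ["Mbur_1591", "Mbur_1589", "Mbur_1588", "Mbur_2232", "Mbur_1589", None],
}


def complete_pathways(row):
    """Names of pathways for which every enzyme has an assigned gene."""
    hits = dict(zip(COLUMNS, row))
    return [name for name, steps in PATHWAY_STEPS.items()
            if all(hits[s] is not None for s in steps)]


def dual_assigned(row):
    """Genes listed under both pathway 1 and pathway 2."""
    return (set(row[:3]) & set(row[3:])) - {None}


if __name__ == "__main__":
    for species, row in TABLE2.items():
        print(species, complete_pathways(row), sorted(dual_assigned(row)))
```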

Table 2. Legionaminic acid biosynthesis pathway components in Archaea
Source | Pathway 1 | Pathway 2
UDP-GlcNAc-4,6 dehydratase | UDP-GlcNAc aminotransferase | UDP-GlcNAc N-acetyltransferase | GDP-GlcNAc-4,6 dehydratase | GDP-GlcNAc aminotransferase | GDP-GlcNAc N-acetyltransferase
Euryarchaeota        
Halobacteria        
Halobacteriales        
Halobacteriaceae        
Halomicrobium Hmc. mukohataei Hmuk_1471Hmuk_1470Hmuk_1458Hmuk_1432Hmuk_1470 
Halopiger Hpg. xanaduensis Halxa_2369  Halxa_2368Halxa_2367 
Haloquadratum Hqr. walsbyi C23Hqrw_3365  Hqrw_3366  
 Hqr. walsbyi HBSQ001HQ3509A  HQ3510A  
Halorhabdus Hrd. tiamatea    HLRT1_07359  
Natrialba Nab. magadii Nmag_0147Nmag_0146 Nmag_0167Nmag_0146Nmag_3405
Methanobacteria        
Methanobacteriales        
Methanobacteriaceae        
Methanobacterium Methanobacterium sp. SWAN-1MSWAN_1353MSWAN_1354  MSWAN_1354 
  M. formicicum A994_06121A994_09286 A994_04950A994_09286 
Methanobrevibacter M. smithii PS ATCC35061Msm_1535Msm_1030 Msm_1309Msm_1030 
  M. ruminantium mru_1388mru_1959  mru_1959 
Methanosphaera M. stadtmanae Msp_0532Msp_0290 Msp_1114Msp_0290 
Methanococci        
Methanococcales        
Methanocaldococcaceae        
Methanocaldococcus M. jannaschii MJ_1061MJ_1066 MJ_0211MJ_1066 
Methanotorris M. formicicus Metfo_0292Metfo_0293 Metfo_0252Metfo_0293 
Methanococcaceae        
Methanococcus M. aeolicus Maeo_0428Maeo_0427 Maeo_0380Maeo_0427 
 M. maripaludis C5MmarC5_0501MmarC5_0502 MmarC5_1314MmarC5_0502 
Methanomicrobia        
Methanomicrobiaceae        
Methanoplanus M. limicola Metlim_0251Metlim_0197 Metlim_2809Metlim_2829 
Methanoregulaceae        
Methanoregula M. formicicum Metfor_1303Metfor_1304 Metfor_1298Metfor_1304 
Methanospirillaceae        
Methanospirillum M. hungatei Mhun_2118Mhun_2126Mhun_2125 Mhun_2126 
Methanosarcinaceae        
Methanococcoides M. burtonii Mbur_1591Mbur_1589Mbur_1588Mbur_2232Mbur_1589 
Methanosarcina M. acetivorans MA3779   MA1185 
Methanolobus M. psychrophilus Mpsy_2367Mpsy_0708  Mpsy_2400Mpsy_2365
Thaumarchaeota        
Nitrosopumilales        
Nitrosopumilaceae        
Candidatus NitrosoarchaeumCandidatus N. limnia BG20bg20_01639bg20_00154    
Nitrosopumilus N. maritimus Nmar_0146Nmar_1748  Nmar_1748Nmar_0144
UnclassifiedHalophilic archaeon sp. DL31   Halar_0607 Halar_2199

The search for archaeal homologues of pseudaminic acid synthase and CMP-pseudaminic acid synthase consistently identified the genes annotated as encoding NeuB and NeuA, again because these enzymes catalyze similar reactions, as discussed above. Hence, the search for archaeal homologues of the remaining four pseudaminic acid biosynthesis pathway components was limited to the same genomes. Genes encoding the first two pathway enzymes (UDP-GlcNAc-4,6 dehydratase and UDP-4-amino-4,6-dideoxy-AltNAc transaminase) were detected in all species considered, with the exception of Halorhabdus tiamatea and Methanobrevibacter ruminantium (Table 3). However, no complete pseudaminic acid biosynthesis pathway was detected in any species: the third pathway enzyme (UDP-4-amino-4,6-dideoxy-AltNAc N-acetyltransferase) was not detected in any of the genomes considered, and the fourth (UDP-6-deoxy-AltdiNAc hydrolase) was identified in only three species.

Table 3. Pseudaminic acid biosynthesis pathway components in Archaea
Source | UDP-GlcNAc-4,6 dehydratase | UDP-4-amino-4,6-dideoxy-AltNAc transaminase | UDP-6-deoxy-AltdiNAc hydrolase
Euryarchaeota
Halobacteria
Halobacteriales
Halobacteriaceae
Halomicrobium Hmc. mukohataei Hmuk_1471Hmuk_1470 
Halopiger Hpg. xanaduensis Halxa_2369Halxa_2367Halxa_2364
Haloquadratum Hqr. walsbyi C23Hqrw_3365Hqrw_3367Hqrw_3373
 Hqr. walsbyi HBSQ001HQ3510HQ2629A 
Natrialba Nab. magadii Nmag_0147Nmag_0146 
Methanobacteria
Methanobacteriales
Methanobacteriaceae
Methanobacterium Methanobacterium sp. SWAN-1MSWAN_1353MSWAN_1354 
  M. formicicum A994_06121A994_06116 
Methanobrevibacter M. smithii PS, ATCC 35061Msm_1535Msm_1536 
  M. ruminantium  mru_1959 
Methanosphaera M. stadtmanae Msp_1114Msp_0290 
Methanococci
Methanococcales
Methanocaldococcaceae
Methanocaldococcus M. jannaschii MJ_1061MJ_1066MJ_1062
Methanotorris M. formicicus Metfo_0292Metfo_0293 
Methanococcaceae     
Methanococcus M. aeolicus Maeo_0428Maeo_0427 
 M. maripaludis C5MmarC5_0501MmarC5_0502 
Methanomicrobia
Methanomicrobiales
Methanomicrobiaceae
Methanoplanus M. limicola Metlim_0251Metlim_2829 
Methanoregulaceae
Methanoregula M. formicicum Metfor_1303Metfor_1304 
Methanospirillaceae
Methanospirillum M. hungatei Mhun_3090Mhun_3093 
Methanosarcinales
Methanosarcinaceae
Methanococcoides M. burtonii Mbur_1591Mbur_1589 
Methanosarcina M. acetivorans MA3779  
Methanolobus M. psychrophilus Mpsy_2400Mpsy_2365 
Thaumarchaeota
Nitrosopumilales
Nitrosopumilaceae
Candidatus NitrosoarchaeumCandidatus N. limnia BG20bg20_01639  
Nitrosopumilus N. maritimus Nmar_0146  
UnclassifiedHalophilic archaeon sp. DL31Halar_0607  

Evidence of the complex evolutionary history of archaeal NulO sugar biosynthesis genes

The distribution of genes encoding archaeal NulO sugar biosynthesis pathway components was considered. In 17 of the 24 genomes, gene clustering was observed (Fig. 2). Moreover, in all but Halophilic archaeon sp. DL31 and Nitrosopumilus maritimus, such clusters included sequences assigned to legionaminic acid and pseudaminic acid biosynthesis pathways. However, given the scarcity of biochemical characterization of enzymes assigned roles in archaeal NulO sugar biosynthesis, it is not yet possible to unambiguously assign functions in many cases. For instance, because it could not be determined whether a gene for UDP-GlcNAc-4,6 dehydratase encoded that version of the enzyme involved in legionaminic acid biosynthesis pathway 1 or in the pathway for pseudaminic acid biosynthesis, the gene in question was assigned both roles. Likewise, because it was not possible to assign a given aminotransferase-/transaminase-encoding gene to a particular pathway, such genes were assigned this role in both legionaminic acid biosynthesis pathways as well as in the pathway for pseudaminic acid biosynthesis.

Figure 2.

Clustering of genes implicated in NulO sugar biosynthesis. The position and orientation of genes assigned to the different NulO sugar biosynthesis pathways are schematically represented according to the color scheme presented in the inset to the figure. Gene numbers are provided below each gene. Where more than one role was assigned to a given gene, overlapping arrows appear. All genes are arbitrarily drawn as being the same length.

Gene cluster composition and ordering show little consistency. In each of the five halophilic euryarchaeota in which clustering was observed, cluster composition was species-specific. Indeed, even the two Haloquadratum walsbyi strains considered presented distinct cluster composition and arrangement. Of the 10 methanogens in which clustering was observed, identical cluster composition was seen in only four, namely Methanotorris formicicus, Methanococcus aeolicus, Methanococcus maripaludis C5, and Methanobacterium sp. SWAN-1. Notably, these four species do not belong to the same class: the first three are assigned to the Methanococci and the fourth to the Methanobacteria. In the other six methanogens showing NulO sugar biosynthesis gene clustering, little similarity in cluster composition or arrangement was seen. Indeed, two separate clusters were observed in Methanospirillum hungatei.

When the genes encoding NeuA, NeuB, and NeuC homologues were considered separately, once again only limited conserved organization was seen. Of the 19 genomes encoding all three genes, just seven contained these genes in adjacent or proximal positions on the same DNA strand (but not necessarily in the same order), while in three other genomes, at least one of these adjacent/proximal genes was found on the opposing strand. The three genes were found distant from one another in the remaining nine genomes where they were detected.
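The three arrangement categories used above can be sketched as a classifier over gene positions and strands. Each gene is given as a hypothetical (gene index along the chromosome, strand) pair, and "proximal" is taken to mean at most `max_gap` positions between consecutive members; the cutoff of 5 is an assumption, since the text does not state one. Genes are sorted by position first, so order within a cluster does not matter.

```python
def classify_arrangement(genes, max_gap=5):
    """Classify a set of genes as proximal (same or mixed strands) or distant.

    genes: mapping gene name -> (gene_index, strand), strand being "+" or "-".
    max_gap: hypothetical proximity cutoff, in gene positions.
    """
    ordered = sorted(genes.values())
    indices = [idx for idx, _ in ordered]
    proximal = all(b - a <= max_gap for a, b in zip(indices, indices[1:]))
    if not proximal:
        return "distant"
    strands = {strand for _, strand in ordered}
    return "proximal, same strand" if len(strands) == 1 else "proximal, mixed strands"


if __name__ == "__main__":
    toy = {"neuA": (10, "+"), "neuB": (11, "+"), "neuC": (13, "+")}
    print(classify_arrangement(toy))
```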

The inconsistent pattern of NulO sugar biogenesis genes clustering described above raised the possibility that lateral gene transfer (LGT) contributed to introducing these genes into archaeal genomes. To further examine this point, the GC content of NulO sugar biogenesis genes was considered. Such analysis revealed numerous genes with a GC content differing from that of the rest of the genome by > 10% (Table S1). Differing GC content was most often seen in those genes encoding NeuAc synthase and CMP-NeuAc synthase. Most strikingly, the GC content of most, if not all, Natrialba magadii and Methanoregula formicicum NulO sugar biogenesis genes substantially differed from that of the rest of the genome. Such differences offer strong support for these genes having been derived from elsewhere.
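A minimal sketch of the GC comparison, assuming that "> 10%" refers to an absolute difference of 10 percentage points and that the genome value (`genome_gc`, in percent) has been computed separately for the rest of the genome. Ambiguous bases are excluded from the denominator, and the input sequences are toy examples.

```python
def gc_content(seq):
    """Percentage of G + C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def flag_gc_outliers(genes, genome_gc, threshold=10.0):
    """Genes whose GC content differs from genome_gc by more than threshold points.

    Returns gene name -> signed difference (gene GC minus genome GC), rounded.
    """
    flagged = {}
    for name, seq in genes.items():
        diff = gc_content(seq) - genome_gc
        if abs(diff) > threshold:
            flagged[name] = round(diff, 1)
    return flagged


if __name__ == "__main__":
    print(flag_gc_outliers({"g1": "ATATATGC", "g2": "GCGCATAT"}, genome_gc=50.0))
```

A negative difference corresponds to the situation described for Nab. magadii, where NulO genes are GC-poor relative to a GC-rich haloarchaeal genome.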

Further support for LGT was provided when codon usage was considered. The codons comprising NulO sugar biogenesis genes in the nine archaeal species for which codon usage tables are available were compared with the average codon usage in each genome to yield codon adaptation index (CAI) values (Sharp & Li, 1987). To identify those NulO sugar biogenesis gene CAI values that differed significantly from the CAI value of the genome (> 95% confidence level), the E-CAI server was consulted (see 'Materials and methods'). This tool calculates an expected CAI for a genome that is subsequently used to determine whether differences between the CAI values of a given gene and that genome are statistically significant (Puigbò et al., 2008). Such analysis revealed that seven of nine archaeal genomes examined contain genes with CAI values that distinguish them from other genes in that genome (Table S2). In Methanospirillum hungatei, seven of nine NulO sugar biogenesis genes have CAI values that differ significantly from that of the rest of the genome. LGT offers a possible explanation for such differences in codon usage (Mrázek & Karlin, 1999).
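The idea behind an expected-CAI test is that part of a gene's CAI reflects base composition alone. The sketch below approximates that idea and is not the E-CAI server's exact algorithm: it keeps the gene's amino acid sequence, repeatedly recodes each codon with a synonymous codon drawn according to an assumed genomic GC fraction (bases treated as independent), and takes the central 95% of the resulting CAI values. An observed CAI outside that interval would then be unlikely to reflect composition alone. The `weights` argument is a relative-adaptiveness table such as one built from a reference gene set; the values used in the example are made up.

```python
import math
import random

# Standard genetic code, bases ordered TCAG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
SYNONYMS = {}
for _codon, _aa in CODE.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)


def cai(codon_list, weights):
    """Geometric mean of relative adaptiveness over informative codons."""
    ws = [weights[c] for c in codon_list if c in weights]
    if not ws:
        raise ValueError("no informative codons")
    return math.exp(sum(math.log(w) for w in ws) / len(ws))


def codon_prob(codon, gc):
    """Probability of a codon when bases are drawn independently at GC fraction gc."""
    p = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    return p[codon[0]] * p[codon[1]] * p[codon[2]]


def expected_cai_interval(gene_codons, weights, gc, n=1000, seed=0):
    """2.5th and 97.5th percentiles of CAI over random synonymous recodings.

    Assumes 0 < gc < 1 and that every entry of gene_codons is a valid triplet.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n):
        recoded = []
        for codon in gene_codons:
            syn = SYNONYMS[CODE[codon]]
            probs = [codon_prob(s, gc) for s in syn]
            recoded.append(rng.choices(syn, weights=probs)[0])
        scores.append(cai(recoded, weights))
    scores.sort()
    return scores[int(0.025 * (n - 1))], scores[int(0.975 * (n - 1))]


if __name__ == "__main__":
    toy_weights = {"GCT": 1.0, "GCC": 0.2, "GCA": 0.5, "GCG": 0.1}
    gene = ["GCT"] * 30
    print(cai(gene, toy_weights), expected_cai_interval(gene, toy_weights, gc=0.5, n=200))
```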

Finally, evidence for the diverse origins of archaeal NulO sugar biosynthesis genes came from comparing phylogenetic trees of archaeal NeuB and NeuA sequences, with the corresponding Escherichia coli sequences serving as an outgroup, against a 16S rRNA gene tree assembled from the same species (Fig. 3). These comparisons showed that the NeuA and NeuB trees differ greatly from each other, as well as from the well-defined phylogenetic organization of the 16S rRNA gene tree, indicative of different evolutionary histories for the three sequences.
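The trees above were compared by inspection. One standard way to quantify such incongruence, swapped in here for illustration and not used in the study, is the Robinson-Foulds distance: the number of clades present in one tree but not the other. The sketch represents trees as nested tuples of leaf names (a hypothetical format; real trees would first be parsed from Newick), and because each tree is rooted on the E. coli outgroup, rooted clades rather than unrooted splits are compared.

```python
def clades(tree):
    """Non-trivial clades (frozensets of leaf names) of a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])  # single leaves are trivial clades
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    all_leaves = walk(tree)
    found.discard(all_leaves)  # the full leaf set is shared by every tree
    return found


def robinson_foulds(t1, t2):
    """Number of clades found in exactly one of the two trees."""
    return len(clades(t1) ^ clades(t2))


if __name__ == "__main__":
    neub_like = ((("A", "B"), "C"), "Ec")
    rrna_like = ((("A", "C"), "B"), "Ec")
    print(robinson_foulds(neub_like, rrna_like))
```

A distance of zero means identical branching order; larger values indicate more conflicting groupings.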

Figure 3.

Phylogenetic trees of archaeal NeuB (NeuAc synthase), NeuA (CMP-NeuAc synthase), and 16S rRNA gene sequences. The strains listed are as follows: 1. Hpg. xanaduensis, 2. Nab. magadii, 3. Hrd. tiamatea, 4. Hmc. mukohataei, 5. Halophilic archaeon sp. DL31, 6. Hqr. walsbyi C23, 7. Hqr. walsbyi HBSQ001, 8. M. limicola, 9. M. hungatei, 10. Methanoregula formicicum, 11. M. burtonii, 12. M. acetivorans, 13. M. psychrophilus, 14. M. smithii PS, ATCC 35061, 15. M. ruminantium, 16. M. stadtmanae, 17. Methanobacterium sp. SWAN-1, 18. Methanobacterium formicicum, 19. Candidatus N. limnia BG20, 20. N. maritimus, 21. M. jannaschii, 22. M. formicicus, 23. M. aeolicus, and 24. M. maripaludis C5. In each tree, the corresponding E. coli (Ec) sequence served as the outgroup. See 'Materials and methods' for details.

Discussion

Despite the biological importance of sialic acid, legionaminic acid, and pseudaminic acid when incorporated into eukaryal and/or bacterial glycoproteins (Kelm & Schauer, 1997; Neumeister et al., 1998; Thibault et al., 2001; Angata & Varki, 2002; Schoenhofen et al., 2006), little is known of their incorporation into archaeal glycoproteins. To date, the only study demonstrating a glycoprotein-linked NulO sugar in Archaea identified a legionaminic acid derivative (5-N-formyl-legionaminic acid) on VP4, a component of the haloarchaeal virus HRPV-1 (Kandiba et al., 2012). Because HRPV-1 does not encode enzymes involved in legionaminic acid biogenesis (Pietilä et al., 2009), Halorubrum sp. strain PV6, the native host of HRPV-1, likely possesses the biochemical machinery needed to synthesize this sugar. Indeed, when VP4 was expressed in Haloferax volcanii, a species that does not contain NulO biosynthesis pathway genes, this sugar was no longer a component of the N-linked glycan. A genome sequence for Halorubrum sp. strain PV6 (or possibly an annotated version of the recently sequenced Halorubrum sp. strain T3 genome; Chen et al., 2012) will allow this assumption to be confirmed. Further study can also address whether native Halorubrum sp. strain PV6 glycoproteins incorporate 5-N-formyl-legionaminic acid; presently, no glycoproteins have been described in this organism. A second report of a NulO sugar in Archaea came with the demonstration of pseudaminic acid in an M. smithii extract (Lewis et al., 2009), correcting the earlier prediction of sialic acid in this species (Samuel et al., 2007). That study could not, however, determine whether the detected pseudaminic acid was derived from a protein-linked glycan or was otherwise related to protein glycosylation.

Given such limited experimental insight, the present study took a bioinformatics approach, determining the distribution in Archaea of genes encoding components of NulO sugar biosynthesis pathways, to better estimate the contribution of NulO sugars to archaeal protein glycosylation. Of the 122 archaeal genomes scanned, such genes were found in only 21 (six halophiles and 15 methanogens) of the 88 euryarchaeota considered, in two thaumarchaeota, and in one unclassified halophilic species. No such sequences were found in the crenarchaeota, a major archaeal phylum accounting for almost a quarter of the sequenced genomes considered. A similar pattern was reported in an earlier analysis of 48 archaeal genomes, in which nine species were predicted to encode NulO sugar biosynthesis pathway enzymes (Lewis et al., 2009). Thus, while N-glycosylation is apparently widespread in Archaea, with a recent study showing that 166 of 168 archaeal species considered encode the central component of N-glycosylation, the oligosaccharyltransferase AglB (Kaminski et al., 2013), archaeal protein glycosylation appears to rarely recruit NulO sugars.

The genome-based approach taken here shows that Archaea contain only genes encoding homologues of the bacterial-type enzymes NeuC, NeuB, and NeuA, with which ManNAc is converted directly to NeuAc; genes for the eukaryal ManNAc kinase-based pathway were not detected in Archaea. Moreover, in 15 of the 17 genomes in which NulO sugar biosynthesis genes were clustered, these sequences lay adjacent to or in the immediate vicinity of other genes encoding products assigned roles in legionaminic or pseudaminic acid biosynthesis. Given that NeuA, NeuB, and NeuC homologues are involved in legionaminic acid biosynthesis (Glaze et al., 2008; Schoenhofen et al., 2009), while NeuA and NeuB homologues are involved in pseudaminic acid biosynthesis (Schoenhofen et al., 2009), it would seem that Archaea synthesize these sialic acid-like sugars rather than sialic acids. Indeed, the genome analysis suggests that different Archaea use either or both of the legionaminic acid biosynthesis pathways found in Bacteria (Glaze et al., 2008; Schoenhofen et al., 2009). Yet, because the NulO sugar biosynthesis gene clusters of Nitrosopumilus maritimus and Halophilic archaeon sp. DL31 include no genes assigned to legionaminic or pseudaminic acid biosynthesis, with NeuA and NeuB homologues detected in the former and NeuA, NeuB, and NeuC homologues in the latter, it remains possible that sialic acids are generated in Archaea.

The relatively limited presence of NulO sugar biosynthesis genes in Archaea suggests that many of the pathway components detected here were acquired from elsewhere in a nonconcerted manner. Several additional lines of evidence support this view. The GC content of many archaeal NulO sugar biosynthesis genes differs from the average GC content of the surrounding genome by > 10%. This difference is particularly striking in the case of the haloalkaliphile Natrialba magadii, where most NulO sugar biosynthesis genes lack the elevated GC content typical of haloarchaeal genomes (Soppa et al., 2008). These relatively GC-poor Nab. magadii genes are clustered in the genome. Indeed, many of the archaeal genes implicated in NulO sugar biosynthesis in other species are similarly clustered, although gene order and cluster composition are poorly conserved. Distinct codon usage is also observed for selected NulO sugar biosynthesis genes. Finally, the phylogenetic trees of archaeal NeuB, NeuA, and 16S rRNA gene sequences are not congruent. Together, these observations support LGT having played a major role in the evolutionary history of archaeal NulO sugar biosynthesis.

Acknowledgements

JE is supported by grants from the Israel Science Foundation (8/11) and the US Army Research Office (W911NF-11-1-520).
