Galactocerebroside biosynthesis pathways of Mycoplasma species: an antigen triggering Guillain–Barré–Stohl syndrome

Summary
Infection by Mycoplasma pneumoniae has been identified as a preceding factor of Guillain–Barré–Stohl syndrome. The syndrome is triggered by an immune reaction against the major glycolipids, and it has been postulated that M. pneumoniae infection triggers it through bacterial production of galactocerebroside. Here, we present an extensive comparison of 224 genome sequences from 104 Mycoplasma species to characterize the genetic determinants of galactocerebroside biosynthesis. Hidden Markov models were used to analyse glycosyltransferases, leading to the identification of a functional protein domain, termed M200535, that appears in about a third of the studied genomes. This domain appears to be associated with a potential UDP‐glucose epimerase, which converts UDP‐glucose into UDP‐galactose, a main substrate for the biosynthesis of galactocerebroside. These findings clarify the pathogenic mechanisms underlying the triggering of Guillain–Barré–Stohl syndrome by M. pneumoniae infections.


Introduction
Mycoplasma species are bacteria representing the smallest free-living organisms on earth. They include several pathogens infecting humans, animals and plants. Mycoplasmas lack a cell wall, so the glycolipids of the membrane are exposed to the host's immune system upon infection. Mycoplasma pneumoniae is a human respiratory pathogen causing atypical (or walking) pneumonia, accounting for approximately 20-30% of all types of pneumonia worldwide (Liu et al., 2009; Varma-Basil et al., 2009; Waites & Talkington, 2004; Zhang et al., 2016; Waites et al., 2017). Diagnosis of M. pneumoniae infections is currently performed mostly by PCR tests but remains complicated at an early stage of infection (Miyachi et al., 2009). For about a decade, research to improve early diagnosis has focused on glycolipid antigens present on the membrane of M. pneumoniae (Matsuda, 2015). The percentage of glycolipids in the membrane of M. pneumoniae varies between 6% and 10% of total lipids (Gaspari et al., 2019).
M. pneumoniae has long been suspected as a potential preceding factor of Guillain-Barré-Stohl syndrome (GBS) (Ang et al., 2002; Yuki, 2007), which occurs at a frequency of about 5% of the total cases of past M. pneumoniae infections (van den Berg et al., 2014; Meyer Sauteur et al., 2016). GBS is an autoimmune neurological disorder that is potentially life threatening. Campylobacter jejuni was the first microorganism associated with post-infectious outbreaks of GBS (Rees et al., 1995) and has been found to perform galactocerebroside biosynthesis (Hao et al., 1998). Galactocerebroside has been shown to be immunogenic to a low degree in M. pneumoniae infections (Kusunoki et al., 2001; Susuki et al., 2004), and it is postulated that M. pneumoniae triggers GBS by inducing anti-galactocerebroside (anti-GalC) IgG (Meyer Sauteur et al., 2018; Smolders et al., 2019). In a clinical study, about a third of patients with central nervous system (CNS) infections by M. pneumoniae had anti-GalC antibodies, indicating that M. pneumoniae might also induce other CNS symptoms by other mechanisms (Christie et al., 2007).
Galactocerebroside, also called galactosylceramide, is a sphingolipid, more specifically a cerebroside, characterized by a galactosyl head group. A similar compound is glucocerebroside, alias glucosylceramide, a cerebroside in which the monosaccharide head group is glucose. Glycosphingolipids such as galactocerebroside and glucocerebroside are typically synthesized in bacteria by the enzymes ceramide galactosyltransferase (or galactosylceramide synthase, reaction EC 2.4.1.47) and ceramide glucosyltransferase (or glucosylceramide synthase, reaction EC 2.4.1.80) of the glycosyltransferase family. The enzyme ceramide galactosyltransferase appears in viruses and cellular organisms; it catalyses the biosynthesis of galactocerebroside by transferring galactose from UDP-galactose to a ceramide molecule, releasing UDP.
The glycosyltransferase of M. pneumoniae, encoded by gene mpn483, has been shown to synthesize galactosylceramide (most likely the beta-variant) using ceramide and UDP-glucose as substrates, with phosphatidylglycerol or cardiolipin as activators. M. pneumoniae has access to ceramide and galactose. It imports ceramide both from axenic medium during in vitro growth and from the host in vivo during infection. In addition, it can use the fatty acid chains from incorporated ceramide in other lipids to build up ceramide-based glycolipids. Moreover, M. pneumoniae favours the import of glucose in vivo, albeit not in vitro, where galactose is preferred (Plackett et al., 1969). Finally, it is postulated but not confirmed that M. pneumoniae contains a potential epimerase converting UDP-glucose into UDP-galactose (Dandekar et al., 2000).
Characterization of the galactocerebroside biosynthesis pathway in M. pneumoniae will further clarify pathogenic mechanisms and can greatly impact the development of methods for early detection and diagnosis of GBS. Moreover, these data are essential for the design of alternative metabolic pathways for M. pneumoniae that avoid galactocerebroside formation and for the identification of alternative Mycoplasma species devoid of galactocerebroside for biomedical applications.
We present here a comparative analysis of 9 strains of M. pneumoniae and an additional 103 currently genome-sequenced Mycoplasma species. The goal is to identify the genetic determinants of galactocerebroside biosynthesis and to further characterize the proteins in this pathway.

Experimental procedures
All available and complete Mycoplasma genome sequences were retrieved from the NCBI Genome repository. Overall, 224 genome sequences were obtained, belonging to 104 species that are listed in File S1. All genome sequences were re-annotated using the SAPP pipeline. Gene prediction was performed using Prodigal 2.6.3 (Hyatt et al., 2010), and protein sequences were annotated using InterProScan 5.36.75.0 (Jones et al., 2014) to assign functional domains. Annotation data were stored in a triple-store (GraphDB) (Güting, 1994) in a linked data format using the GBOL ontology as schema (van Dam et al., 2019) and systematically queried using SPARQL.
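As a rough illustration of the domain-annotation step, the sketch below groups Pfam hits by protein from InterProScan tab-separated output. It assumes the standard InterProScan TSV column order (protein accession in column 1, analysis in column 4, signature accession in column 5, E-value in column 9). It is not the SAPP, GraphDB and SPARQL workflow used in this study; the function name and the `max_evalue` cut-off are hypothetical choices made for the example.

```python
from collections import defaultdict


def pfam_hits_by_protein(tsv_text, max_evalue=1e-3):
    """Map protein accession -> {Pfam accession: best E-value}.

    Assumes InterProScan TSV columns; `max_evalue` is an illustrative
    cut-off, not a value taken from the study.
    """
    hits = defaultdict(dict)
    for line in tsv_text.splitlines():
        fields = line.split("\t")
        # Skip malformed rows and hits from analyses other than Pfam.
        if len(fields) < 9 or fields[3] != "Pfam":
            continue
        protein, signature, score = fields[0], fields[4], fields[8]
        try:
            evalue = float(score)
        except ValueError:
            # Some rows carry "-" instead of a numeric score.
            continue
        if evalue <= max_evalue:
            best = hits[protein].get(signature, float("inf"))
            hits[protein][signature] = min(best, evalue)
    return dict(hits)
```

Keeping only the best E-value per protein and signature is a simplification: a protein with several copies of the same domain is reported once.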

Results
We studied the functionalities associated with the M. pneumoniae genome, through an analysis of protein domains, to describe the biosynthesis pathway that leads to the formation of galactocerebroside and compare it to the pathways found in other mycoplasmas. Therefore, we analysed 224 genomes of 104 Mycoplasma species (given in File S1), comprising 213 strains. To ensure uniform annotation and a consistent comparison, all genomes were re-annotated.
The synthesis of glycosphingolipids such as galactocerebroside in bacteria requires a glycosyltransferase enzyme, which links a sugar (galactose) to a sphingolipid (ceramide) by forming the glycosidic bond. Our functional analysis identified 4 genes containing a glycosyltransferase domain in M. pneumoniae. These were found in all 9 analysed strains and correspond to locus tags mpn028, mpn483, mpn075 and mpn064 in M. pneumoniae M129. While mpn064 (deoA) codes for a thymidine phosphorylase (EC 2.4.2.2) that contains a 'glycosyl transferase family 3' domain with InterPro identifier IPR000312, mpn028, mpn483 and mpn075 contain a 'glycosyltransferase 2-like' domain (IPR001173), associated with proteins that have been linked to glycosphingolipid biosynthesis pathways and, specifically, to proteins that have been proven to have glycosyltransferase activity (Sobhanifar et al., 2016). Their lengths and the E-values of the associated glycosyltransferase domains are shown in Table 1. The glycosyltransferase with the most significant E-value is MPN_028, while the one known to synthesize galactocerebroside is MPN_483. The InterPro IPR001173 domain comprises two Pfam domains: 'Glyco_trans_2_3' (PF13632) and 'Glyco_transf_2' (PF00535); the latter is found in two glycosyltransferases of C. jejuni, with E-values of 10^-29 (putative galactosyltransferase, UniProtKB Q8KWR2) and 10^-27 (beta-1,3-galactosyltransferase coded by gene cgtB, UniProtKB Q5DT13). Focusing on the Mycoplasma species, we therefore assumed the protein sequences containing the Pfam domain PF00535 to be the ones associated with galactocerebroside synthesis. Thus, we continued our analysis focusing on PF00535 and discarded further analysis of PF13632. The distribution of the E-values of this signature, presented in Fig. 1, is bimodal. The presence of two peaks suggests two similar but distinct domains. Thus, we re-built HMMs on the two separate groups, obtaining two new domains, M100535 and M200535.
Then, E-values for these domains on the sequences were re-calculated, as indicated in Fig. 1. This approach results in two groups of protein sequences, each matching its corresponding motif with much higher significance, that is, much lower E-values. All the strains of M. pneumoniae contain only the second motif, M200535. The HMM for the M200535 motif is provided in File S2. The two glycosyltransferases of C. jejuni match M200535 with much higher significance than PF00535: the beta-1,3-galactosyltransferase with E-value 10^-43 and the putative galactosyltransferase with E-value 10^-42. An additional glycosyltransferase family 2 protein matching M200535 with E-value 10^-42 was identified in C. jejuni, and full results are given in File S3.
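The grouping step that precedes HMM re-building can be sketched as follows. The boundary of 13.5 on -log10(E-value), and the rule that hits in the boundary bin are used in both groups, come from the Fig. 1 caption. The rounding to one decimal (to mimic histogram bins) and the function name are assumptions made for this example; the HMM building itself is not shown.

```python
import math


def split_by_neglog_evalue(evalues, boundary=13.5):
    """Partition positive E-values around a boundary on -log10(E).

    Values are rounded to one decimal (an assumption mimicking histogram
    bins); hits in the boundary bin are assigned to both groups.
    """
    boundary_tenths = round(boundary * 10)
    low, high = [], []
    for e in evalues:
        # Work in integer tenths to avoid float equality comparisons.
        tenths = round(-math.log10(e) * 10)
        if tenths <= boundary_tenths:
            low.append(e)   # less significant group (first peak)
        if tenths >= boundary_tenths:
            high.append(e)  # more significant group (second peak)
    return low, high
```

The function expects strictly positive E-values; a reported E-value of exactly 0 would need separate handling before taking the logarithm.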
The two new motifs, M100535 and M200535, show some substantial differences: the sequence alignments of the proteins carrying the domains reveal M200535 to be almost twice as long as M100535. Most differences between the two motifs are present in the first part of the domain, where M200535-containing sequences show a predominance of aspartic acid in positions 344, 348, 428, 430, 436 and 455 (Fig. 2). Consensus sequences for both M100535 and M200535 are provided in File S4.
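A predominance of a residue at a given alignment position can be quantified with a simple column count. The sketch below is a minimal illustration, assuming the aligned sequences are equal-length strings with '-' for gaps and that positions are 1-based alignment columns; the function name is hypothetical, and this is not necessarily how Fig. 2 was produced.

```python
def residue_fraction(aligned_seqs, column, residue="D"):
    """Fraction of non-gap sequences carrying `residue` at a 1-based column."""
    observed = [s[column - 1] for s in aligned_seqs if s[column - 1] != "-"]
    if not observed:
        return 0.0  # column contains only gaps
    return sum(r == residue for r in observed) / len(observed)


# Toy alignment (hypothetical sequences, for illustration only).
toy_alignment = ["MDKD", "MDR-", "MEKD"]
fraction_d_col2 = residue_fraction(toy_alignment, 2)
```

Gaps are excluded from the denominator here; counting them instead would lower the fraction in poorly aligned columns.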
Occurrences of the domains M100535, M200535, here defined, and of domains PF00534, PF13439 or PF13641 associated with alternative glycosyltransferases are summarized in Table 2. In total, 73 out of the 104 Mycoplasma species analysed match at least one glycosyltransferase domain. It should be noted that any genome containing either M100535 or M200535 also contains PF00535.
We can conclude the pathway for galactocerebroside synthesis in Mycoplasma pneumoniae in vivo is most likely as represented in Fig. 3.
Domain Pfam PF01370 is associated with a functionally equivalent epimerase. Interestingly, analysis of all Mycoplasma species shows that, almost exclusively, the species containing at least one M200535 domain matched the UDP-glucose epimerase domains Pfam PF16363 and/or PF01370 (Fig. 4). Exceptions are M. sp. Bg1 (not containing any glycosyltransferase domain but matching an epimerase domain) and Mycoplasma iowae (containing a glycosyltransferase domain different from M200535).
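The co-occurrence check described above amounts to a set comparison per species. The following sketch assumes a mapping from species name to the set of domain identifiers found in its genomes; the function name and the toy data are hypothetical and do not reproduce the results of Fig. 4.

```python
GLYCOSYLTRANSFERASE = "M200535"
EPIMERASE = {"PF16363", "PF01370"}


def cooccurrence_exceptions(domains_by_species):
    """Return species with an epimerase domain but no M200535 domain."""
    return sorted(
        species
        for species, domains in domains_by_species.items()
        if domains & EPIMERASE and GLYCOSYLTRANSFERASE not in domains
    )


# Toy input (hypothetical species and domain sets, for illustration only).
toy = {
    "species_A": {"M200535", "PF01370"},
    "species_B": {"PF01370"},
    "species_C": {"M100535"},
    "species_D": set(),
}
exceptions = cooccurrence_exceptions(toy)
```

Here only `species_B` carries an epimerase domain without M200535, which is the kind of exception reported in the text for M. sp. Bg1 and M. iowae.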

Discussion
Our results show that all strains of M. pneumoniae analysed have at least three glycosyltransferases encoded in the genome (in M. pneumoniae strain M129 these are MPN_075, MPN_028 and MPN_483) that can potentially perform the synthesis of galactocerebroside. The highly significant match of M200535 with C. jejuni galactosyltransferases indicates that the sequences responsible for galactocerebroside synthesis contain the functional domain M200535, which, in this study, was found to be a distinctive motif with specific features, while it was previously included in the Pfam domain PF00535. The Mycoplasma species and strains containing at least one protein with the functional domain M200535 are the only Mycoplasma (with the two exceptions of M. iowae and the taxonomically not yet defined M. sp. Bg1) showing concomitant presence of a UDP-glucose epimerase domain, which converts UDP-glucose into UDP-galactose. UDP-galactose is used in galactocerebroside synthesis and is favoured over UDP-glucose as a substrate by the glycosyltransferase performing the linkage. In fact, M200535 contains conserved aspartic acid residues, which in other microorganisms such as E. coli have been found to be catalytically essential for glycosyltransferases to exert their β-transferase activity on UDP-sugars (Griffiths et al., 1998).
Our analysis pinpoints that almost all the Mycoplasma species infecting humans contain the domain M200535 (Fig. 4). Our work leads to the suggestion of genetic modifications that would validate the hypotheses formulated by computational analysis. The characterization of the pathway indeed suggests which genes should be a primary target for genetic modification to avoid biosynthesis of galactocerebroside in M. pneumoniae. The most intuitive strategy would be to knock out the genes encoding the glycosyltransferases, blocking the transfer of galactose to ceramide. However, this seems to be nontrivial: in a global transposon mutagenesis experiment in M. genitalium, the gene encoding the glycosyltransferase MG_517, a homologue of MPN_483 in M. pneumoniae, remained untouched, suggesting that the gene is essential (Glass et al., 2006). Although mg517 and mpn483 share 77% gene sequence similarity, the enzymatic activities differ slightly in specificity (Andrés, 2011). The essentiality of mpn483, not excluded by gene transposon analysis (Lluch-Senar et al., 2015), is expected, as M. pneumoniae uses this enzyme to synthesize many other membrane lipids (Klement, 2007), which we know are crucial for its survival (Gaspari et al., 2020). In the same way, the gene transposon analysis conducted by Lluch-Senar et al. suggests that mpn075 and mpn257, coding respectively for the glycosyltransferase MPN_075 and the epimerase MPN_257, might be essential, in contrast to mpn028, coding for the third glycosyltransferase, which is reported to be non-essential. A validation of our computational analysis would consist in knocking out mpn028 and replacing mpn483 and mpn075 with genes coding for glycosyltransferases that do not possess the motif M200535. To facilitate the genetic modification, genes coding for glycosyltransferases of other mycoplasmas should be chosen from among those not containing the motif M200535 (Fig. 4). However, the essentiality of genes mpn075 and mpn257 remains to be clarified, owing to the uncertainty of the gene transposon analysis method in establishing gene essentiality. Moreover, the essentiality of the gene arises only after several strain passages (Lluch-Senar et al., 2015). The knock-out of mpn257, coding for the UDP-glucose epimerase, would indeed constitute an additional layer of safety: the limited import of galactose in vivo is not a sufficient condition for assuming limiting quantities for biosynthesis of galactocerebroside, as the bacterium could obtain the needed galactose from glucose conversion.
Fig. 1. (Top centre) Distribution of the number of expected hits of all the Mycoplasma protein sequences to the Pfam domain PF00535 ('Glyco_transf_2'). Expected hits are reported with the corresponding -log10(E-value). The histogram shows two different peaks: new HMMs were built on the two groups of sequences. The first group, with -log10(E-value) lower than or equal to 13.4, is marked in orange and contains 178 sequences; the second group, with -log10(E-value) higher than or equal to 13.6, is marked in blue and contains 227 sequences. The 4 sequences with -log10(E-value) equal to 13.5 (in the green bar) are used in both groups. (Lower) Distribution of matches to the two new motifs. The two new motifs are called M100535 (left) and M200535 (right) and have lengths of M = 341 and M = 349 amino acids, respectively.
M. pneumoniae is among the Mycoplasma species whose metabolome, transcriptome and proteome are best studied, giving access to profound basic research opportunities and to the development of novel approaches in biotechnology and biomedicine (Yus et al., 2009; Maier et al., 2013; Chen et al., 2016; Trussart et al., 2017; Yus et al., 2019), among which are live attenuated vaccines (www.mycosynvac.org) and respiratory tract-related live biotherapeutics (www.pulmobio.com). Therefore, an experimental application of our computational results would represent a suitable approach to render this bacterium safe in its numerous applications. In fact, other attempts to avoid biosynthesis of galactocerebroside seem to be nontrivial: it would not be possible to knock out the genes involved in the sphingolipid transporter, since this transporter imports not only ceramide but also very important lipids such as sphingomyelin, which were shown to be essential to the survival of M. pneumoniae (Gaspari et al., 2020). Instead, the introduction of a ceramidase, disassembling ceramide into its sphingosine backbone and fatty acid chains, is another potential strategy to integrate. In bacteria, this enzyme has been found in Pseudomonas aeruginosa (Kita et al., 2000; Okino et al., 1998) and Mycobacterium tuberculosis (Okino et al., 2010) to be neutral (Tani et al., 2004) and reversible (Ito et al., 2014), so it does not always favour degradation of ceramide but also its synthesis, according to the environmental and cytosolic conditions. However, the affinity of M. pneumoniae for sphingomyelin is unique; therefore, the degradation of ceramide into sphingosine and fatty acids could be of advantage for M. pneumoniae to build up sphingomyelin, which it typically imports unchanged from the medium (Worliczek et al., 2007).
The in silico characterization of the protein domain that might be responsible for galactocerebroside biosynthesis has potential impact as a target for drugs related to post-infectious GBS: the elucidation of the pathway and its analysis in mycoplasmas could be used to assess the risk of post-infectious GBS development, helping in the development of therapeutic strategies for early diagnosis and/or control of GBS.
Moreover, in this manuscript we report a group of mycoplasmas that lacks both UDP-glucose epimerase, converting UDP-glucose into UDP-galactose, and any glycosyltransferase domain, therefore unable to complex lipids with sugars. This group

Conclusion
All Mycoplasma pneumoniae strains have genes encoding glycosyltransferases, of which at least two are essential and at least one (MPN_483 in M129) has been proved to encode an enzyme that can synthesize glycosphingolipids such as galactocerebroside. Most likely, MPN_028 and MPN_075 perform the same synthesis, as they show highly significant matches to motif M200535. This motif was found as well in C. jejuni, the first microorganism linked to GBS through galactocerebroside biosynthesis. While the access of M. pneumoniae to galactose in vivo, when the import of glucose is at a higher rate, remains unclear, the presence of a UDP-glucose epimerase, MPN_257, converting UDP-glucose into UDP-galactose, could make the synthesis of galactocerebroside possible. We can conclude that all wild-type strains of M. pneumoniae are potentially capable of synthesizing galactocerebroside and will most likely be able to do so even with substitution or knock-out of the galactosyltransferase MPN_483 or the epimerase MPN_257, which will be problematic for certain medical applications, namely human vaccines and live biotherapeutics. Similarly, this will be the case for Mycoplasma species with the functional domain M200535 that also present a UDP-glucose epimerase domain. Our data show a set of Mycoplasma species that could serve as alternatives for such biomedical applications or that could provide pathway genes to modify M. pneumoniae accordingly.

Conflict of interest
Patent application n. EP20174842.3 by the Wageningen University & Research. Author Vitor A.P. Martins dos Santos has interests in the company LifeGlimmer GmbH. All authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Supporting information
Additional supporting information may be found online in the Supporting Information section at the end of the article.
File S1. List of Mycoplasma genome sequences used in the study, obtained from NCBI repository.
File S2. Motif M200535, suggested to be responsible for galactocerebroside biosynthesis, as Hidden Markov Model.