Ultrastructural and glycoproteomic characterization of Prevotella intermedia: Insights into O‐glycosylation and outer membrane vesicles

Abstract Prevotella intermedia, a Gram‐negative bacterium from the Bacteroidota phylum, is associated with periodontitis. Other species within this phylum are known to possess the general O‐glycosylation system. The O‐glycoproteome has been characterized in several species, including Tannerella forsythia, Porphyromonas gingivalis, and Flavobacterium johnsoniae. In our study, we used electron cryotomography (cryoET) and glycoproteomics to reveal the ultrastructure of P. intermedia and characterize its O‐glycoproteome. Our cryoET analysis unveiled the ultrastructural details of the cell envelope and outer membrane vesicles (OMVs) of P. intermedia. We observed an electron‐dense surface layer surrounding both cells and OMVs. The OMVs were often large (>200 nm) and presented two types, with lumens being either electron‐dense or translucent. LC‐MS/MS analyses of P. intermedia fractions led to the identification of 1655 proteins, which included 62 predicted T9SS cargo proteins. Within the glycoproteome, we identified 443 unique O‐glycosylation sites within 224 glycoproteins. Interestingly, the O‐glycosylation motif exhibited a broader range than reported in other species, with O‐glycosylation found at D(S/T)(A/I/L/M/T/V/S/C/G/F/N/E/Q/D/P). We identified a single O‐glycan with a delta mass of 1531.48 Da. Its sequence was determined by MS2 and MS3 analyses using both collision‐induced dissociation and high‐energy collisional dissociation fragmentation modes. After partial deglycosylation with trifluoromethanesulfonic acid, the O‐glycan sequence was confirmed to be dHex‐dHex‐HexNAc (HPO3‐C6H12O5)‐dHex‐Hex‐HexA‐Hex(dHex). Bioinformatic analyses predicted the localization of O‐glycoproteins, with 73 periplasmic proteins, 53 inner membrane proteins, 52 lipoproteins, 26 outer membrane proteins, and 14 proteins secreted by the T9SS.


| INTRODUCTION
Prevotella intermedia is a Gram-negative, black-pigmented, nonmotile, rod-shaped bacterium. It predominantly colonizes subgingival regions as an obligate anaerobe and is notably linked with periodontitis.
Periodontitis is a chronic inflammatory disease caused primarily by the accumulation of bacteria in dental plaque. It is characterized by progressive destruction of the tooth-supporting gum tissue and bone (Gasner & Schure, 2023). It has been reported that P. intermedia is positively related to clinical measures of chronic periodontitis, especially increasing pocket depth (Socransky et al., 1998), clinical attachment loss (Dahlen et al., 2014), and bleeding on probing (Joshi et al., 2014). Furthermore, P. intermedia has been detected in extraoral sites, such as noma (cancrum oris) lesions (Enwonwu et al., 2000) and bacterial tracheitis in children (Brook et al., 1997).
Notably, it is the only periodontal pathogen known to induce severe bacteremic pneumococcal pneumonia, accompanied by enhanced pneumococcal adhesion to lower airway cells (Nagaoka et al., 2014).
To date, no protein glycosylation system has been characterized in P. intermedia. Nevertheless, multiple genes encoding glycosyltransferases (GTs) involved in the biosynthesis of cell walls and lipopolysaccharide (LPS) were found in P. intermedia, and these varied among different strains (Kwack et al., 2022). Concerning the relationship between glycoconjugates, such as glycoproteins, LPS, and peptidoglycans (Lukose et al., 2017), and periodontal disease, genetic elements in P. intermedia isolates from diseased sites were shown to have more unique virulence factors associated with glycoconjugate synthesis than those from healthy sites (Zhang et al., 2017). These findings highlight a potential adaptation strategy of P. intermedia to employ GTs for glycoconjugate syntheses in response to dynamic oral environments during the transition from healthy to diseased states. It is also conceivable that genetic variations in GTs could enable the synthesis of species-specific glycans and glycoconjugates, ultimately influencing the degree of virulence (Kwack et al., 2022). With respect to immunogenicity, an immunobiologically active glycoprotein isolated from P. intermedia was established to activate immune cells from mice and gingival fibroblasts from humans via cytokine-inducing activity (Iki et al., 1997). This glycoprotein is devoid of fatty acids, and its activity withstands heat inactivation at 100°C for 1 h and treatment with proteases but is susceptible to periodate treatment (Iki et al., 1997). This suggests that this particular glycoprotein is not endotoxic and that its activity is attributed to the carbohydrate instead of the protein moiety. It is also plausible that this glycoprotein is a constituent of the capsular polysaccharide complex unique to black-pigmented bacteria such as Porphyromonas gingivalis, P. intermedia, and related species (Sugawara et al., 2001). Nevertheless, as P. intermedia is a periodontal pathogen, it is reasonable to speculate that glycosylation in P. intermedia might modulate antigen exposure or the host immune response, ultimately facilitating immune evasion. Overall, these studies provide evidence for the existence of protein glycosylation in P. intermedia.
While the type IX secretion system (T9SS) was recently studied in P. intermedia for the first time (Naito et al., 2022), it has been well studied in P. gingivalis. There, T9SS cargo proteins are postulated to be secreted through the T9SS across the outer membrane, where their C-terminal domains (CTDs) are cleaved and their new C-termini are conjugated to LPS, enabling their attachment to the cell surface (Veith, Glew, et al., 2022). As the cargo protein is amide-linked to the sugar residues of the LPS (Veith et al., 2020), this process of LPS modification can also be considered a form of protein glycosylation.
Utilizing genetic engineering methods, the T9SS was demonstrated to be essential for black pigmentation, hemagglutination, biofilm formation, and the functioning of cell surface virulence factors in P. intermedia (Naito et al., 2022). Interestingly, P. intermedia is the only species observed to have short CTDs, with many containing only 50-60 residues compared to more than 70 residues in other species examined (Veith et al., 2013). In the same study, more than 10 cargo proteins of P. intermedia were substantially elevated in molecular weight, consistent with LPS modification; however, the modification sites and attached sugars are yet to be determined.
The general O-glycosylation system of the Bacteroidota phylum was first described in Bacteroides fragilis by Fletcher et al. (2009). By examining the glycoprotein candidates from all extracytoplasmic compartments, they experimentally confirmed the three amino acid glycosylation motif D(S/T)(A/I/L/M/T/V) in B. fragilis, where the glycan is O-linked to the Ser or Thr residue in the second position (Fletcher et al., 2011). This O-glycosylation system is now considered to be conserved across the whole Bacteroidota phylum (Coyne et al., 2013). Upon elucidation of the O-glycan structure in Elizabethkingia meningoseptica (formerly called Flavobacterium meningosepticum) (Reinhold et al., 1995), Flavobacterium columnare (Vinogradov et al., 2003), Tannerella forsythia (Posch et al., 2011; Tomek et al., 2021; Veith et al., 2021), B. fragilis (Posch et al., 2013; Tomek et al., 2021), P. gingivalis (Veith, Shoji, et al., 2022), and Flavobacterium johnsoniae (Veith et al., 2023), it is evident that the glycan consists of a common core glycan and a more variable outer glycan (Figure 1). Despite only the first 3-4 sugars in the core glycan being similar in these species, it was suggested that the core glycan shares at least a common glycan epitope, which is likely an O-linked mannose or other hexose (Coyne et al., 2013). Conversely, the outer glycan is shown to be species-specific (Coyne et al., 2013). In terms of the three amino acid motif of O-glycosylation, it is conserved within numerous glycoproteins of Bacteroidota species and was recently extended to D(S/T)(A/I/L/M/T/V/S/C/G/F) based on the glycoproteomics studies of T. forsythia (Veith et al., 2021) and P. gingivalis (Veith, Shoji, et al., 2022). Moreover, the biosynthesis pathway of the outer glycans has been partially elucidated for T. forsythia, B. fragilis, and P. gingivalis (Coyne et al., 2013; Tomek et al., 2018, 2021; Veith, Shoji, et al., 2022). Among these, T. forsythia is the best studied, with the roles of five GTs being well documented (Tomek et al., 2018). Furthermore, O-glycosylation in T. forsythia was shown to modulate interspecies associations in a 10-species biofilm model (Bloch et al., 2017),

| CryoET sample preparation and imaging
R2/2 carbon-coated 200 mesh copper Quantifoil grids (Quantifoil Micro Tools) were first glow-discharged for 30 s to enhance their hydrophilicity. One microliter of bovine serum albumin-treated 10 nm colloidal gold solution was added to 4 μL of cell suspension (OD650 of 1.5), which was then pipetted onto a Quantifoil grid inside a Vitrobot chamber (FEI) with 100% humidity. The extra fluid was blotted off using a Whatman filter paper and the grid was plunge-frozen in liquid ethane. The grid was imaged using a Titan Krios G4 cryoEM, operating at 300 kV acceleration voltage and equipped with a Gatan energy filter and a K3 Summit direct detector. Tilt-series were acquired using Tomography 5 software version 5.14 (Thermo Fisher Scientific) with a tilt range of −51° to 51° in 3° increments.
Data were collected with a total dose of 120 e⁻/Å², a defocus of around −8 μm, and a pixel size of 3.39 Å. Three-dimensional reconstructions of tilt-series were performed using the IMOD software package (Kremer et al., 1996). To increase interpretability, missing-wedge correction was done on the tomograms using IsoNet version 0.2 (Liu et al., 2022).
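As a quick arithmetic check of the acquisition scheme described above, the following minimal sketch enumerates the tilt angles and estimates the dose per tilt. It assumes the total dose was spread evenly over all tilts, which the text does not state; the function name `tilt_angles` is hypothetical.

```python
def tilt_angles(start_deg, stop_deg, step_deg):
    """Return the list of tilt angles from start to stop (inclusive)."""
    n_tilts = int(round((stop_deg - start_deg) / step_deg)) + 1
    return [start_deg + i * step_deg for i in range(n_tilts)]


# Values taken from the text: -51 to +51 degrees in 3-degree increments.
angles = tilt_angles(-51, 51, 3)

# Assumption (not stated in the text): the total dose of 120 e-/A^2
# is divided equally among the tilt images.
TOTAL_DOSE = 120.0
dose_per_tilt = TOTAL_DOSE / len(angles)

print(f"{len(angles)} tilts, about {dose_per_tilt:.2f} e-/A^2 per tilt")
```

Under this even-dose assumption, the scheme corresponds to 35 tilt images at roughly 3.4 e⁻/Å² each.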

| Cell fractionation
P. intermedia cells were harvested by centrifugation at 8000g for 20 min at 4°C and OMVs were pelleted from the filtered cell-free culture fluid by ultracentrifugation at 175,000g for 15 h at 4°C. Cell pellets were washed with phosphate-buffered saline (PBS) and repelleted by centrifugation at 8000g for 20 min at 4°C. After resuspension in an acid salt buffer (ASB, 300 mM NaCl, 50 mM sodium acetate, pH 5.3), cells were lysed by two passages through a precooled Avestin EmulsiFlex C3 high-pressure homogenizer (Avestin) at ~25,000 psi. Unlysed cells were removed by centrifugation at 8000g for 20 min at 4°C. The supernatant was then ultracentrifuged at 100,000g for 30 min at 4°C to pellet and separate the membrane fraction from the soluble fraction. The membrane fraction was resuspended in ASB with sonication using a CPX 750 ultrasonic homogenizer (Cole Parmer) fitted with a 6.5 mm tapered microtip to generate a suspension of fine particles. The amplitude was set to 19% and the pulse to 1 s on, 2 s off for a total of 15 min. A portion of the whole membrane fraction was pelleted again by centrifugation at 42,000g for 20 min at 4°C. Another portion of the membrane fraction was treated with 1% Triton X-100 detergent and mixed by rotation for 0.5 h at room temperature. The extracted membrane fraction [or inner membrane fraction (IM)] was the supernatant retained after centrifugation at 42,000g for 20 min at 4°C. Portions of the soluble fraction and OMV sample were precipitated with 13% trichloroacetic acid (TCA) and pelleted by centrifugation at 15,000g for 20 min at 4°C. The pellets were washed with ice-cold acetone and centrifuged again.

FIGURE 1 Comparison of proposed O-glycan structures among Bacteroidota species, including (a) Flavobacterium johnsoniae (Veith et al., 2023), (b) Flavobacterium columnare (Vinogradov et al., 2003), (c) Elizabethkingia meningoseptica (Reinhold et al., 1995), (d) Porphyromonas gingivalis (Veith, Shoji, et al., 2022), (e) Prevotella intermedia (this study), (f) Bacteroides fragilis (Tomek et al., 2021), and (g) Tannerella forsythia (Tomek et al., 2021; Veith et al., 2021). Sugar symbols are based on Symbol Nomenclature for Glycans (Neelamegham et al., 2019). Hatched sugar symbols indicate the predicted isomeric form of the sugar.

| Partial deglycosylation
Portions of membrane fraction, precipitated soluble fraction and OMV sample were resuspended in 50% acetonitrile-0.1% aqueous trifluoroacetic acid (TFA), transferred to reaction vials and freeze-dried thoroughly overnight. Deglycosylation was performed following the protocol provided by the manufacturer of the PROzyme/Glyko Glycofree chemical deglycosylation kit (GKK-500) (ProZyme & Inc, n.d.) as previously described (Veith et al., 2020). All the following steps were conducted in the fume hood due to the highly volatile and corrosive nature of trifluoromethanesulfonic acid (TFMS). Briefly, samples were placed in an ethanol/dry ice bath and 125 μL of TFMS/toluene mixture (90% TFMS, 10% anhydrous toluene) was slowly added using predried glass syringes.
Reaction vials were placed in a freezer for 10 min at −20°C. After the first 5 min of the deglycosylation reaction, the vials were briefly shaken to assist with the melting of the contents and the subsequent solvation of glycoproteins. The samples were then slowly neutralized with 375 μL (3 volumes) of pyridine/methanol/water solution at a ratio of 3:1:1 in the ethanol/dry ice bath. After 5 min of neutralization on dry ice and 15 min on wet ice, samples were transferred to microcentrifuge tubes and 1 mL (8 volumes) of 50 mM ammonium bicarbonate (NH₄HCO₃) was then added. Deglycosylated polypeptides in samples were recovered by precipitation with 13% TCA and washed with ice-cold acetone.

| Sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) and in-gel digestion
Untreated samples of whole membrane, soluble, extracted membrane and OMVs, and deglycosylated samples were dissolved in 1× NuPAGE LDS sample buffer and 50 mM dithiothreitol to denature proteins. After sonication and heating, all samples were separated by reducing SDS-PAGE and each was fractionated into 12 gel segments (Figure 2). The segments were digested with trypsin in the gel as described previously (Gorasia et al., 2015) and extracted once with 0.1% aqueous TFA and once with 30% acetonitrile-0.1% aqueous TFA, both for 15 min, in an ultrasonication bath. Extracts were combined, evaporated in a vacuum centrifuge, and dissolved in 2% acetonitrile-0.1% aqueous TFA for MS analysis.

| Mass spectrometry
LC-MS/MS experiments were conducted on a Dionex Ultimate 3000 UHPLC interfaced with an Orbitrap Fusion Lumos Tribrid mass spectrometer (Thermo Fisher Scientific) as previously described (Veith, Shoji, et al., 2022), with the following modifications. For the analysis of intact glycopeptides, peptides were eluted using a linear gradient of 2%-40% ACN over 85 min to obtain the high-energy collisional dissociation (HCD) and collision-induced dissociation (CID) spectra. A stepped field asymmetric ion mobility spectrometry (FAIMS) method was employed, alternating between the compensation voltages (CVs) of −25 and −45 V. Specific glycan fragment ions (204.09, 244.03, 448.12 m/z) were used to trigger the additional CID scan, stepped collision energy HCD scan and electron transfer dissociation (ETD) or electron-transfer/higher-energy collision dissociation (EThcD) scans (all in the orbitrap) with previously described scanning parameters (Veith, Shoji, et al., 2022). The ETD and EThcD parameters were NCE 15%, a maximal injection time of 250 ms with an AGC of 500%, and a resolution of 30,000 using the extended mass range setting. MS3 spectra were acquired for the sample with the highest abundance of glycopeptides (gel segment #8 of the membrane fraction) to identify the sequence of the unknown glycan portion. A stepped FAIMS method was utilized again, with the following modifications. An inclusion mass list of the previously identified glycopeptide ions with delta masses (Δmasses) of ~1531 Da was used to trigger the MS2-level HCD spectra. Additional CID and HCD scans were conducted on the 448.12 m/z and 594.18 m/z fragment ions in two separate experiments to obtain the MS3-level spectra.
For the analysis of acid-cleaved glycopeptides, membrane and OMV samples were eluted using a 60-min gradient of approximately 2%-32% ACN. A FAIMS method alternating between −25 and −45 V was utilized, and only HCD spectra were collected. For the acquisition of CID spectra, an inclusion mass list of the previously identified glycopeptide ions was used to trigger the CID scans in separate experiments. ETD and EThcD scans were acquired in a separate experiment using similar inclusion lists but restricting the charge state to 3-8 (both singly and doubly charged ions were excluded). Initially, the raw MS data were searched with Byonic v4.6 (Protein Metrics) using the wildcard parameters with a Δmass between 200 and 2000 Da (Roushan et al., 2021). The result files were exported via Byonic Viewer v4.6. Once the most abundant glycoform was identified, the data were searched again using 1531.48 (S, T) as an additional variable modification. A glycopeptide was considered identified and included in Table S2 when it contained the glycosylation motif and was detected more than once, with a Byonic score >200 and −log10(P) > 1. The same data were also searched using Mascot v2.8.2 (Matrix Science), with scores over 30 included. The false discovery rate (FDR) using the Mascot decoy for intact samples was 1.17% for all peptides, and 0% for glycopeptides.
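The Byonic acceptance criteria above can be pictured as a simple filter. The sketch below is one possible reading, not the authors' pipeline: it assumes that "detected more than once" means at least two peptide-spectrum matches (PSMs) for the same sequence, that the score thresholds apply to the best PSM, and that the motif is the extended D(S/T)X motif reported in this study. The `PSM` record, the function name, and all scores in the example are hypothetical; only the peptide DGAVYFLQANDSTLR comes from the text.

```python
import re
from collections import defaultdict
from typing import NamedTuple


class PSM(NamedTuple):
    """Hypothetical record for one peptide-spectrum match."""
    peptide: str
    byonic_score: float
    neg_log10_p: float


# Extended motif D(S/T)(A/I/L/M/T/V/S/C/G/F/N/E/Q/D/P) as a regular expression.
MOTIF = re.compile(r"D[ST][AILMTVSCGFNEQDP]")


def accepted_glycopeptides(psms, min_score=200.0, min_neg_log10_p=1.0):
    """Return peptides that carry the motif, were matched more than once,
    and have at least one PSM above both score thresholds."""
    by_peptide = defaultdict(list)
    for psm in psms:
        by_peptide[psm.peptide].append(psm)

    accepted = []
    for peptide, hits in by_peptide.items():
        has_motif = MOTIF.search(peptide) is not None
        seen_repeatedly = len(hits) > 1
        passes_score = any(
            h.byonic_score > min_score and h.neg_log10_p > min_neg_log10_p
            for h in hits
        )
        if has_motif and seen_repeatedly and passes_score:
            accepted.append(peptide)
    return sorted(accepted)


# Illustrative input; all scores are invented for the example.
example = [
    PSM("DGAVYFLQANDSTLR", 450.0, 6.2),
    PSM("DGAVYFLQANDSTLR", 310.0, 4.1),
    PSM("AAGDSLK", 520.0, 7.0),        # motif present, but seen only once
    PSM("GGKPEELR", 600.0, 8.0),       # no motif
    PSM("GGKPEELR", 610.0, 8.3),
    PSM("TDSVAAK", 150.0, 0.8),        # motif present, scores too low
    PSM("TDSVAAK", 140.0, 0.7),
]
print(accepted_glycopeptides(example))
```

Keeping the thresholds as keyword arguments makes it easy to compare the stated cut-offs with stricter ones, such as the score of 400 applied later to peptides lacking the motif.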

| Peptide identification
For deglycosylated samples, an error-tolerant Mascot search was conducted with the error-tolerant modifications edited to include only the list of putative fragments specific to the P. intermedia glycan, in addition to oxidation (M) set as a variable modification as usual.
Only glycopeptides with a Mascot score >30 were considered identified unless they were supporting an already identified site. Due to incompatibility between decoy mode and error-tolerant mode, the FDR was determined only for the most abundant modifications (Hex2-HexA and dHex-Hex). The FDR was considered insignificant as glycopeptides (score >30) with either modification were absent in the decoy database search result.
Most glycosylation sites were supported by more than one peptide sequence or from multiple glycoforms (i.e., after deglycosylation). Manual validation of glycosylated peptides was performed for glycosylation sites that were only identified from one sequence/glycoform where the Byonic score was <300 or the Mascot score was <35. To pass validation, the MS2 spectra needed to exhibit (i) the 448.12 m/z glycan fragment ion; (ii) a prominent Y0 peak followed by peaks at +338, +500, and +646 Da corresponding to the first four sugars of the glycan; (iii) a convincing series of b-ions and y-ions consistent with positive peptide identification.

FIGURE 2 Outline of sample preparation. Harvested Prevotella intermedia cells were initially divided into membrane, soluble, and outer membrane vesicles (OMV) fractions. A portion of the membrane fraction was extracted with detergent to enrich inner membrane proteins. Another fraction of the membrane sample and a fraction of the OMV sample were partially deglycosylated with trifluoromethanesulfonic acid (indicated by the corrosive hazard symbol). All six samples were subjected to sodium dodecyl sulfate-polyacrylamide gel electrophoresis, excision into 12 equal gel segments, in-gel digestion with trypsin, and liquid chromatography coupled to tandem mass spectrometry.

| Protein localization
Proteins were localized by bioinformatic approaches as follows. IM proteins and OM proteins were predicted based on the presence of transmembrane IM α-helices (TMH) and OM β-barrels, respectively, using the DeepTMHMM server v1.0.20 (https://dtu.biolib.com/DeepTMHMM/) (Hallgren, 2022). A BLASTp search (NCBI) was performed to identify the P. intermedia T9SS cargo proteins with Uniprot accession numbers equivalent to the T9SS cargo proteins with another type of accession numbers obtained from our previous report (Veith et al., 2013). These proteins were considered as cell surface proteins along with newly predicted cargo proteins. Lipoproteins located in the cell membrane were predicted using SignalP-6.0 (https://services.healthtech.dtu.dk/services/SignalP-6.0/) (Teufel et al., 2022) and validated by the presence of signal peptides from the DeepTMHMM search. Proteins predicted to be periplasmic were selected from the remaining proteins when signal peptides were detected by both SignalP-6.0 and DeepTMHMM. The remaining unassigned proteins lacked signal peptides and were therefore predicted to be cytoplasmic. Alternatively, these proteins might utilize atypical secretion pathways for their export and could also be assigned as proteins of "uncertain" location.
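The localization rules above amount to a decision cascade, which can be sketched as follows. This is an illustrative reading only: the order of precedence follows the order of the paragraph, the `Prediction` fields are hypothetical simplifications of the DeepTMHMM, BLASTp, and SignalP-6.0 outputs, and the labels "SP", "LIPO", and "NONE" are placeholder values rather than the tools' exact output strings.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Hypothetical per-protein summary of the three prediction sources."""
    tmh_count: int        # number of alpha-helical TM segments (DeepTMHMM)
    beta_barrel: bool     # beta-barrel topology predicted (DeepTMHMM)
    t9ss_cargo: bool      # BLASTp match to a known T9SS cargo protein
    signalp_type: str     # "SP", "LIPO", or "NONE" (simplified SignalP call)
    deeptmhmm_sp: bool    # signal peptide also detected by DeepTMHMM


def localize(p: Prediction) -> str:
    """Assign one locale per protein, testing the rules in text order."""
    if p.tmh_count > 0:
        return "inner membrane"
    if p.beta_barrel:
        return "outer membrane"
    if p.t9ss_cargo:
        return "cell surface (T9SS cargo)"
    if p.signalp_type == "LIPO" and p.deeptmhmm_sp:
        return "lipoprotein"
    if p.signalp_type == "SP" and p.deeptmhmm_sp:
        return "periplasm"
    # No signal peptide agreed on by both tools.
    return "cytoplasmic or uncertain"


print(localize(Prediction(0, False, False, "SP", True)))
```

Testing the membrane and cargo rules before the signal-peptide rules mirrors the statement that periplasmic proteins were selected "from the remaining proteins".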

| CryoET analysis of P. intermedia cells and OMVs
The cryoET analysis of P. intermedia revealed that the cells are coccobacillus-shaped, measuring approximately 1.5 μm in length and 0.9 μm in diameter. Similar to P. gingivalis, it also possessed an electron-dense surface layer (EDSL) outside the outer membrane (OM) that plays a role in bacterial interactions (Figure 3a). Cryotomograms of purified OMVs revealed that P. intermedia generates a wide range of OMVs, varying in size (~0.01 to 0.6 μm wide), shape, and appearance (Figure 3b). Notably, these OMVs were also enveloped by the EDSL.
Interestingly, some OMVs displayed electron-dense interiors, indicative of the presence of proteinaceous substances, while others appeared relatively empty (Figure 3b). This observation suggests that the OMVs produced by P. intermedia may be functionally diverse.

[...] 12 gel segments, in-gel digestion with trypsin, and LC-MS/MS analysis (Figure 2). The raw MS/MS data were searched using both Byonic and Mascot. Table S1 lists the complete set of 1655 unique proteins detected from all fractions in P. intermedia. Overall, 574 glycopeptides within 224 glycoproteins containing 443 unique O-glycosylation sites were identified (Tables S2 and S3). These glycoproteins were found to be modified by a unique glycan moiety (see below). The fractionation was designed to allow the enrichment of exported proteins and hence increase the number of glycoproteins identified. The fractionation efficiency was assessed by predicting the localization of each protein identified (Table S1) and then plotting the proportion of each locale represented in the various fractions (Figure 4a). The membrane sample proved the best source for OM proteins; the OMVs were the best source of periplasmic and surface proteins; and the extracted membrane sample was the best source of IM proteins and lipoproteins (Figure 4a). The 224 identified glycoproteins were predicted to localize predominantly in the periplasm (73 glycoproteins) or IM (53), or were found to be lipoproteins (52) (Figure 4b and Table S3). Twenty-six proteins were predicted to be located in the OM, and six proteins were cytoplasmic or of uncertain localization. Only 14 were predicted T9SS cargo proteins, possessing a total of 22 identified O-glycosylation sites.
Venn diagram analysis shows the relative importance of each fraction for the identification of glycoproteins (Figure 4c). While all fractions were useful for the identification of glycoproteins, the extracted membrane fraction ("IM") and total membrane fraction enabled the highest numbers of glycoproteins to be identified. The most heavily glycosylated proteins, characterized by the highest number of identified glycosylation sites, were BWX39_03000 (OstA-like_N domain-containing protein), BWX39_01155 (Big_5 domain-containing protein) and BWX39_07550 (kinase), with 10, 7, and 7 identified glycosylation sites, respectively (Table S3).

| Determination of glycoforms
To identify the glycan sequences linked to the glycosylated proteins, the initial step was to determine the masses of potential glycoforms.
The MS/MS data obtained from the intact samples were searched using Byonic with a wildcard setting, which allowed the identification of peptides modified with any delta mass (Δmass) value.A plot representing the frequency of Δmass values among the identified peptides exhibited a predominant high Δmass cluster of potential glycopeptides (Figure 5).The 1531-Da cluster highlighted in red, differing by 1-Da units, was found to be the most frequently observed in all fractions (Figure 5).Some of the other abundant clusters with low Δmass values could be attributed to Lys (128 Da) and Arg (156 Da) due to the misidentification of peptides containing two adjacent tryptic cleavage sites (Figure 5).
Through manual inspection of the MS1 spectra corresponding to peptides within the ~1531-Da Δmass cluster, a total of 136 monoisotopic masses were recorded, and the majority of the calculated Δmass values fell within the 1531-Da integer, potentially representing a unique glycoform (Table S4). The clustering of these [...] S4).
The identified peptides with a potential modification of ~1531 Da were also examined through their corresponding MS/MS spectra. They were preliminarily identified as glycopeptides based on the presence of oxonium ions of sugar fragments in the MS2-level spectra, such as HexNAc (N-acetylhexosamine) at m/z 204.09, Hex [...] S5). Through manual inspection of the HCD spectra, the presence or absence of the 204.09 m/z HexNAc oxonium ion was recorded, and it was found to be present in 32 of these spectra. This suggests that most are indeed glycopeptides; however, correct assignment of the peptide sequence was still in doubt. Since peptides lacking the motif were more likely to be false positives, only the top three having a Byonic score >400 were considered positively identified and included in the main tables (Tables S2 and S3). These three sequences displayed potential glycosylation sites at ETI, ESV, and ESV, respectively (Table S5).

| Identification of the glycan sequence
The proposed sequence of the 1531.48-Da O-glycan is dHex-dHex-HexNAc(HPO3-C6H12O5)-dHex-Hex-HexA-Hex(dHex) (Figure 1e). Its elucidation is described in detail below, but in outline: (i) the overall sequence was determined by analyzing the CID spectra corresponding to glycopeptides with a Δmass of 1531.48 Da (Figure 6); (ii) the HexNAc(HPO3-C6H12O5) portion was determined based on additional HCD spectra (Figure 7); and (iii) the whole sequence was confirmed by partial deglycosylation of glycopeptides with TFMS (Figure 8).

[...] deoxyhexoses (Figure 6). Rarely were any peaks observed to help dissect the structure of the 447.11-Da moiety; however, in Figure 6a, a peak at m/z 1538.14 was consistent with our assignment of HexNAc-P followed by C6H12O5.
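The proposed composition can be checked against the observed Δmass with simple arithmetic. The sketch below sums monoisotopic residue masses computed from elemental formulas; the residue formulas are the standard anhydro forms for glycosidically linked sugars, and the variable and function names are illustrative.

```python
# Monoisotopic atomic masses (Da).
ATOM = {"C": 12.0, "H": 1.00782503, "N": 14.0030740,
        "O": 15.9949146, "P": 30.9737620}
PROTON = 1.00727647

# Elemental formulas of the building blocks as they occur within the glycan.
RESIDUE = {
    "Hex":     {"C": 6, "H": 10, "O": 5},
    "dHex":    {"C": 6, "H": 10, "O": 4},
    "HexA":    {"C": 6, "H": 8,  "O": 6},
    "HexNAc":  {"C": 8, "H": 13, "N": 1, "O": 5},
    "HPO3":    {"H": 1, "P": 1,  "O": 3},
    "C6H12O5": {"C": 6, "H": 12, "O": 5},
}


def residue_mass(name):
    """Monoisotopic mass of one building block."""
    return sum(ATOM[el] * n for el, n in RESIDUE[name].items())


def composition_mass(counts):
    """Monoisotopic mass of a composition given as {residue: count}."""
    return sum(residue_mass(name) * n for name, n in counts.items())


# dHex-dHex-HexNAc(HPO3-C6H12O5)-dHex-Hex-HexA-Hex(dHex)
GLYCAN = {"dHex": 4, "Hex": 2, "HexA": 1, "HexNAc": 1,
          "HPO3": 1, "C6H12O5": 1}
glycan_mass = composition_mass(GLYCAN)

# The moiety between Y4 and Y5, and its singly protonated fragment ion.
moiety_mass = composition_mass({"HexNAc": 1, "HPO3": 1, "C6H12O5": 1})
moiety_ion = moiety_mass + PROTON

print(f"glycan: {glycan_mass:.4f} Da")
print(f"moiety: {moiety_mass:.4f} Da, [M+H]+ {moiety_ion:.4f} m/z")
```

The summed composition agrees with the 1531.48-Da Δmass, and the HexNAc(HPO3-C6H12O5) portion agrees with the 447.11-Da moiety and its 448.12 m/z fragment ion.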
To identify the unknown 447-Da component, the HCD spectra of identified glycopeptides were manually inspected.
The common peaks across all glycopeptide HCD spectra should represent the sugar residues of the O-glycan, whereas the variable peaks should correspond to the amino acid residues of different peptides. The peak representing the 448.12 m/z ion displayed the highest intensity across all inspected spectra and accurately matched the 447-Da component plus a proton for ionization (Figure 7a). The peaks at 204.09 and 245.04 m/z were also consistently present in all spectra examined and appeared to be complementary fragments of the 447-Da species (Figure 7a).

[...] 10, 9, and 11, respectively). The CID spectra were triggered by the presence of the glycan fragment ion at 448.12 m/z. The labeled ions are doubly charged unless specified as 1+. Note that some ions have two labels due to these fragments having the same mass. For example, the Y3 ion cannot be differentiated from the Y4Y1b ion.
After matching the accurate masses to their molecular formula, it was hypothesized that the unknown moiety between Y4 and Y5 consisted of a HexNAc residue (203.08 Da) and a C6H13O8P moiety (244.03 Da).
To further examine the composition of the 447-Da moiety, MS3-level HCD scans were conducted on the 448.12 m/z ion. The peaks of interest, 204.086 m/z and 245.04 m/z, were present in the HCD spectra, confirming their origin in the 447-Da moiety (Figure 7b). The accurate masses and respective molecular formula of the fragment ions are listed in Table 1. Many ions could be assigned to the loss of water molecules (−18 Da), a common feature of sugar fragmentation. Except for those fragment ions assigned to a loss of H2O, the peak at 165.0748 m/z exhibited relatively high intensity and matched best to C6H12O5 (Figure 7b). A search for known structures that correspond to this accurate mass using Metlin resulted in the best matches to sugar alcohols such as mannitol, sorbitol, galactitol, or iditol. The presence of extra fragment ions due to H2O losses (147.0646, 129.0542, and 111.0435 m/z) supported the assignment of the 165 m/z component to a sugar (Figure 7b and Table 1). Finally, the remaining moiety could be accurately attributed to a phosphate residue (HPO3), which is commonly observed in complex glycans of [...]

Given that the 447.11-Da moiety is linked to a dHex on both sides in the glycan, the conclusion as to whether HexNAc or C6H14O6 is linked to dHex could be drawn by examining the MS3 HCD scan of the 594.18 m/z ion (HexNAc + HPO3 + C6H12O5 + dHex). As shown in Figure 7c, the presence of the 430.11 m/z peak was consistent with the dHex linked to the HexNAc, although it could also arise by the loss of H2O from the 448.12 m/z ion. Nonetheless, the absence of the 391.10 m/z (dHex + C6H12O5 + HPO3) and 311.13 m/z (dHex + C6H12O5) peaks supported that the dHex residue was not linked to the C6H12O5 residue (Figure 7c).

[...] including 162, 308, 338, 484, 500, 646, 792, 849, 995, 1093, 1239, and 1385 Da. The most frequent Δmass values were at 500.14 and 338.08 Da, which accurately matched the Hex-HexA-Hex and Hex-HexA portions, respectively (Table 2). This finding indicates the strong preference of TFMS to cleave at the dHex residues. Very few peptides were found to be cleaved around the HexNAc residue (Table 2). The glycopeptides identified in the partially deglycosylated sample exhibited a significant overlap with those modified with intact glycans (Table S2).
To characterize the sequence of acid-cleaved glycans, CID scans were performed. Figure 8 illustrates the same peptide, DGAVYFLQANDSTLR, modified with uncleaved glycan (Figure 8a) and progressively truncated glycan (Figure 8b-f). The spectrum shown in Figure 8b confirmed the location of HexNAc as the 6th sugar in the sequence, which together with the data shown in Figure 7 completes the sequence determination of the outer glycan. The remaining CID spectra showed a sequential loss of a HexNAc, two dHex residues, a Hex, and a HexA, confirming the assigned sequence of the glycan.
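The Δmass values reported for the partially deglycosylated samples can be compared with the truncation products expected from the proposed sequence. The sketch below is an illustration under stated assumptions, not the authors' analysis: it assumes the main chain is read from the peptide-linked Hex outwards, that the branching dHex sits on that first Hex, and that both the branching dHex and the HPO3-C6H12O5 substituent may independently be present or absent after acid treatment. Residue masses are standard monoisotopic values rounded to four decimals.

```python
from itertools import product

# Monoisotopic residue masses (Da).
HEX, DHEX, HEXA, HEXNAC = 162.0528, 146.0579, 176.0321, 203.0794
PHOSPHO_SUBSTITUENT = 244.0348  # HPO3 + C6H12O5 attached to HexNAc

# Main chain from the peptide-linked sugar outwards (assumed reading of
# dHex-dHex-HexNAc(HPO3-C6H12O5)-dHex-Hex-HexA-Hex(dHex)).
CHAIN = [HEX, HEXA, HEX, DHEX, HEXNAC, DHEX, DHEX]
HEXNAC_POSITION = 5  # HexNAc is the fifth sugar of the main chain


def truncation_masses():
    """Masses of all chain truncations, with optional branch/substituent."""
    masses = set()
    for length in range(1, len(CHAIN) + 1):
        base = sum(CHAIN[:length])
        for branch, substituent in product((False, True), repeat=2):
            if substituent and length < HEXNAC_POSITION:
                continue  # substituent requires the HexNAc to be present
            mass = base
            if branch:
                mass += DHEX
            if substituent:
                mass += PHOSPHO_SUBSTITUENT
            masses.add(round(mass, 2))
    return sorted(masses)


# Nominal values listed in the text for acid-cleaved glycans.
REPORTED = [162, 308, 338, 484, 500, 646, 792, 849, 995, 1093, 1239, 1385]

predicted = truncation_masses()
nominal = {int(m) for m in predicted}
unexplained = [m for m in REPORTED if m not in nominal]

print(predicted)
print("reported values not explained:", unexplained)
```

Under these assumptions every reported value corresponds to a predicted truncation, including Hex-HexA (338.08 Da) and Hex-HexA-Hex (500.14 Da), and the largest predicted mass is the intact 1531.48-Da glycan. The model also predicts a few masses that are not in the reported list, so it should be read as a consistency check rather than a fragmentation model.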

| Localization of glycosylation sites
Overall, 25 distinct O-glycosylation motifs were identified in P. intermedia (Figure 9 and Table S2). The total number of identified motifs is different from the number of peptides identified since a few sequences have more than one possible motif. P. intermedia had a strong preference for DS(L/I/V) residues and a relatively low preference for D(S/T)(G/F/C/S). The DSN motif was uniquely observed in F. johnsoniae (Veith et al., 2023) and P. intermedia.
Interestingly, four new motifs, DS(E/Q) and DT(D/P), were identified in P. intermedia (Figure 9 and Table S2). The Δmod score quantifies the difference in Byonic scores of the top matching site and the second-best scoring site. It indicates the confidence of the glycosylation site assignment in a peptide spectrum match. In this study, Ser or Thr was manually defined as the modification site, and therefore, the Δmod score was only available when more than one Ser/Thr residue was present in a glycopeptide. For glycopeptides with a Δmod score >15, 23 out of 26 glycans in intact peptides, and 34 out of 47 glycans in acid-cleaved peptides were assigned to the putative D(S/T)X motifs (Table S2). When the Δmod score exceeded 30, all of the glycosylation sites were correctly assigned by Byonic (Table S2), providing robust evidence for glycosylation at the putative motifs.

TABLE 1 Accurate mass data from orbitrap MS3 of the 448.12 m/z ion. b The mass error is the relative difference between the observed mass of an ion and its calculated mass based on the assigned molecular formula.

TABLE 2 Frequency of partially deglycosylated glycans and intact glycans.
To localize the exact O-glycosylation sites of some peptides, ETD or EThcD fragmentation was applied to the deglycosylated samples. These methods fragment the peptide backbone while keeping the glycan moiety linked to the amino acid, directly showing the site of glycan attachment (Catalina et al., 2007). Four ETD or EThcD spectra are provided in Figure 10a-d.

S2 and S3). A previous study predicted 389 candidate glycoproteins in P. intermedia strain 17, based exclusively on the presence of the O-glycosylation motif D(S/T)(A/L/V/I/M/T) in proteome sequences (Coyne et al., 2013). Given that different strains within each species share a similar number of glycoproteins, (Coyne et al., 2013) (Veith et al., 2023). The higher numbers for P. intermedia may reflect better fractionation and optimized protocols.
Based on a series of MS/MS analyses, the O-glycan was determined to be a 1531.48-Da glycan with the sequence dHex-dHex-HexNAc(HPO3-C6H12O5)-dHex-Hex-HexA-Hex(dHex) (Figure 1e). As could be expected, the first three sugars (from the right), a hexose, a deoxyhexose, and a hexuronate, are similar across all seven species whose O-glycan sequences or structures have been determined (Figure 1). In particular, the third sugar, HexA, is a glucuronic acid in all known cases. This is consistent with the cross-reactivity of an antibody against the "core glycan" of B. fragilis with glycoproteins in all tested species from different classes of Bacteroidota (Coyne et al., 2013). Although the "core glycan" was defined as the first two sugars of the O-glycan in that study, the purity of the "core glycan" was not demonstrated (Coyne et al., 2013). Furthermore, the GT responsible for transferring the third sugar, hexuronate, remains unidentified (Coyne et al., 2013; Tomek et al., 2018; Veith, Shoji, et al., 2022). Consequently, the antibodies generated against this core glycan may recognize additional sugars such as the well-conserved hexuronate (Figure 1). Beyond the HexA, the sequences become more divergent. T. forsythia generates the most complex and distinct O-glycan, characterized by the presence of side

consistent with a role for surface O-glycans as ligands for coadhesion interactions.

In this study, we demonstrate the presence of the O-glycosylation system in P. intermedia for the first time by characterizing its O-glycoproteome with mass spectrometry and, where possible, relate it to ultrastructural features of P. intermedia cells and outer membrane vesicles (OMVs) as revealed by cryo-electron tomography (cryoET). We identify 443 putative O-glycosylation sites within 224 glycoproteins, determine the O-glycan sequence, and extend the list of O-glycosylation motifs.
Proteins and peptides were identified by searching against the Prevotella intermedia ATCC 25611 sequence database of 2156 protein sequences downloaded from UniProt Proteomes (Proteome ID = UP000187195). All searches were performed using trypsin, and other parameters were as follows: maximum missed cleavages = 2, peptide mass tolerance = 10 ppm, fragment mass tolerance = 0.04 Da, fixed modification = cysteine carbamidomethyl, and variable modification = methionine oxidation.
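To make the two tolerance types concrete, the search settings above can be written as a small configuration with the corresponding acceptance checks. This is a sketch under stated assumptions: the dictionary keys and function names are hypothetical and do not correspond to the input format of any particular search engine. The point is only that the peptide tolerance is relative (ppm) whereas the fragment tolerance is absolute (Da).

```python
# Search settings as listed in the text; key names are illustrative only.
SEARCH_PARAMS = {
    "enzyme": "trypsin",
    "max_missed_cleavages": 2,
    "peptide_tol_ppm": 10.0,
    "fragment_tol_da": 0.04,
    "fixed_mods": ["Carbamidomethyl (C)"],
    "variable_mods": ["Oxidation (M)"],
}


def ppm_error(observed, calculated):
    """Relative mass error in parts per million."""
    return (observed - calculated) / calculated * 1e6


def precursor_matches(observed, calculated, params=SEARCH_PARAMS):
    """Peptide-level match: relative tolerance in ppm."""
    return abs(ppm_error(observed, calculated)) <= params["peptide_tol_ppm"]


def fragment_matches(observed, calculated, params=SEARCH_PARAMS):
    """Fragment-level match: absolute tolerance in Da."""
    return abs(observed - calculated) <= params["fragment_tol_da"]
```

For a hypothetical peptide of 1000.000 Da, an observed mass of 1000.005 Da (5 ppm) would pass the precursor check, whereas 1000.020 Da (20 ppm) would not.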

3.2 | Overall identification of the glycoproteome
P. intermedia cultures were fractionated into soluble, membrane, extracted IM, and OMV fractions. Partial deglycosylation was only performed on the membrane and OMV fractions. Intact or deglycosylated samples were subjected to SDS-PAGE, excision into

FIGURE 3 CryoET of Prevotella intermedia cells and outer membrane vesicles (OMVs). (a) Representative tomographic slice showing ultrastructural details of P. intermedia. Inset: magnified image revealing structural details of the cell envelope, including the inner membrane (IM), the peptidoglycan layer (PG), the outer membrane (OM), and an electron-dense surface layer (EDSL). (b) Tomographic slice of purified OMVs showing the presence of a thick EDSL (yellow arrowhead) surrounding the membrane of different-sized vesicles. There are translucent and electron-dense (white star) OMVs. The scale bar is 100 nm.

FIGURE 4 Fractionation efficiency and predicted subcellular localization. (a, b) The predicted subcellular localization of proteins is provided with the proportion of proteins being estimated by summing the Mascot scores for that locale. The assignment of protein localization was based on prediction tools including DeepTMHMM and SignalP. Proteins of uncertain localization that lack signal peptides were assigned as cytoplasmic. T9SS cargo proteins were considered cell surface proteins. Lipoproteins are associated with the inner membrane (IM) or outer membrane (OM). (a) According to all identified proteins across each sample fraction (membrane, soluble, outer membrane vesicles, and IM). Only the top 100 proteins with the highest Mascot protein scores in each fraction are included. The localization of all proteins is listed in Table S1. (b) Glycoproteins across all fractions. Only glycoproteins containing the putative glycosylation motif are shown. The predicted localization of individual glycoproteins is listed in Table S3. (c) Venn diagram analysis showing overlap of glycoprotein identification between fractions. Only strongly identified abundant glycoproteins (Mascot score >200) were used for this analysis.

Δmass values could be the result of the inaccurate assignment of the monoisotopic peak by Byonic. Other less frequent values within the range of 1529-1534 Da corresponded to poorly defined isotopic envelopes and were deemed outliers. The accurate mass of the O-glycan was determined to be 1531.48 Da by averaging the Δmass values around 1531 Da only (Table

(hexose) at m/z 163.06, and dHex (deoxyhexose) at m/z 147.07. The putative O-glycosylation motif D(S/T)(A/I/L/M/T/V/S/C/G/F/N) was present in almost all of these identified peptides, which further supported their identity as glycopeptides. The modification of ~1531.48 Da was therefore considered to be the dominant O-glycan in P. intermedia. The data were then searched again in Byonic using 1531.48 Da, specific to Ser or Thr, as another variable modification. A greater number of glycosylated peptides were identified, which encompassed nearly all the peptides with a Δmass of ~1531 Da from the wildcard searches. In all fractions, a total of 512 different peptides were identified, out of which 40 lacked the putative D(S/T)X motif (Table

The CID spectra of distinct glycopeptides with an accurate Δmass of 1531.48 Da exhibited consistent fragmentation patterns, allowing the first five sugars (from the reducing end, right to left) to be deduced as hexose, deoxyhexose side branch, hexuronate, hexose, and deoxyhexose (Figure 6). The 6th group, between the Y4 and Y5 ions, had a mass of 447.11 Da, which was later assigned to HexNAc(HPO3-C6H12O5). The last two sugars were assigned to

FIGURE 5 Frequency of glycoforms detected by Byonic. The intact glycopeptides of membrane, soluble, extracted inner membrane, and outer membrane vesicles fractions were searched by Byonic using the wildcard setting. The three most abundant Δmass values were grouped into clusters in red and green, and the corrected integer Δmass values are indicated above each cluster. Only Δmass values with a frequency >0.3% among all modified peptides are displayed. The 128 and 156 Da modifications represent additional Lys and Arg residues (missed cleavages) at the peptide N- or C-termini.
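As a consistency check on the values quoted above, the mass of the full O-glycan and of the 447.11-Da group can be recomputed from their assigned compositions. The sketch below is illustrative rather than part of the reported workflow: it assumes standard monoisotopic residue masses and treats HPO3 and C6H12O5 as simple mass increments; the names `MASS`, `O_GLYCAN`, and `SIXTH_GROUP` are hypothetical.

```python
# Monoisotopic mass increments in Da (standard values, rounded to 4 decimals).
MASS = {
    "Hex": 162.0528,
    "dHex": 146.0579,
    "HexNAc": 203.0794,
    "HexA": 176.0321,
    "HPO3": 79.9663,
    "C6H12O5": 164.0685,
}

# Composition of dHex-dHex-HexNAc(HPO3-C6H12O5)-dHex-Hex-HexA-Hex(dHex).
O_GLYCAN = {"dHex": 4, "Hex": 2, "HexA": 1, "HexNAc": 1, "HPO3": 1, "C6H12O5": 1}

# Composition assigned to the 6th group, HexNAc(HPO3-C6H12O5).
SIXTH_GROUP = {"HexNAc": 1, "HPO3": 1, "C6H12O5": 1}


def composition_mass(composition):
    """Sum the mass increments of a composition."""
    return sum(MASS[unit] * n for unit, n in composition.items())


if __name__ == "__main__":
    print(f"O-glycan:  {composition_mass(O_GLYCAN):.2f} Da")
    print(f"6th group: {composition_mass(SIXTH_GROUP):.2f} Da")
```

With these assumed masses the totals agree with the measured Δmass of 1531.48 Da and with the 447.11-Da group to two decimal places.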

FIGURE 6 Collision-induced dissociation (CID) spectra of glycopeptides modified with the 1531-Da glycoform. (a) Proposed sequence of the O-glycan showing the fragmentation scheme. (b-d) CID spectra of three glycopeptides. The identified glycopeptide sequences are shown with the putative O-glycosylation motifs highlighted in red. The precursor m/z values are indicated for each glycopeptide. Y0 corresponds to the unmodified peptides. The three glycopeptides shown originated from the membrane sample (gel segments #

FIGURE 7 High-energy collisional dissociation (HCD) spectra to characterize the 447 Da moiety. (a) A representative MS2-level HCD spectrum showing common peaks in the low mass region. The peaks at 204.09, 245.04, and 448.12 m/z represent the predicted 203.08-Da HexNAc, the unknown 244.03-Da portion, and the 447.11-Da moiety, respectively. These labeled ions in the low-mass region were common across all inspected spectra of glycopeptides. (b) The MS3-level spectrum of the 448.12 m/z ion. The ions shown in blue represent losses of water from the ions of interest (Table 1). (c) MS3 spectrum of the 594.18 m/z ion to confirm the position of C6H12O5 and HexNAc. The fragment ions of interest that correspond to the hypothesized moieties are highlighted in red. All fragment ions are singly charged.

FIGURE 8 Collision-induced dissociation (CID) spectra of partially deglycosylated glycopeptides. The glycans are derived from the peptide DGAVYFLQANDSTLR within the protein BWX39_07850 (copper resistance protein NlpE). (a) The full-length sequence of the glycan. (b-f) Spectra matching the Δmass value of each acid-cleaved glycan, which is 995 (792 + HexNAc), 792, 500, 338, or 162 Da, respectively. The peaks corresponding to precursor ions are indicated by red asterisks (*). Notations in grey indicate the positions where the cleavage is absent in that particular spectrum.
bacteria and yeast. Collectively, the data indicate that the 447.11-Da portion is composed of a HexNAc, a phosphate group, and a C6H12O5 sugar alcohol residue. The presence of peaks at 245.0414 m/z (HPO3 + C6H12O5) and 284.0518 m/z (HPO3 + HexNAc), as well as the lack of a peak at ~368 m/z (HexNAc + C6H12O5), indicated that the phosphate is positioned in the middle (HexNAc-HPO3-C6H12O5).
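The reasoning used to place the phosphate can be expressed as a small test on pairwise fragment ions: in a linear arrangement of three units, only the two pairs that include the middle unit can arise from a single cleavage. The sketch below is a hypothetical illustration of that logic, assuming singly protonated ions, standard monoisotopic masses, and an arbitrary 10 ppm matching window; none of the names come from the original analysis.

```python
from itertools import combinations

PROTON = 1.00728  # mass of a proton in Da
UNIT = {"HexNAc": 203.0794, "HPO3": 79.9663, "C6H12O5": 164.0685}


def pair_mz(a, b):
    """m/z of the singly protonated ion made of two units."""
    return UNIT[a] + UNIT[b] + PROTON


def observed_pairs(peaks, tol_ppm=10.0):
    """Return the unit pairs whose calculated m/z matches an observed peak."""
    found = []
    for a, b in combinations(UNIT, 2):
        mz = pair_mz(a, b)
        if any(abs(p - mz) / mz * 1e6 <= tol_ppm for p in peaks):
            found.append({a, b})
    return found


def middle_unit(peaks):
    """If exactly two pairs are observed, the unit they share is in the middle."""
    pairs = observed_pairs(peaks)
    if len(pairs) != 2:
        return None
    (shared,) = pairs[0] & pairs[1]
    return shared


if __name__ == "__main__":
    # Peaks reported in the text; no peak was observed near 368 m/z.
    print(middle_unit([245.0414, 284.0518]))
```

Applied to the two reported peaks, the sketch returns HPO3 as the shared unit, in line with the HexNAc-HPO3-C6H12O5 arrangement.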
To confirm the glycan sequence identified in the intact glycopeptides, the MS/MS data of acid-cleaved glycopeptides were initially subjected to the Byonic wildcard search. The frequent Δmass values observed were consistent with the cleavages expected from the assigned glycan sequence. Mascot searches were then conducted involving defined modifications corresponding to the different portions of the 1531.48-Da glycan,

Figure 10a-d shows the 500-Da modification precisely located at the Ser or Thr residues within the DSL, DSV, DSI, and DTL motifs, respectively, which were the four most abundant O-glycosylation motifs found in P. intermedia (Figure 9). Series of c- and z-ions demonstrated the presence of the modification at the expected motif instead of other potential sites (Figure 10). For example, as shown in Figure 10b, the 500-Da glycan fragment accounts for the large gaps between the z6 and z7 ions (red) and between the c12 and c13 ions (blue), indicating that the site of modification is the Ser residue (green) in the middle. All the other ions correspond to each amino acid sequentially.
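The gap argument for Figure 10b can be reproduced numerically. The following sketch assumes standard monoisotopic amino acid residue masses, singly charged c-ions, and a residual glycan of Hex-HexA-Hex (500.1377 Da, a composition inferred here from the truncation series rather than stated for this spectrum). The function names are illustrative.

```python
# Standard monoisotopic amino acid residue masses in Da.
AA = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

# Assumed residual glycan after acid cleavage: Hex + HexA + Hex.
RESIDUAL_GLYCAN = 162.0528 + 176.0321 + 162.0528


def c_ion_gaps(peptide, mods):
    """Mass gap between c(i-1) and c(i) for each 1-based position i.

    `mods` maps a 1-based residue position to an added modification mass.
    """
    return [AA[res] + mods.get(i, 0.0) for i, res in enumerate(peptide, start=1)]


def z_index(length, position):
    """Number of the first z-ion that contains the residue at `position`."""
    return length - position + 1


if __name__ == "__main__":
    peptide = "DLYPDRYEAWKDSVWTSLK"  # Figure 10b
    gaps = c_ion_gaps(peptide, {13: RESIDUAL_GLYCAN})
    print(f"c12 to c13 gap: {gaps[12]:.4f} Da")
    print(f"first z-ion carrying the site: z{z_index(len(peptide), 13)}")
```

Under these assumptions the c12 to c13 gap is a Ser residue plus the residual glycan (about 587.17 Da), and the modified Ser at position 13 first appears in z7, matching the gaps described for Figure 10b.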

FIGURE 9 Frequency of O-glycosylation motifs identified in Prevotella intermedia. The number of glycopeptides (frequency) possessing each site ranged from 2 to 105 as shown. All except four motifs belong to the putative Bacteroidota O-glycosylation motif D(S/T)(A/I/L/V/M/T/S/G/F/C/N). The four motifs uniquely identified in P. intermedia were DS(E/Q) and DT(D/P). Other newly identified motifs that were present with another putative motif in a single sequence were excluded.

FIGURE 10 Electron transfer dissociation (ETD) or electron-transfer/higher-energy collision dissociation (EThcD) spectra showing localization of glycosylation sites. The peptides are (a) FYIDHNNGTIYNPDSLPYGAK, (b) DLYPDRYEAWKDSVWTSLK, (c) SRQDSIAYAEKLESIYAQDPSNEAVLSTLASTYNALR, and (d) IQFELSTSQTGKDTLYEK. Deglycosylated samples were analyzed by ETD or EThcD by use of an inclusion list to fragment the most appropriate precursor ions (see Section Materials and Methods). The residual glycan of 500 Da at Ser or Thr highlighted in green indicates the glycosylation site in each of the four different sequences. All the c-ions and z-ions are 1+ charged and labeled in blue and red, respectively. The letter M indicates the precursor ion in each spectrum.