Comprehensive mapping of O‐glycosylation in flagellin from Campylobacter jejuni 11168: A multienzyme differential ion mobility mass spectrometry approach

Glycosylation of flagellin is essential for the virulence of Campylobacter jejuni, a leading cause of bacterial gastroenteritis. Here, we demonstrate comprehensive mapping of the O‐glycosylation of flagellin from Campylobacter jejuni 11168 by use of a bottom‐up proteomics approach that incorporates differential ion mobility spectrometry (also known as high field asymmetric waveform ion mobility spectrometry or FAIMS) together with proteolysis with proteinase K. Proteinase K provides complementary sequence coverage to that achieved following trypsin proteolysis. The use of FAIMS increased the number of glycopeptides identified. Novel glycans for this strain were identified (pseudaminic acid and either acetamidino pseudaminic acid or legionaminic acid), as were novel glycosylation sites: Thr208, Ser343, Ser348, Ser349, Ser395, Ser398, Ser423, Ser433, Ser436, Ser445, Ser448, Ser451, Ser452, Ser454, Ser457 and Thr465. Multiply glycosylated peptides were observed, as well as variation at individual residues in the nature of the glycan and its presence or absence. Such extreme heterogeneity in the pattern of glycosylation has not been reported previously, and suggests a novel dimension in molecular variation within a bacterial population that may be significant in persistence of the organism in its natural environment. These results demonstrate the usefulness of differential ion mobility in proteomics investigations of PTMs.


Introduction
Campylobacter jejuni is the most prevalent foodborne bacterial agent of diarrhoeal disease in humans worldwide [1]. This zoonosis enters the human diet from poultry and farm animal sources in which the organism causes little pathology [2]. Disease in humans is usually self-limiting and relatively short-lived although symptoms can be severe, such attack [10]. A recent report demonstrated the involvement of flagellin glycan residues in phage receptor activity [11]. Hence there are strong selective pressures that may drive molecular variation in the flagellin molecule, with important implications in the control of infection.
Flagellins FlaA (major, ca. 90% of total) and FlaB, the major structural protein monomers of the flagellar filament, are proteins each of 572 amino acid residues totalling approximately 59 kDa, although apparent Mr determined by SDS-PAGE is about 10% higher, due to the presence of numerous glycan residues: for an overview see [12]. The two proteins are 94% identical in amino acid sequence, with very few differences in the glycosylated non-terminal portion of the molecule that is known to be exposed on the surface of the assembled structure. We have therefore taken the sequence of FlaA as the basis for our observations and refer to it throughout as 'flagellin'. Although it has been shown in strains 81-176 [13,14] and NCTC 11168 [15] that a number of specific amino acid residues of flagellin may be O-glycosylated, it is unclear whether these modifications are always present or whether there is variation within the population of molecules in the nature of the glycan or the extent of glycosylation. Flagellin is especially rich in serine (approx. 11% of residues in strain 11168 and the most commonly modified residue) and also in threonine (approx. 6%), but many of these residues have not been shown to be targets for modification. It is not known how specific residues are selected for glycosylation or whether this process is tightly controlled or stochastic in nature.
MS was first applied to the analysis of glycosylation of flagellins from Campylobacter jejuni (81-176, 11168 and OH4384) and Campylobacter coli (VC167 T2) by Thibault et al. [16]. The major glycans identified were pseudaminic acid and derivatives thereof. Further work on C. coli (VC167) revealed the presence of legionaminic acid and derivatives [17]. Previously in our laboratory [18] we applied LC electron capture dissociation MS/MS (LC ECD MS/MS) to the analysis of glycosylation in flagellin from Campylobacter jejuni 11168, revealing that the protein was modified by dimethyl glyceric acid derivatives of pseudaminic acid and acetamidino pseudaminic acid at Ser181, Ser207 and either Thr464 or Thr465. That work was in agreement with the work of Logan et al., who first identified these novel glycans [19]. Although the LC ECD MS/MS approach was successful in identifying the presence and site of glycosylation in flagellin from C. jejuni 11168, there was an extensive region of the protein (residues 387-463) for which no coverage was achieved. This observation is particularly salient as the homologous region in flagellin from C. jejuni 81-176 is heavily glycosylated [16]. Moreover, the region contains numerous potential sites of O-glycosylation (21 serine residues and four threonine residues), affording the opportunity for differential glycosylation within the region. Any (non-tryptic) proteolysis within the region is therefore likely to result in a complex mixture comprising multiple glycopeptide isomers and isobars.
In order to address the limited flagellin coverage obtained with trypsin, we have applied an alternative protease for the analysis of flagellin, that is, proteinase K. Proteinase K has a broad specificity and cleaves peptide bonds that are C-terminal to aliphatic and aromatic residues [20]. It has previously been applied successfully to the analysis of glycoproteins [21,22]. Given the broad specificity of proteinase K and the likelihood of extensive glycosylation in the region of interest, we also incorporated gas-phase fractionation, in the form of high field asymmetric waveform ion mobility spectrometry (FAIMS), in the workflow. FAIMS separates ions at atmospheric pressure on the basis of differences in their differential ion mobility. We, and others, have shown that FAIMS is useful in extending proteome coverage [23-25]. We have also shown that FAIMS is capable of separating glycopeptide isomers [26]. Here, we demonstrate comprehensive mapping of O-glycosylation in flagellin from Campylobacter jejuni 11168 by use of this multienzyme differential ion mobility approach.
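The cleavage rule described above can be illustrated with a minimal in-silico digestion sketch. It is not the procedure used in this work: the residue set (A, V, L, I, F, Y, W) is an assumed reading of "aliphatic and aromatic", the function name `digest` is hypothetical, the input is a toy sequence, and complete cleavage is assumed, whereas real proteinase K digests contain many missed cleavages.

```python
# Assumed cleavage set: aliphatic (A, V, L, I) and aromatic (F, Y, W) residues.
CLEAVE_AFTER = frozenset("AVLIFYW")


def digest(sequence, cleave_after=CLEAVE_AFTER):
    """Split a sequence C-terminal to every residue in cleave_after.

    Assumes complete cleavage, which is an idealisation.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue in cleave_after:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Trailing residues after the last cleavage site.
        peptides.append(sequence[start:])
    return peptides


# Toy sequence (hypothetical), not the flagellin sequence.
fragments = digest("GGYSSVSAYNYSTGF")
print(fragments)
```

Because so many residues qualify as cleavage sites, complete cleavage yields very short fragments; the longer glycopeptides reported here imply that many potential sites are left uncleaved, which is why the database searches treat the enzyme as nonspecific.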

Preparation of flagellin A
Campylobacter jejuni (NCTC 11168) cultures were grown on Mueller Hinton agar (Oxoid Ltd., Basingstoke, UK) and incubated in microaerophilic conditions (90% N2, 4% O2, and 6% CO2 at 37°C) for 30 h. Cells were harvested from 13 separate agar plates and suspended in 5.7 mL of Luria-Bertani Broth. The suspension was homogenised at high speed (20 500 rpm) using an Ultra Turrax T-25 homogeniser, for 2 min, with 30 s rest, for four cycles in order to shear the flagellar filament from the bacterial wall. The homogenate was centrifuged at 7500 rpm for 20 min at 4°C to remove cells and cell debris. The supernatant was collected and treated with 1% Triton X (Bio-Rad), incubated at 37°C for 45 min, and subjected to ultracentrifugation (TL-100 Ultracentrifuge, TLA-100.3 fixed angle rotor, maximum RCF 541000 × g, Beckman, USA) at 50 000 rpm for 1 h at 4°C. The supernatant was removed immediately, tubes inverted on filter paper to remove residual fluid, and the resulting pellets were re-suspended in 100 µL distilled water. A 5 µL aliquot of the sample was analysed by SDS-PAGE, and visualised using Coomassie stain R250 (Bio-Rad) to check flagellar protein purity, see Supporting Information Fig. 1. A further 5 µL was subjected to a Bradford protein assay to measure the protein concentration. The remaining sample was stored at -20°C until further analysis. Two biological replicates were considered.

Trypsin digestion
For LC MS/MS analysis, 10 µL of sample (approx. 7 µg of the extracted flagellar protein) was made up to 100 µL in 100 mM ammonium bicarbonate (Fluka Analytical, UK) and denatured by incubating at 95°C for 5 min. Trypsin (Sigma Aldrich, Dorset, UK) (1 µg/µL in 50 mM acetic acid (Sigma Aldrich, Dorset, UK)) was added to the mixture to give a 1:50 (replicate #1) or 1:7 (replicate #2) ratio of enzyme to protein, and incubated overnight at 37°C with mild shaking. Proteolysis was quenched by freezing at -20°C and the sample stored until further analysis. For LC FAIMS MS/MS analysis, 50 µg of protein was digested using the same procedure.

Proteinase K digestion
Approximately 84 µL of the sample (70 µg of the extracted flagellar protein) was made up to 100 µL in 100 mM ammonium bicarbonate (Fluka Analytical, UK) and denatured at 95°C for 5 min. Proteinase K (Sigma Aldrich, Dorset, UK) (1 µg/µL in 50 mM Tris-HCl (Fisher Scientific, Loughborough, UK), 5 mM CaCl2 (BHD laboratory, England)) was added to the mixture to give a 1:1 (replicate #1) or 1:100 (replicate #2) ratio of enzyme to protein, and incubated for 1 h (replicate #1) or 2 h (replicate #2) at 37°C with mild shaking. Proteolysis was quenched by freezing at -20°C and the sample stored until further analysis.
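As a worked illustration of the enzyme-to-protein ratios quoted above, the following sketch converts a ratio into the volume of enzyme stock to add. It assumes the ratios are by mass and reads the stock concentration as 1 µg/µL; the function name `enzyme_volume_ul` and its parameters are hypothetical.

```python
def enzyme_volume_ul(protein_ug, ratio_enzyme, ratio_protein, stock_ug_per_ul=1.0):
    """Volume of enzyme stock (µL) giving an enzyme:protein mass ratio.

    Assumes a weight-to-weight ratio and a stock of known concentration.
    """
    enzyme_ug = protein_ug * ratio_enzyme / ratio_protein
    return enzyme_ug / stock_ug_per_ul


# 70 µg of flagellin, as in the proteinase K digests.
replicate_1 = enzyme_volume_ul(70, 1, 1)    # 1:1 ratio
replicate_2 = enzyme_volume_ul(70, 1, 100)  # 1:100 ratio
print(replicate_1, replicate_2)
```

Under these assumptions the 1:1 digest needs 70 µL of stock and the 1:100 digest needs 0.7 µL, a hundredfold difference in enzyme load between the two replicates.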
Both trypsin- and proteinase K-treated samples were desalted using C18 column zip tips (Merck Millipore Ltd., Germany) according to the manufacturer's instructions. Briefly, the tips were wetted using 100% acetonitrile (J.T. Baker, Holland), and then equilibrated using 0.1% trifluoroacetic acid (TFA; Fisher Scientific, Loughborough, UK). The samples were aspirated and dispensed for 10 cycles, and then washed three times using 0.1% TFA and eluted using 70:30 acetonitrile and 0.1% TFA. The desalted samples were dried and re-suspended in 10 µL of 0.1% formic acid prior to MS analysis.

LC-MS/MS
Peptides were separated using online reversed phase LC (Dionex Ultimate 3000) using a binary solvent system consisting of mobile phase A (water (J.T. Baker, Holland)/0.1% formic acid (Fisher Scientific, Loughborough, UK)) and mobile phase B (acetonitrile (J.T. Baker, Holland)/0.1% formic acid (Fisher Scientific, Loughborough, UK)). Six microliters of the desalted samples were loaded onto a 150 mm Acclaim PepMap 100 C18 column and separated by a 30 min gradient from 3.2 to 44% of mobile phase B, followed by 10 min at 90% of mobile phase B and 16 min with 3.2% mobile phase B to re-equilibrate the system. The LC was coupled to an Advion Triversa Nanomate (Advion, Ithaca, USA) that infused the peptides at a spray voltage of 1.7 kV. Peptides were infused directly into the Thermo Orbitrap Velos ETD mass spectrometer (Thermo Fisher Scientific) at a flow rate of 0.35 µL/min.
The mass spectrometer alternated between a full FT-MS scan (m/z 380-1600) and subsequent CID and ETD of the four most abundant precursor ions. Survey scans were acquired in the Orbitrap with a resolution of 60 000 at m/z 400. Only multiply charged precursor ions were selected for MS/MS. Dynamic exclusion was used with a repeat count of 1 for 30 s. Automatic gain control (AGC) was used to accumulate sufficient precursor ions. The AGC target value for FT-MS was 1 × 10^6 charges. CID was performed in the linear ion trap using helium at a normalised collision energy of 35%. The width of the precursor isolation window was 2 Th. The AGC target was 5 × 10^4 charges with a maximum injection time of 100 ms. Charge state-dependent ETD was performed in the linear ion trap with an activation time of 100 ms. The isolation window was 3 Th. ETD was performed with fluoranthene ions. The AGC target was 5 × 10^4 charges with a maximum injection time of 100 ms. Supplemental activation was enabled with an activation energy of 25%. Data acquisition was controlled by Xcalibur 2.1 (Thermo Fisher Scientific).

FAIMS
The LC was coupled to the FAIMS device (Thermo Fisher Scientific) through a modified nanospray HESI II heated electrospray source, similar to that used by Swearingen et al. [25], incorporating a Picotip emitter needle (New Objective, Woburn, USA). The spray voltage was 3 kV. For FAIMS, the dispersion voltage was -5000 V, and the inner and outer electrode temperatures were 70°C and 90°C, respectively. The gas flow was 2.

Data analysis
Raw data were loaded into Proteome Discoverer (version 1.4). Precursors with a mass of less than 350 Da or greater than 5000 Da were excluded and MS/MS spectra were required to have at least 1 peak. The CID and ETD spectra were separated and searched with the same parameters except for the fragment ions (b and y ions for CID and c, y and z ions for ETD). Data were searched against a manually created Campylobacter flagellin database (886 sequences) or a Campylobacter flagellin 11168 database (59 sequences) with sequences from the NCBI nr database. The data were searched using both the SEQUEST and Mascot algorithms (controlled through Proteome Discoverer version 1.4, Mascot version 2.4). In both SEQUEST and Mascot searches, the following parameters were used: precursor ion m/z tolerance 10 ppm; fragment ion tolerance 0.5 Da; enzyme, either trypsin or nonspecific enzyme (proteinase K); maximum number of missed cleavages = 2; no fixed modifications; dynamic modifications as described below. Multiple searches of the data were performed with different combinations of dynamic modifications (maximum number of dynamic modifications per search = 6). The glycans considered were as described in [18].

Scheme 1. Structures of glycans identified in the protein database searches (based on findings in [19], [16], and [17]).
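The precursor mass filter and the 10 ppm precursor tolerance stated in the search parameters can be sketched as follows. This is an illustration of the arithmetic only, not of how Proteome Discoverer, SEQUEST or Mascot implement it; the function names are hypothetical and treating the 350 Da and 5000 Da limits as inclusive is an assumption.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tol_ppm=10.0):
    """True if the absolute mass error is within tol_ppm."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def precursor_passes(mass_da, low_da=350.0, high_da=5000.0):
    """Keep precursors inside the mass window (inclusive limits assumed)."""
    return low_da <= mass_da <= high_da


# Illustrative values only: 0.005 at m/z 1000 is about 5 ppm.
print(within_tolerance(1000.005, 1000.0))
print(precursor_passes(349.9))
```

At m/z 1000 a 10 ppm tolerance corresponds to only 0.01 m/z units, far narrower than the 0.5 Da fragment ion tolerance used for the ion trap MS/MS spectra.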

Trypsin proteolysis and LC MS/MS
In our previous paper [18], we used a combination of trypsin proteolysis and LC ECD MS/MS. Here, we have used a CID-ETD method in which each precursor ion is sequentially fragmented with CID and ETD. For comparison purposes, we first analysed the flagellin digest by LC CID ETD MS/MS without FAIMS. The data were searched against Campylobacter protein databases, using the Mascot and SEQUEST algorithms. A range of glycans were considered in the database searches as described in the experimental section. The glycans identified were dimethylglyceric acid derivative of pseudaminic acid (C 16  The combined protein sequence coverage obtained from the two replicates with CID was 76.2%. Eighteen glycopeptides were identified in the two database searches; however, manual analysis of the mass spectra revealed that in each case fragmentation was poor and the peptide sequence coverage insufficient to assign sites of glycosylation. The combined protein sequence coverage obtained for ETD was 76.2%, see Fig. 1. Nine glycopeptides were assigned in the database search of the ETD data, but in this case, the fragmentation spectra were of sufficient quality to assign the modification sites in all but one case. Table 1 summarises the glycopeptides identified. (Unmodified peptides are summarised in Supporting Information Table 1). The ETD MS/MS spectra for the glycopeptides are shown in Supporting Information Fig. 2. Although the ETD mass spectra are better suited for localisation of sites of glycosylation, the CID mass spectra are useful in confirming the nature of the glycan: each CID mass spectrum of a glycopeptide contains peak(s) corresponding to glycan oxonium ions, see below.
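The use of CID oxonium ions to confirm glycan identity can be sketched with nominal masses. The sketch assumes that each glycan mass shift Δm gives a singly protonated oxonium ion at nominal m/z Δm + 1, consistent with the oxonium ions reported in this work at m/z 316, 317, 390 and 391 for Δm315, Δm316, Δm389 and Δm390. The function names are hypothetical, and integer nominal masses stand in for accurate-mass matching.

```python
# Nominal glycan mass shifts (Da) discussed in this work.
GLYCAN_SHIFTS = (315, 316, 389, 390)


def oxonium_mz(delta_m):
    """Nominal m/z of the singly protonated oxonium ion for a mass shift."""
    return delta_m + 1


def glycans_from_oxonium(observed_mz):
    """Glycan mass shifts supported by a list of nominal oxonium ion m/z."""
    expected = {oxonium_mz(shift): shift for shift in GLYCAN_SHIFTS}
    return sorted(expected[mz] for mz in set(observed_mz) if mz in expected)


# Oxonium ions at m/z 316 and 390 point to glycans of 315 and 389 Da.
print(glycans_from_oxonium([316, 390]))
```

Because glycans differing by 1 Da give oxonium ions that also differ by 1 Da, the CID spectrum can distinguish, for example, Δm389 from Δm390 even where the ETD fragments are ambiguous.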
Seven of the glycopeptides identified were modified by Δm389 or by Δm390, five of which were at Ser181, Ser207 and Thr464 or Thr465. These combinations of glycan and modification site were seen in our previous work. Two of the glycopeptides were glycosylated by these glycans at Ser345 and 349. These sites of glycosylation have not been observed previously. Two of the glycopeptides are modified by Δm315, one at Ser181 and one at Ser207. Modification of flagellin A from Campylobacter jejuni 11168 by this glycan has not been observed previously.
As observed previously, no sequence coverage was observed between amino acid residues 387 and 463. This region of the protein has a high concentration of serine and threonine residues and may be heavily glycosylated, as observed in C. jejuni 81-176 [16]. The region is also characterised by a scarcity of lysine and arginine residues. Only two large peptides are predicted to result from trypsin proteolysis, one containing 11 and one containing 14 potential O-glycosylation sites.

Trypsin proteolysis and LC FAIMS MS/MS
Trypsin digests of flagellin were analysed by use of LC FAIMS MS/MS with the top-4 MS method. An 'external CV stepping' FAIMS method [23] was applied in which multiple LC MS/MS analyses are performed at different compensation voltages. The total combined protein sequence coverage obtained for ETD was 81.6%, see Fig. 1. Nineteen glycopeptides were assigned in the protein database search of the ETD data. Manual analysis of the data revealed the presence of 15 glycopeptides, summarised in Table 1. MS/MS spectra are shown in Supporting Information Fig. 3. (Unmodified peptides are summarised in Supporting Information Table 1). Seven glycopeptides containing Ser181 were identified, four of which were modified by Δm389 or Δm390, as observed previously [18]. The missed-cleavage peptide FETGGRISTSGEVQFTLK modified with Δm315 was identified in the analysis conducted at CV = -50 V. Glycosylation by Δm315 of the same site in the fully tryptic peptide was also observed in this analysis (CV = -50 V) and in the analysis without FAIMS described above. The missed-cleavage peptide FETGGRISTSGEVQFTLK was also identified as modified by Δm316 in the analysis at CV = -40 V. This modification of flagellin has not been observed previously in C. jejuni 11168.
Five glycopeptides containing Ser207 were identified, four of which were modified by Δm389 or Δm390. The remaining peptide was modified by Δm315 at Ser207. This combination of glycan/glycosylation site has not been observed previously. Finally, three peptides were identified which contained Thr464 and Thr465, modified by Δm389, Δm390 and Δm315. In all cases, although the ETD fragmentation was extensive, no cleavage between the two threonine residues was observed, precluding unambiguous site localisation. As seen above, glycosylation of these site(s) by Δm389 and Δm315 are novel observations. The introduction of FAIMS into the workflow resulted in observation of novel glycans and combinations of glycan/glycosylation sites. Nevertheless, protein coverage between amino acid residues 388 and 463 was not achieved, presumably as a result of the use of trypsin as the protease.
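The bookkeeping behind 'external CV stepping', in which separate LC MS/MS analyses at different compensation voltages are combined, can be sketched as below. The data structure, the function names and the representation of a glycopeptide as a (sequence, glycan mass shift) pair are all hypothetical; the two example entries are the FETGGRISTSGEVQFTLK identifications described above.

```python
from collections import defaultdict


def merge_cv_runs(runs):
    """Map each glycopeptide to the sorted list of CVs at which it was seen.

    runs: dict mapping compensation voltage (V) to a set of identifications.
    """
    seen = defaultdict(list)
    for cv in sorted(runs):
        for glycopeptide in runs[cv]:
            seen[glycopeptide].append(cv)
    return dict(seen)


def counts_per_cv(runs):
    """Number of glycopeptides identified in each CV step."""
    return {cv: len(ids) for cv, ids in runs.items()}


# Hypothetical container holding two identifications from the text.
runs = {
    -50: {("FETGGRISTSGEVQFTLK", 315)},
    -40: {("FETGGRISTSGEVQFTLK", 316)},
}
merged = merge_cv_runs(runs)
print(len(merged), counts_per_cv(runs))
```

Keying on both sequence and glycan keeps differentially glycosylated forms of the same peptide separate, so the non-redundant total is the number of keys in the merged result rather than the sum of the per-CV counts.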

Proteinase K proteolysis and LC MS/MS
To address the problem of the 'missing region', i.e. amino acid residues 388-463, an alternative protease, proteinase K, was considered. Two biological replicates were investigated. For the first, proteolysis conditions were enzyme/protein ratio 1:1, 1 h incubation (replicate #1). Subsequent experiments to determine the optimum conditions (to achieve maximum coverage) for proteolysis by proteinase  Twenty glycopeptides were assigned in the database search of the ETD data; however, manual analysis of the ETD spectra, with cross-validation against the corresponding CID spectra, revealed 17 true assignments, see Table 2. (Unmodified peptides are summarised in Supporting Information Table 2. Note that no unmodified peptides were identified in replicate #1). ETD MS/MS spectra are shown in Supporting Information Fig. 4. One of the glycopeptides contained Ser181 modified by Δm390. Three contained Ser207 glycosylated with either Δm389 or Δm390. These observations support our earlier work [18]. One of the glycopeptides contained Thr465 modified by Δm390. In our earlier work, and with trypsin and FAIMS (see above), we showed that either Thr464 or Thr465 was modified, but we were unable to unambiguously assign the glycosylation site. Six of the glycopeptides identified contained the novel glycosylation site Ser343 and were glycosylated with either Δm389 or Δm390. Two of the peptides contained site Ser343 glycosylated with Δm315. The remaining peptides derived from the region inaccessible via trypsin. Peptide [NYSTGFAN] was shown to be modified by Δm389 at Ser423. Peptide [GGYSSVSAY] was multiply modified by Δm315 and Δm390 or Δm389 at sites Ser395 and Ser398, respectively. Peptide [GSGFSSGSGYSVG] was multiply modified by Δm316, Δm315, Δm389 and Δm389 at sites Ser409, Ser410, Ser412 and Ser415, respectively.

Proteinase K proteolysis and LC FAIMS MS/MS
Proteinase K digests were analysed by LC FAIMS MS/MS ('external CV stepping') with the top-4 MS method. The combined protein sequence coverage obtained for ETD was 39.0%, see Fig. 1. Importantly, the sequence coverage in the 'missing region' was 81.5%. The combined coverage from the proteinase K and trypsin LC FAIMS MS/MS analyses was 94.1%. Forty-four non-redundant glycopeptides were assigned in the database search; however, 37 non-redundant glycopeptides were confirmed by manual analysis and cross-validation against corresponding CID mass spectra, see Table 3. (Unmodified peptides are summarised in Supporting Information Table 3. Note that no unmodified peptides were identified in replicate #1). Previously unobserved glycans or glycosylation sites are indicated with an asterisk. Representative ETD MS/MS spectra are shown in Fig. 2 and the remainder are shown in Supporting Information Fig. 5. Figure 2A shows the ETD mass spectrum of triply charged ions of multiply-glycosylated peptide GGYSSVSAY modified at Ser395 and Ser398 with glycans Δm315 and Δm389, respectively. This peptide is from the region of the protein inaccessible by trypsin digestion. The corresponding CID mass spectrum (shown inset) confirms the nature of the glycans. Oxonium ions are observed at m/z 316 and 390. This peptide was observed at compensation voltages -45 V and -50 V (as 3+ precursor). Differential glycosylation of this peptide was observed, i.e. modification at Ser395 and Ser398 with glycans Δm315 and Δm390, respectively. That peptide was observed at CV = -45 V (3+ precursor) and -25 V (2+ precursor).
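The way sequence coverage values from two digests combine can be illustrated with a short sketch: coverage is the percentage of protein residues that fall within at least one identified peptide, and the combined coverage is computed from the union of both peptide lists. The function name is hypothetical, and the protein and peptides below are toy strings rather than flagellin data.

```python
def sequence_coverage(protein, peptides):
    """Percentage of residues in protein covered by at least one peptide."""
    covered = [False] * len(protein)
    for peptide in peptides:
        start = protein.find(peptide)
        while start != -1:
            # Mark every residue spanned by this occurrence of the peptide.
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return 100.0 * sum(covered) / len(protein)


# Toy data (hypothetical): two digests covering different regions.
protein = "MGFRINTNVA"
digest_a = ["MGF", "FRI"]
digest_b = ["NVA"]
print(sequence_coverage(protein, digest_a))
print(sequence_coverage(protein, digest_b))
print(sequence_coverage(protein, digest_a + digest_b))
```

Overlapping peptides are counted once per residue, so combined coverage is never greater than the sum of the individual values and equals that sum only when the two digests cover entirely different regions.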
The ETD spectrum shown in Fig. 2B was obtained for quadruply charged multiply glycosylated YNVSAGSGFSSGSTLSQF. The precursor ion mass suggests glycan combinations (i) Δm316, Δm316, Δm389 and Δm390; or (ii) Δm315, Δm316, Δm390 and Δm390. Given the well-established propensity for hydrogen transfer among ETD fragments (z, z+1, c, c-1), there is scope for ambiguity in the ETD mass spectrum. The corresponding CID mass spectrum reveals oxonium ions at m/z 316, 317, 390 and 391, suggesting both species are present; however, the relative abundances of the oxonium ions suggest that (ii) is the predominant species. It is not possible to confirm unambiguously the sites of the Δm315 and Δm316 glycans within the glycopeptide because of the potential for hydrogen transfer between ETD fragments. The glycans are sited on Ser448 and Ser454 (or a combination of the two). Figure 2C shows a third example of a multiply glycosylated peptide from the previously inaccessible region. Figure 3 (top) shows the number of glycopeptides identified at each of the various compensation voltages in replicate #2. The greatest number of identifications was achieved at a compensation voltage of -40 V. The distribution of glycopeptides identified according to charge state is shown in Fig. 3 (bottom). Doubly charged glycopeptide ions were observed between CVs of -20 and -30 V; triply charged glycopeptides between -30 and -55 V. Two 4+ glycopeptide ions were observed at -35 and -40 V. The majority of the identifications were of triply charged precursors.
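The ambiguity between glycan combinations (i) and (ii) for the quadruply glycosylated peptide follows from simple arithmetic on nominal masses, which the sketch below enumerates. It assumes four glycans drawn from the set of nominal mass shifts 315, 316, 389 and 390 Da and a total nominal glycan mass of 1411 Da, a value inferred here from the two combinations quoted in the text rather than taken from the measured precursor mass. The function name is hypothetical.

```python
from itertools import combinations_with_replacement

# Nominal glycan mass shifts (Da) discussed in this work.
GLYCAN_SHIFTS = (315, 316, 389, 390)


def glycan_combinations(total_shift, n_glycans, shifts=GLYCAN_SHIFTS):
    """All unordered sets of n_glycans shifts that sum to total_shift."""
    return [
        combo
        for combo in combinations_with_replacement(shifts, n_glycans)
        if sum(combo) == total_shift
    ]


# Four glycans with an assumed total nominal shift of 1411 Da.
for combo in glycan_combinations(1411, 4):
    print(combo)
```

Under these assumptions exactly two combinations satisfy the total, matching (i) and (ii); the precursor mass alone therefore cannot separate them, and the oxonium ions in the CID spectrum are needed to judge which species predominates.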

Concluding remarks
We have demonstrated that comprehensive mapping of the O-glycosylation of flagellin from Campylobacter jejuni 11168 may be achieved by incorporating differential ion mobility spectrometry into the bottom-up proteomics workflow, together with use of both trypsin and proteinase K for proteolysis. A summary of the sites of glycosylation observed is given in Fig. 4. Novel glycans for this strain have been identified (pseudaminic acid and either acetamidino pseudaminic acid or legionaminic acid), as have novel glycosylation sites: Thr208, Ser343, Ser348, Ser349, Ser395, Ser398, Ser423, Ser433, Ser436, Ser445, Ser448, Ser451, Ser452, Ser454, Ser457 and Thr465. Multiply and differentially glycosylated peptides were observed: the identity of the glycan at modified amino acid residues was variable, and both presence and absence of glycan at specific residues were observed. The observed heterogeneity in glycosylation patterns thus appears to confer a combinatorial element of biological variation in the flagellin, potentially advantageous to selective survival of members of the population in the face of biological attack. These results further demonstrate the usefulness of differential ion mobility in proteomics investigations of PTMs.