Correlating glycoforms of DC-SIGN with stability using a combination of enzymatic digestion and ion mobility MS

The immune scavenger protein DC-SIGN interacts with glycosylated proteins and has a putative role in facilitating viral infection. How these recognition events take place with different viruses is not clear, and the effects of glycosylation on the folding and stability of DC-SIGN have not been reported. Here, we develop and apply a mass spectrometry-based approach to both uncover and characterise the effects of O-glycans on the stability of DC-SIGN. We first quantify the Core 1 and Core 2 O-glycan structures on the carbohydrate recognition and extracellular domains of the protein by sequential exoglycosidase sequencing. We then use ion mobility mass spectrometry to show how specific O-glycans, and/or single monosaccharide substitutions, alter both the overall collision cross section and the gas-phase stability of the glycoprotein isoforms of DC-SIGN. We find that the stability of DC-SIGN correlates better with the number of glycosylation sites than with the mass or length of the glycan modifications. Collectively, our results exemplify a combined multi-dimensional MS approach capable of evaluating protein stability in response to both glycoprotein macro- and microheterogeneity, and add structural detail to the infection enhancer DC-SIGN.

While much current interest focuses on the glycosylation status of the viral spike protein of SARS-CoV-2, few studies have addressed the role of receptor glycosylation. DC-SIGN (the innate immune receptor dendritic cell-specific intercellular adhesion molecule-3 grabbing nonintegrin) has been implicated as an 'infection enhancer' in previous reports of coronavirus epidemics [1,2]. This property is attributed to the ability of DC-SIGN to recognize either self or pathogen-derived carbohydrates. As such, DC-SIGN is proposed to play an unfavourable role in coronavirus infections, enhancing the circulation of virions through multivalent interactions with high-mannose-type viral glycans via its carbohydrate recognition domain (CRD) [3]. This ability to increase the circulation of viral particles has prompted efforts to develop glycomimetic drugs that act as CRD antagonists of DC-SIGN, to inhibit host-virus interactions and infection [4]. To the best of our knowledge, however, DC-SIGN, a C-type lectin expressed on the surface of dendritic cells and macrophages, has not yet been reported to be O-glycosylated, and the extent and heterogeneity of its glycoforms have yet to be defined.
Here we use DC-SIGN as a challenging test case to develop and apply high-resolution native MS and ion mobility (IM) instrumentation to study the effects of glycosylation on the biologically relevant forms of the receptor. DC-SIGN is a type II membrane protein comprising three main domains: a cytoplasmic region, a transmembrane segment and an extracellular domain (ECD). Although DC-SIGN is known to be tetrameric, enabling multivalent interactions with pathogens, a complete structure is not available, largely because of the intrinsic flexibility of the ECD. The ECD can be divided into two distinct regions: a neck region involved in tetramerization of the receptor, and the CRD, which mediates the molecular recognition processes.
A model of the ECD has been proposed based on small-angle X-ray scattering data [5].
Our principal goal is to understand the impact of glycosylation on the stability of the DC-SIGN receptor by developing and applying a combined native MS approach that assigns the overall glycan occupancy (macroheterogeneity) and characterizes the detailed glycan structures (microheterogeneity). These two types of data are generally obtained separately, through orthogonal MS and liquid chromatography techniques. Here, we show that both macro- and microheterogeneity information can be acquired in a single MS experiment by applying monosaccharide-specific exoglycosidases in sequence. We further show, using IM and collision induced unfolding (CIU) measurements, that specific glycan structures, as well as the extent of their occupancy, affect the stability of the intact glycoprotein. Intriguingly, we find that the overall mass of the glycan is less important for the stability of DC-SIGN than the number of glycosylation sites. Taken together, this approach provides both structural and biophysical information for this intact folded glycoprotein that is not accessible by other static or ensemble-based methods.
We began our investigation by expressing and purifying the DC-SIGN CRD from human embryonic kidney (293T) cells. The native MS spectrum of this protein revealed two major charge states (8+ and 9+), with seven clear proteoforms within each distribution (Figure 1a).
The theoretical mass of the non-glycosylated DC-SIGN CRD is 19128 Da. The lowest observed mass was 21014.5 ± 0.4 Da, with additional peaks indicating the presence of further PTMs.
The mass differences between these peaks correspond to distinct numbers of hexose (+162 Da), N-acetylhexosamine (+203 Da) and N-acetylneuraminic acid (+292 Da) residues. The difference of +1886.5 ± 0.4 Da from the theoretical mass is consistent with 2 hexoses (Hex), 2 N-acetylhexosamines (HexNAc) and 4 N-acetylneuraminic acid residues (Neu5Ac, referred to as sialic acid herein). Further mass shifts on the CRD arise from Hex-HexNAc disaccharide additions plus a further 2 sialic acids. At this stage of the analysis the HexNAc monosaccharides are unspecified (blue/yellow squares), as they can be either N-acetylgalactosamine or N-acetylglucosamine residues. We then recorded the native mass spectrum of the full ECD and observed a closely similar glycosylation pattern (Figure 1b). We conclude that the CRD/ECD glycan modifications range from (Hex-HexNAc)2 to (Hex-HexNAc)6 with either 4 or 6 sialic acids, the most glycosylated forms having 6 Hex, 6 HexNAc and 6 Neu5Ac monosaccharides (Figure 2a).
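Assigning the spacing between adjacent glycoform peaks to a monosaccharide amounts to a nearest-match lookup against residue masses. The sketch below assumes standard average residue masses and a 2 Da matching tolerance chosen for illustration; `assign_spacing` and `composition_mass` are hypothetical helpers, not part of any named software.

```python
from typing import Optional

# Average residue masses (Da): free monosaccharide minus one water, i.e. the
# mass each unit adds when it is incorporated into a glycan chain.
RESIDUE_MASSES = {
    "Hex": 162.14,         # hexose, e.g. galactose
    "HexNAc": 203.19,      # GalNAc or GlcNAc (not distinguishable by mass)
    "Neu5Ac": 291.26,      # sialic acid
    "Hex-HexNAc": 365.33,  # disaccharide unit (Hex + HexNAc)
}


def assign_spacing(delta_da: float, tolerance_da: float = 2.0) -> Optional[str]:
    """Return the unit whose residue mass is closest to an observed peak spacing.

    Returns None if no unit lies within `tolerance_da` (an assumed value).
    """
    name, mass = min(RESIDUE_MASSES.items(), key=lambda item: abs(item[1] - delta_da))
    return name if abs(mass - delta_da) <= tolerance_da else None


def composition_mass(counts: dict) -> float:
    """Summed residue mass (Da) of a composition such as {"Hex": 2, "HexNAc": 2}."""
    return sum(RESIDUE_MASSES[unit] * n for unit, n in counts.items())


print(assign_spacing(162.1), assign_spacing(203.2), assign_spacing(291.5))
# Hex HexNAc Neu5Ac
```

`composition_mass({"Hex": 2, "HexNAc": 2, "Neu5Ac": 4})` returns the summed residue mass of a candidate composition, which can then be compared with the observed shift from the theoretical protein mass.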
Since the CRD of DC-SIGN lacks an Asn-X-Ser/Thr sequon, the oligosaccharide PTMs we detected are unlikely to be N-linked glycans. The numbers of monosaccharides observed by MS are, however, characteristic of O-linked glycans, which are typically smaller than N-glycans and are attached to Ser/Thr residues. Covalent attachment of N-acetylgalactosamine is the first step in O-linked glycan biosynthesis, followed by addition of galactose, N-acetylglucosamine and sialic acid residues. We cannot infer any structural or occupancy information from the observations in Figure 1, as these values could also arise from glycoforms composed of six HexNAc residues, such as an extended Core 3 (Supplemental Figure 1). The next peak is 203 Da greater in mass and can be assigned to CRD consisting of one Core 1 and one Core CRD glycoforms with (Core 1)3, (Core 1)2(Core 2)1, (Core 1)1(Core 2)2 and (Core 2)3 structures. From this spectrum we can conclude that the CRD carries between two and three Core 1 and/or Core 2 O-linked glycans.
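Because Core 1 (Hex1HexNAc1) and Core 2 (Hex1HexNAc2) differ by a single HexNAc, each conversion of a Core 1 into a Core 2 glycan shifts the glycoform by +203 Da. The sketch below enumerates the core combinations possible for a given number of occupied sites; `glycoform_compositions` is a hypothetical helper, and only neutral, non-sialylated cores are considered.

```python
from collections import Counter
from itertools import combinations_with_replacement

# Neutral (non-sialylated) compositions of the two core types considered here
CORES = {
    "Core 1": Counter({"Hex": 1, "HexNAc": 1}),  # Galβ1-3GalNAc
    "Core 2": Counter({"Hex": 1, "HexNAc": 2}),  # Galβ1-3(GlcNAcβ1-6)GalNAc
}


def glycoform_compositions(n_sites: int) -> dict:
    """Map each combination of core types on n occupied sites to (Hex, HexNAc)."""
    result = {}
    # Order does not matter, so draw unordered combinations with repetition
    for combo in combinations_with_replacement(sorted(CORES), n_sites):
        total = Counter()
        for core in combo:
            total += CORES[core]
        result[combo] = (total["Hex"], total["HexNAc"])
    return result


for combo, (hex_n, hexnac_n) in glycoform_compositions(3).items():
    print(combo, f"Hex{hex_n}HexNAc{hexnac_n}")
```

For three occupied sites this yields the four combinations listed above, from (Core 1)3 = Hex3HexNAc3 to (Core 2)3 = Hex3HexNAc6, each step differing by one HexNAc.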
With these data alone, the assignment of Core 2 glycans cannot be definitive, as these compositions are also equivalent to a Core 3 structure (GlcNAcβ1-3GalNAc-Ser/Thr). Although O-glycans on DC-SIGN have yet to be reported, there is considerable evidence supporting their occurrence. NetOGlyc, a neural network algorithm trained on O-glycosylation sites discovered proteome-wide, predicted three possible O-glycan sites on DC-SIGN (Ser383, Ser393 and Thr398) [6]. These sites were also identified using (Figure 3d) [7]. CCS measurements of CRD S/O with increasing numbers of glycans (1-3) revealed progressively larger structures (Figure 3c). With the addition of each O-glycan, the CCS increased by ~62 ± 2 Å² relative to the non-glycosylated CRD S/O. Changes in glycoprotein CCSs were therefore consistent and glycan-specific, pointing to the potential of IM to explore gas-phase structural variations of glycoproteins.
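The CCS values above depend on calibrating travelling-wave drift times against standard proteins of known CCS, as described in the ion mobility methods. One widely used approach fits a power law between drift time and a charge- and reduced-mass-corrected calibrant CCS. The sketch below assumes that form, nitrogen as the drift gas, and drift times already corrected for mass-dependent flight times; it is not necessarily the procedure implemented in PULSAR, and all function names are hypothetical.

```python
import math

N2_MASS_DA = 28.0134  # assumed drift gas: nitrogen


def reduced_mass(ion_mass_da: float, gas_mass_da: float = N2_MASS_DA) -> float:
    """Reduced mass (Da) of an ion-gas collision pair."""
    return ion_mass_da * gas_mass_da / (ion_mass_da + gas_mass_da)


def fit_calibration(calibrants, gas_mass_da=N2_MASS_DA):
    """Fit ln(CCS') = ln(A) + B * ln(t) by ordinary least squares.

    calibrants: (drift_time_ms, ccs_A2, charge, ion_mass_da) tuples, with drift
    times assumed to be already corrected for mass-dependent flight times.
    CCS' = CCS * sqrt(reduced mass) / charge is the corrected calibrant CCS.
    """
    xs, ys = [], []
    for t, ccs, z, mass in calibrants:
        xs.append(math.log(t))
        ys.append(math.log(ccs * math.sqrt(reduced_mass(mass, gas_mass_da)) / z))
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Slope and intercept of the straight line in log-log space
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = math.exp(mean_y - b * mean_x)
    return a, b


def ccs_from_drift(t, z, ion_mass_da, a, b, gas_mass_da=N2_MASS_DA):
    """Convert a corrected drift time (ms) to CCS (Å²) using the fitted A and B."""
    return a * t**b * z / math.sqrt(reduced_mass(ion_mass_da, gas_mass_da))
```

In practice each calibrant protein contributes several charge states, each supplying one (drift time, CCS, charge, mass) tuple to the fit.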
To explore further the ability of IM to probe these structural variations, we compared CRDs of similar mass but different glycosylation. The m/z of CRD (Gal-GalNAc)2 after sialidase digestion (termed CRD S) is the same as that of CRD S/O carrying a single galactose-extended Core 2 O-glycan. As discussed above, the CRD S (Gal-GalNAc)2 glycoform has two Core 1 structures, meaning that two glycan sites are occupied, as opposed to the single extended Core 2 glycan on CRD S/O. The IM-MS arrival times of these two glycoforms were noticeably different (~17%), revealing distinct CCSs (Figure 3e). This suggests that Core 1 structures induce glycoprotein compaction (i.e. a lower drift time, 3.8 ms) compared with the single extended Core 2 glycoform (4.1 ms), even though the two glycoforms have the same mass.
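Glycoforms like these two are indistinguishable by m/z because their total compositions coincide. A minimal sketch that groups candidate glycoforms by total composition and keeps only groups with more than one member; the lookup table and `isobaric_groups` are hypothetical, and the galactose extension is assumed to sit on the GlcNAc arm of Core 2.

```python
from collections import Counter, defaultdict

# Neutral O-glycan compositions (illustrative lookup table)
GLYCANS = {
    "Core 1": Counter({"Hex": 1, "HexNAc": 1}),               # Galβ1-3GalNAc
    "Core 2": Counter({"Hex": 1, "HexNAc": 2}),               # Galβ1-3(GlcNAcβ1-6)GalNAc
    "Gal-extended Core 2": Counter({"Hex": 2, "HexNAc": 2}),  # Galβ1-3(Galβ1-4GlcNAcβ1-6)GalNAc
}


def isobaric_groups(glycoforms):
    """Group glycoforms (tuples of glycan names) that share a total composition.

    Only groups with more than one member are returned: these glycoforms have
    the same mass and cannot be told apart by m/z alone.
    """
    groups = defaultdict(list)
    for glycoform in glycoforms:
        total = Counter()
        for glycan in glycoform:
            total += GLYCANS[glycan]
        groups[(total["Hex"], total["HexNAc"])].append(glycoform)
    return {key: members for key, members in groups.items() if len(members) > 1}


candidates = [("Core 1", "Core 1"), ("Gal-extended Core 2",), ("Core 1", "Core 2")]
print(isobaric_groups(candidates))
# {(2, 2): [('Core 1', 'Core 1'), ('Gal-extended Core 2',)]}
```

The two members of the group differ in site occupancy (two glycans versus one), which is exactly the property that IM separates and m/z does not.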
CRDs with two additional monosaccharides, (Hex-HexNAc)4, were subsequently examined (Figure 3e, right). For CRD S, a broad arrival time distribution (ATD) was observed, attributed to the presence of multiple glycoforms. By contrast, the ATD for CRD S/O, which has two Core 2 glycans, was narrow and more symmetric, implying fewer structures and consistent with a single proteoform. Furthermore, the ATDs increased with the addition of only 2 monosaccharides (from 3.8 to 4.1 ms for CRD S and from 4.5 to 4.9 ms for CRD S/O), further highlighting the contribution of glycans to gas-phase protein structures. Considering the minor contribution of carbohydrate to the overall mass of the glycoprotein (3.6% for CRDs with (Hex-HexNAc)2 and 7.1% with (Hex-HexNAc)4), the differences in ATDs between isomeric glycoforms are striking. The notion that sugars collapse fully onto the amino acid backbone and play only nominal roles in gas-phase structures is therefore inconsistent with our observations.
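The carbohydrate mass fractions quoted above are the glycan mass divided by the total glycoprotein mass. A minimal check, assuming average residue masses and the 19128 Da theoretical CRD mass, gives about 3.7% and 7.1%, in line with the quoted figures; `glycan_mass_fraction` is a hypothetical helper.

```python
HEX_DA, HEXNAC_DA = 162.14, 203.19  # average residue masses (Da)
CRD_DA = 19128.0  # theoretical mass of the non-glycosylated CRD (from the text)


def glycan_mass_fraction(n_disaccharides: int, protein_da: float = CRD_DA) -> float:
    """Fraction of the glycoprotein mass contributed by n Hex-HexNAc units."""
    glycan_da = n_disaccharides * (HEX_DA + HEXNAC_DA)
    return glycan_da / (protein_da + glycan_da)


for n in (2, 4):
    print(f"(Hex-HexNAc){n}: {100 * glycan_mass_fraction(n):.1f}%")
# (Hex-HexNAc)2: 3.7%
# (Hex-HexNAc)4: 7.1%
```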
To assess how these changes in glycosylation affect protein stability, we employed a protein unfolding approach using IM to follow CIU, induced by collisions with neutral gas molecules in the mass spectrometer [8,9]. During a CIU experiment, the collision voltage is increased incrementally and the protein unfolds through intermediate states with different CCSs.
The contribution of glycosylation to protein stability can then be compared, as previously described for lipid binding to membrane proteins [10]. Here, CIU was used to explore stability effects due to glycan microheterogeneity (neutral vs. charged glycans), glycan macroheterogeneity, and isobaric glycoforms (i.e. CRDs with different glycan structures or site occupancies but equivalent masses).
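Comparing glycoforms by CIU requires a midpoint voltage for each unfolding transition. PULSAR fits its own unfolding model; as a simpler stand-in, the sketch below linearly interpolates the voltage at which the normalized intensity of the folded feature first drops to 50%. The helper `midpoint_voltage` and the ramp values are hypothetical and made up for illustration, not measured data.

```python
from typing import Optional, Sequence


def midpoint_voltage(
    voltages: Sequence[float], fraction_folded: Sequence[float], level: float = 0.5
) -> Optional[float]:
    """Collision voltage at which the folded-state fraction first falls to `level`.

    Linearly interpolates between the two ramp points that bracket `level`;
    returns None if the fraction never crosses it within the ramp.
    """
    points = list(zip(voltages, fraction_folded))
    for (v0, f0), (v1, f1) in zip(points, points[1:]):
        if f0 >= level >= f1 and f0 != f1:
            return v0 + (f0 - level) * (v1 - v0) / (f0 - f1)
    return None


# Illustrative 5 V ramp with made-up relative intensities (not measured data)
ramp = [10, 15, 20, 25, 30]
folded = [1.0, 0.95, 0.6, 0.2, 0.0]
print(midpoint_voltage(ramp, folded))
```

Linear interpolation is adequate when the ramp steps are small relative to the width of the transition; a sigmoid fit is more robust when they are not.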
We examined the CIU profiles of four CRD glycoforms generated by glycosidase digestion (Figure 4a). A single well-defined transition was observed in each CIU experiment (Figure 4b). We noted, however, that the collision voltage at which glycoform unfolding occurred varied with glycan composition and occupancy. Non-glycosylated CRDs (m/z 2128, 9 (m/z 2210, 19890 Da) unfolded at 23 V. The CRD with the greatest mass (m/z 2339, 21051 Da), assigned to two glycans (Hex2HexNAc2Neu5Ac4), had a similar unfolding pattern, with a transition at 28 V. The equivalent CRD without sialic acid (2 Core 1 glycans; m/z 2210, 19890 Da) required significantly higher voltages to induce unfolding (37 V). In summary, the greatest resistance to unfolding (stability) was observed for a CRD with two neutral O-glycans, followed by a glycoform with two negatively charged glycans, followed by a CRD with a single O-glycan, which was more stable than the apo protein (Figure 4c).
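The masses quoted alongside each m/z value follow from the 9+ charge state. A minimal sketch of the conversion for protonated [M + zH]z+ ions, assuming the standard proton mass; subtracting the nine charge-carrying protons places each neutral mass about 9 Da below the simple product z × m/z. `neutral_mass` is a hypothetical helper.

```python
PROTON_DA = 1.007276  # proton mass (Da)


def neutral_mass(mz: float, z: int) -> float:
    """Neutral mass (Da) of an [M + zH]z+ ion from its m/z and charge."""
    return z * (mz - PROTON_DA)


for mz in (2128, 2210, 2339):  # 9+ ions discussed above
    print(mz, round(neutral_mass(mz, 9), 1))
# 2128 19142.9
# 2210 19880.9
# 2339 21041.9
```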
These results point to variations in biophysical properties arising from protein glycosylation, and to a possible role for sialic acids (negatively charged monosaccharide residues) in reducing the stability and/or altering the arrangement of glycoproteins in the gas phase. Interestingly, our results indicate that the number of glycans, rather than their length, is the main factor contributing to glycoform stability, in agreement with previous in silico predictions [11]. Together, these results highlight the potential of native IM-MS to investigate the stabilizing effect of individual glycans, which is challenging, if not infeasible, with other methodologies.
Given the vast diversity of glycosylation patterns evident for eukaryotic proteins, it is likely that the effects of individual glycans will be highly specific to each protein. Nonetheless, native mass spectrometry has great potential to explore the correlations between stability and glycosylation for any protein, as well as to uncover micro- and macroheterogeneity information in a single experiment. Combining ion mobility with exoglycosidases, many of which are available, means that analyses can be tailored accordingly. Here, we showed how this approach not only identified O-glycosylation of DC-SIGN but also enabled us to characterise the extent and effects of its glycosylation. These results should lead to a greater understanding of its self-recognition and potential roles in enhancing viral infections, which in turn could inform therapeutic targeting.
Non-denatured mass spectrometry for DC-SIGN. DC-SIGN CRD was buffer exchanged into 200 mM ammonium acetate and immediately introduced into a modified Q-Exactive mass spectrometer (prototype Q-Exactive EMR; Thermo Fisher) according to a previously reported method [12]. Overall, a low voltage gradient was applied to the transfer optics prior to trapping ions in the higher-energy collisional dissociation (HCD) cell. A low HCD activation voltage (15 V) was used for protein desolvation, in order to avoid protein unfolding and fragmentation of glycan post-translational modifications. For analysis of DC-SIGN ECD, the protein sample was pre-treated with 1% acetic acid to dissociate protein complexes prior to introduction into the mass spectrometer. Spectra were acquired with five microscans and averaged with a noise level parameter of 3. The pressure in the HCD cell was increased (UHV pressure reading ~1.05 × 10⁻⁹ mbar) to allow better trapping and transmission of protein ions. Data were analysed using Xcalibur 2.2 SP1.48.
Ion mobility analysis for DC-SIGN. The collision cross section (CCS) of DC-SIGN was measured using a modified Synapt G2-Si high definition mass spectrometer. The capillary voltage, cone voltage, and trap and transfer collision energies were set to 1.3 kV, 50 V, 10 V and 7 V, respectively. The backing pressure was set at 6-8 mbar, and the pressure in the drift cell was set at 4.7 × 10⁻¹ bar. The wave velocity and wave height were 500 m/s and 13 V in the IMS cell, and 248 m/s and 8 V in the transfer cell. Drift times of four standard proteins (bovine serum albumin, pyruvate kinase, alcohol dehydrogenase, concanavalin A) were acquired under the same instrumental parameters and used to calculate the CCSs of DC-SIGN with the in-house software PULSAR [13]. Theoretical CCSs of DC-SIGN were calculated using the projection approximation method implemented in MOBCAL and scaled by a factor of 1.14 [14]. For protein unfolding experiments, the drift time of each charge state was recorded under a collision energy ramp in the CID cell in 5 V steps to determine the unfolding pathway of DC-SIGN. The initial position of each unfolding species was assigned, and its intensity was extracted across all collision voltages to generate the unfolding model in PULSAR. The stabilization effect (eV) was calculated as the sum of the differences in midpoint voltage for each unfolding species relative to the non-glycosylated apo CRD, multiplied by the charge state of the ion to account for the charge dependence of protein unfolding.
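The stabilization effect defined above reduces to a short calculation: sum the midpoint-voltage shifts relative to the apo CRD, then multiply by the ion charge so that volts become electronvolts. A minimal sketch, assuming the unfolding transitions of the glycoform and apo protein are paired by order; `stabilization_ev` and the midpoint values in the example are hypothetical.

```python
from typing import Sequence


def stabilization_ev(
    glyco_midpoints_v: Sequence[float], apo_midpoints_v: Sequence[float], charge: int
) -> float:
    """Stabilization (eV) of a glycoform relative to the apo CRD.

    Sums the midpoint-voltage shift of each unfolding transition (paired by
    order) and multiplies by the ion charge, so that volts become electronvolts.
    """
    if len(glyco_midpoints_v) != len(apo_midpoints_v):
        raise ValueError("expected one midpoint per unfolding transition in both lists")
    shift_v = sum(g - a for g, a in zip(glyco_midpoints_v, apo_midpoints_v))
    return charge * shift_v


# Hypothetical midpoints for a 9+ ion with two unfolding transitions
print(stabilization_ev([30.0, 45.0], [25.0, 40.0], charge=9))  # 90.0
```

A negative result would indicate destabilization relative to the apo protein.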