A Computational and Chemical Design Strategy for Manipulating Glycan‐Protein Recognition

Abstract Glycans are complex biomolecules that encode rich information and regulate various biological processes, such as fertilization, host‐pathogen binding, and immune recognition, through interactions with glycan‐binding proteins. A key driving force for glycan‐protein recognition is the interaction between the π electron density of aromatic amino acid side chains and polarized C─H groups of the pyranose (termed the CH–π interaction). However, the relatively weak binding affinity between glycans and proteins has hindered the application of glycan detection and imaging. Here, computational modeling and molecular dynamics simulations are employed to design a chemical strategy that enhances the CH–π interaction between glycans and proteins by genetically incorporating electron‐rich tryptophan derivatives into a lectin PhoSL, which specifically recognizes core fucosylated N‐linked glycans. This significantly enhances the binding affinity of PhoSL with the core fucose ligand and enables sensitive detection and imaging of core fucosylated glycans in vitro and in xenograft tumors in mice. Further, the study showed that this strategy is applicable to improve the binding affinity of GafD lectin for N‐acetylglucosamine‐containing glycans. The approach thus provides a general and effective way to manipulate glycan‐protein recognition for glycoscience applications.


Introduction
The functions of glycans are exerted mainly through specific molecular recognition between glycans and glycan-binding proteins (lectins). For example, the interaction between hemagglutinin proteins on influenza viruses and sialylated glycans on host cells initiates the colonization and infection process. [5,6] The binding of mammalian sperm to the egg is regulated mainly by the interaction between the egg-binding protein (EBP) located on the sperm plasma membrane and the carbohydrate sequence on the egg's zona pellucida (ZP). [7–10] Thus, glycan-protein interactions constitute a fundamental molecular mechanism that governs glycan-dependent biological functions.
Here, we employ computational modeling and molecular dynamics simulations to design a general chemical approach to increase glycan-protein recognition by enhancing the CH–π interaction with the genetic code expansion strategy. Replacement of tryptophan residues in the glycan-binding site with electron-rich derivatives of tryptophan substantially increases the electrostatic potential of the indole ring, leading to enhanced CH–π interactions. We demonstrate this approach by engineering high-affinity lectins capable of recognizing core fucosylated N-linked glycans and N-acetylglucosamine (GlcNAc)-containing glycans, respectively, and further apply them for sensitive detection and imaging of glycans. This study provides a powerful tool for genetically manipulating glycan-protein recognition and paves the way for further dissection of glycan-dependent biological functions.

Trp28 of Lectin PhoSL is the Most Critical Residue for Recognizing Core-Fucosylated Glycans
Core fucosylation, the addition of α1,6-fucose to the innermost GlcNAc residue of N-linked glycans, plays critical roles in regulating various physiological processes, including immune response, stem cell homeostasis, neuronal development, and tumorigenesis. [24,25] Accumulated studies have demonstrated that core fucosylation is frequently upregulated in hepatocellular carcinoma, melanoma, and breast and prostate cancer, and is positively correlated with poor prognosis in patients. [24,26,27] In addition, core fucosylated serum proteins serve as promising biomarkers for disease diagnosis. [28] Thus, it is important to develop strategies for detecting core fucosylated glycans with high sensitivity. Pholiota squarrosa lectin PhoSL, isolated from a mushroom, specifically recognizes core-fucosylated N-glycans. [29] Notably, PhoSL is composed of only 40 amino acids, which makes its recombinant expression and genetic manipulation very convenient. [30] We therefore used PhoSL as a model system to investigate and manipulate the CH–π interaction involved in recognizing core fucosylated N-glycans. A previously solved crystal structure of PhoSL in complex with a core fucosylated glycan ligand revealed several aromatic residues in the binding pocket, Tyr15 (Y15), Phe23 (F23), and Trp28 (W28), which interact with the core fucose and GlcNAc sugars [11] (Figure 1a).
To evaluate how these residues affect glycan recognition, we built atomistic models and performed all-atom molecular dynamics (MD) simulations in explicit solvent. [31] Two 5-microsecond simulations showed that several residues, including A1, V3, D11, G12, D13, L21, and V36, along with the aromatic residues Y15, F23, and W28, were involved in the interactions with the glycans (Figure 1b). We also analyzed the CH–π interactions by counting the coordination number between the aromatic ring centers of these residues and the CH groups of the sugars. We found stable CH–π contacts between the indole ring of W28 and the pyranose ring of GlcNAc, and between the benzene ring of F23 and the fucose ring (Figure 1c). W28 formed more CH–π contacts than F23 (Figure 1d). Moreover, Y15 exhibited weaker and less frequent CH–π contacts than F23 and W28. Thus, the simulations reveal key roles for the aromatic residues F23 and especially W28 in CH–π-mediated glycan recognition.
To further evaluate the role of F23 and W28 experimentally, we introduced alanine mutations at these sites in PhoSL. The mutant proteins were expressed in and purified from E. coli. We then measured their binding affinity toward the core fucose glycan substrate 1 using microscale thermophoresis (MST) (Figure S1, Supporting Information). Compared to the wildtype (WT) PhoSL, which had a dissociation constant (K_D) of 1.52 ± 0.21 μM, the F23A mutant displayed an approximately four-fold weaker affinity with a K_D of 6.73 ± 3.53 μM. An even more substantial drop of ≈90-fold was observed for the W28A mutant, which had a K_D of 141.64 ± 16.47 μM (Figure 1e). The reduced binding of both mutants indicates that F23 and W28 are critical for the recognition of the glycan substrate. Moreover, the greater effect of the W28A mutation compared to F23A is consistent with the MD simulations, which predicted a larger contribution of CH–π contacts from W28 than from F23. Taken together, the alanine mutagenesis experiments and simulations converge to demonstrate a key role for W28 in glycan binding affinity.
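The fold changes quoted above follow directly from the ratios of the measured dissociation constants. The short sketch below recomputes them and, as an illustration that is not part of the reported analysis, converts each ratio into an approximate binding free energy penalty using ΔΔG = RT ln(K_D,mutant / K_D,WT). The temperature of 25 °C is taken from the MST conditions; the function and variable names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3   # gas constant in kcal mol^-1 K^-1
T_KELVIN = 298.15      # 25 degrees C, the MST measurement temperature

def fold_change(kd_variant_um, kd_reference_um):
    """Ratio of dissociation constants (values > 1 mean weaker binding)."""
    return kd_variant_um / kd_reference_um

def ddg_kcal(kd_variant_um, kd_reference_um, temperature=T_KELVIN):
    """Free energy penalty relative to the reference, in kcal/mol."""
    return R_KCAL * temperature * math.log(kd_variant_um / kd_reference_um)

# Mean K_D values (in micromolar) reported for PhoSL and its alanine mutants.
KD_UM = {"WT": 1.52, "F23A": 6.73, "W28A": 141.64}

for name in ("F23A", "W28A"):
    ratio = fold_change(KD_UM[name], KD_UM["WT"])
    penalty = ddg_kcal(KD_UM[name], KD_UM["WT"])
    print(f"{name}: {ratio:.1f}-fold weaker, ddG = {penalty:.2f} kcal/mol")
```

The calculation uses only the mean K_D values, so the reported measurement errors are not propagated into the ratios.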

Substitution of W28 with Electron-Rich Tryptophan Derivatives Enhances Glycan-Protein Recognition
CH–π interactions are important for glycan-protein recognition, as they involve the electropositive C─H bonds of glycans and the π systems of aromatic amino acid side chains. We wondered whether increasing the electron density of the aromatic rings would boost CH–π interactions and the glycan binding affinity. Because glycan binding is more sensitive to mutation of W28 than of F23, we chose W28 as the candidate for manipulation. We hypothesized that an electron-rich substitution of W28 would result in a stronger attractive force for glycan-protein recognition.
To test this, we first obtained a series of electron-rich Trp derivatives from commercial sources and chemical synthesis (Figure 2a). We then used the genetic code expansion system to replace W28 of PhoSL with these derivatives, as previously reported. [32] The genetic code expansion strategy utilizes orthogonal translation systems (OTSs) to introduce PTMs or their analogs, in the form of noncanonical amino acids, into proteins in a site-specific manner. The codon corresponding to W28 was mutated to the amber stop codon TAG, followed by co-expression of the pyrrolysyl-tRNA and pyrrolysyl-tRNA synthetase pair in E. coli BL-21 cells. Here, the pyrrolysyl-based system had previously been modified with components from the phenylalanine synthetase, allowing highly efficient incorporation of Trp derivatives in both prokaryotes and eukaryotes. [33] The corresponding Trp derivative was added to the medium when protein expression was induced. The resulting proteins were purified via a GST tag fused at the C-terminus of PhoSL to obtain high homogeneity (Figure 2b). To verify the efficient incorporation of Trp derivatives, we performed site mapping of PhoSL substituted with 5-methyl-Trp or 5-methoxy-Trp by tandem mass spectrometry analysis (Figure S2a-c, Supporting Information). These protein variants were further analyzed to probe whether the incorporated Trp derivatives affected the glycan-protein interaction. Notably, the binding affinity of PhoSL variants with 5-methoxy, 6-methoxy, 7-methoxy, or 6,7-dimethoxy Trp for the glycan substrate increased by 2.7-fold, 1.6-fold, 1.8-fold, or 2.1-fold, respectively, compared to the WT PhoSL (Figure 2c). In contrast, the binding affinity of PhoSL variants with 5-methyl, 6-methyl, or 7-methyl Trp (mildly electron-rich derivatives) changed only slightly compared to the WT PhoSL (Figure S3, Supporting Information). These results indicate that the CH–π interactions between the glycan substrate and PhoSL can be enhanced by replacing W28 with electron-rich derivatives.
To further investigate the molecular basis of the increased recognition conferred by the 5-methoxy substitution, we performed density functional theory calculations to compare the electrostatic potential surfaces of Trp and 5-methoxy Trp. The result clearly showed that the addition of an electron-rich methoxy group boosts the indole ring's electrostatic potential (Figure 2d). Next, we performed MD simulations of the PhoSL trimer with 5-methoxy substituted W28 (W28mut) in complex with glycans and compared the results with those of the WT PhoSL-glycan complex (Videos S1 and S2, Supporting Information). The MD trajectories showed that W28mut indeed formed more stable CH–π contacts than the WT residue (Figure 2e). The substitution also increased the average number of CH–π contacts of W28 from 1.6 to 1.9 (Figure 2f). In contrast to the WT, in which W28 interacted with both GlcNAc1 and GlcNAc2 of the glycan, W28mut in the mutant protein mainly interacted with the core GlcNAc1 (Figure 2g). These results are consistent with the experimental binding data, suggesting that the 5-methoxy substitution of W28 strengthens the CH–π interactions between the glycan substrate and PhoSL, resulting in a higher binding affinity.

W28-Substituted PhoSL Increases Sensitivity for Detecting and Imaging Core Fucosylated Glycans In Vitro and In Vivo
Lectins have been used for detecting and imaging glycans. However, their relatively low binding affinity toward glycans has hampered their application. [34,35] With engineered PhoSL variants of higher binding affinity in hand, we aimed to explore their utility in glycan detection and imaging. The WT, W28mut, and W28A PhoSL were individually conjugated with biotin. Cell lysates from the human liver cancer cell line HepG2 were separated by SDS-PAGE, transferred to nitrocellulose membranes, and incubated with biotinylated PhoSL variants. Subsequently, the membranes were incubated with horseradish peroxidase (HRP)-labeled streptavidin and detected by chemiluminescence imaging. Notably, a significantly higher detection signal was observed with the W28mut PhoSL than with the WT PhoSL (Figure 3a). As a negative control, the W28A PhoSL gave a much lower signal.
Next, we explored the imaging of cell surface core fucosylated glycans in live cells. We conjugated PhoSL variants with FITC and validated that PhoSL specifically recognizes core fucosylated glycans on the HepG2 cell surface by competitively abolishing the fluorescent signal with core fucosylated glycan substrate 1 (Figure S4, Supporting Information). We observed that the W28mut PhoSL produced the highest fluorescent signal, while the W28A PhoSL produced the weakest signal (Figure 3b). This is consistent with the results obtained from the Western blotting analysis.
We further investigated the imaging of core fucosylated glycans in vivo. Core fucosylated glycans have been reported to be overexpressed in various types of tumors, which is closely related to tumor growth and metastasis. [36,37] Imaging of core fucosylated glycans in live animals has remained largely unexplored. To address this, we injected 4T1 cells (a murine breast cancer cell line) into nude mice via the tail vein to establish a lung metastasis model. [38] Two weeks after the injection, we injected the FITC-conjugated WT, W28mut, or W28A PhoSL via the tail vein to image core fucosylated glycans present in pulmonary metastatic nodules. One hour after the injection, we anesthetized the mice and imaged them with a small-animal live fluorescence imaging system. As shown in Figure 3c, injection of the W28mut PhoSL resulted in a 2.5-fold increase in the fluorescence signal compared to the WT PhoSL. As a negative control, injection of the W28A PhoSL showed only the background fluorescent signal. After we euthanized and dissected the mice, we found many tumor nodules in the lung tissues. The fluorescent signals were mainly localized in metastatic nodules, indicating the specificity of PhoSL. Consistently, labeling with the W28mut PhoSL produced a stronger signal than labeling with the WT or W28A PhoSL (Figure 3d). Taken together, these data suggest that the W28mut PhoSL is a superior tool for the sensitive detection and imaging of core fucosylated glycans both in vitro and in vivo.

Spatial Profiling of Core Fucosylated Proteins and Interacting Partners with W28-Substituted PhoSL
Core fucosylation has been demonstrated to play critical roles in both normal physiology and disease development. [24,25] Core fucosylation exerts its biological function not only by modifying protein substrates but also by coordinating with neighboring interacting proteins. Thus, systematic profiling of core fucosylated proteins and their interacting partners will provide valuable insight into the underlying molecular mechanisms. We envisioned that proximity labeling strategies using an engineered biotin ligase (TurboID) could offer a powerful tool to profile core fucosylated proteins and their interaction network. [39] In addition, our substituted PhoSL with stronger glycan recognition is expected to achieve better performance and a higher signal-to-noise ratio in proximity labeling. Thus, we genetically fused the biotin ligase TurboID with the W28mut and W28A PhoSL to test their capability to identify core fucosylation interactomes in live cells (Figure 4a). HepG2 cells were incubated with various concentrations of the fusion proteins, and the biotinylation reaction was initiated by the addition of biotin and ATP and allowed to proceed for 30 min before quenching. Subsequent staining with streptavidin-Fluor 488 showed that efficient labeling was achieved on the cell surface, with an optimal concentration of the W28mut PhoSL fusion of 4 μM (Figure 4b). In contrast, cell surface labeling with the W28A PhoSL produced only the background signal (Figure 4b).

Figure 2 (caption, partial): … three independent assays. d) The electrostatic potential surfaces of tryptophan and its electron-rich derivative, 5-methoxy-tryptophan. Molecular surfaces with negative and positive charges are colored red and blue, respectively. e) The CH–π contact trajectories of W28 in WT and 5-methoxy-tryptophan mutant (Mut, W28mut in the text) PhoSL from MD simulations. f) The frequency distribution of the CH–π contacts of W28 in the WT and mutant PhoSL from two independent MD simulations for each system. g) Depiction of PhoSL-glycan interactions. The interactions between PhoSL residues and the glycan residues are indicated by lines whose thickness scales linearly with the frequency of the corresponding interaction in the MD simulations.

Using the optimal labeling condition, we next performed enrichment of biotinylated proteins with streptavidin-bound beads, on-bead proteolytic digestion, and protein identification by liquid chromatography coupled to mass spectrometry (LC-MS) (Figure 4c). We compared cells treated with the W28mut PhoSL-TurboID and the W28A PhoSL-TurboID to identify proteins whose labeling depends predominantly on glycan binding. To achieve a high-confidence assignment of identified proteins, the digested peptides were isotopically derivatized by NaCNBH3- or NaCNBD3-mediated dimethyl labeling before MS analysis. [40] Only proteins with a fold change >5, a p-value <0.01 (Student's t-test), and at least two identified unique peptides in at least two parallel tests were considered as core-fucosylated proteins and their interacting partners. Using these criteria, we identified 451 proteins in HepG2 cells (Figure 4d; Table S1, Supporting Information). [43] Cellular localization analysis showed that a majority (71.4%) of the identified proteins were located on the plasma membrane. Gene ontology analysis of cellular components (GOCC) also confirmed that a large number of the identified proteins were associated with cellular membranes (Figure 4e). Gene ontology analysis of biological processes (GOBP) revealed that the identified proteins played critical roles in cell adhesion, integrin-mediated signaling pathways, amino acid transmembrane transport, and cell-cell fusion (Figure 4e). Gene ontology analysis categorized by molecular function (GOMF) further showed that the identified proteins were closely related to the binding of phosphatidylinositol, integrin, and cadherin (Figure 4e). A similar result was obtained by enrichment analysis of Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways (Figure 4f). To further understand the cellular processes involving the identified proteins, we constructed a protein-protein interaction (PPI) network using the STRING database and found that the most significantly enriched cluster was cell adhesion (Figure 4g). Together, these results accord well with the role of core fucosylation in cell-cell communication and provide useful information for further mechanistic understanding of core-fucose-mediated cellular processes.
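The three acceptance criteria stated above (fold change >5, p-value <0.01, and at least two unique peptides in at least two parallel tests) can be expressed as a simple filter. The following is a minimal sketch, not the authors' actual pipeline: the record layout, field names, and example entries are all hypothetical, and the fold changes and p-values are assumed to have been computed upstream.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ProteinQuant:
    """Hypothetical per-protein summary of a W28mut vs W28A comparison."""
    name: str
    fold_change: float                        # W28mut / W28A intensity ratio
    p_value: float                            # from Student's t-test
    unique_peptides_per_replicate: List[int]  # one count per parallel test

def passes_filter(p, min_fold=5.0, max_p=0.01, min_peptides=2, min_replicates=2):
    """Apply the three criteria: fold change, p-value, peptide support."""
    supported = sum(1 for n in p.unique_peptides_per_replicate if n >= min_peptides)
    return (p.fold_change > min_fold
            and p.p_value < max_p
            and supported >= min_replicates)

# Made-up records used only to exercise the filter.
records = [
    ProteinQuant("protein_A", 8.2, 0.002, [3, 2, 4]),   # passes all criteria
    ProteinQuant("protein_B", 6.0, 0.030, [5, 5, 5]),   # fails the p-value cut
    ProteinQuant("protein_C", 12.0, 0.001, [2, 1, 1]),  # too few supported runs
    ProteinQuant("protein_D", 4.9, 0.001, [4, 4, 4]),   # fails the fold-change cut
]

hits = [r.name for r in records if passes_filter(r)]
print(hits)
```

Keeping the thresholds as keyword arguments makes it easy to check how sensitive the final protein list is to each cutoff.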

A Similar Strategy Applies to Increasing GafD Recognition of GlcNAc-Containing Glycans
To demonstrate the broad applicability of our strategy for enhancing glycan-protein recognition by optimizing CH–π interactions, we extended this approach to GafD, a lectin that specifically binds to N-linked and O-linked glycans with a terminal GlcNAc sugar. [44,45] Among the four Trp residues in GafD (W46, W85, W102, and W109), the previously solved crystal structure revealed that the latter three are solvent-exposed and contribute to two putative glycan binding sites near W102 and W109 (Figure S5a, Supporting Information). [46] To identify the sugar-binding site, we performed atomistic MD simulations of GafD with GlcNAc sugars and observed numerous dynamic glycan binding events. These simulations revealed that W109 formed more stable CH–π interactions with GlcNAc than W102, identifying it as the optimal residue for modification (Figure S5b, Supporting Information).

Figure 4 (caption, partial): … Proteins with a fold change >5 and p value <0.01 were considered as core fucosylated proteins and the interacting partners and are highlighted in red. e) Gene ontology analysis of identified glycoproteins categorized by biological processes (GOBP), cellular components (GOCC), and molecular function (GOMF). f) Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis of identified glycoproteins. g) Protein-protein interaction (PPI) network analysis of identified glycoproteins using the STRING database.

Having identified W109 as the optimal site for enhancement, we hypothesized that replacing it with a more electron-rich tryptophan derivative could strengthen favorable CH–π interactions with GlcNAc. Indeed, genetically incorporating 5-methoxy Trp at W109 increased GafD's binding affinity for the GlcNAc glycan substrate by 2.3-fold compared to the WT (K_D of 0.66 ± 0.23 μM vs 1.40 ± 0.48 μM). In contrast, the alanine mutant W109A exhibited a 4.2-fold reduction in affinity (K_D of 5.86 ± 1.21 μM), highlighting the importance of W109 (Figure 5a,b). Validation by Western blotting analysis of cell lysates and fluorescent imaging of live cells showed that W109mut yielded the strongest detection signal in both assays, indicating enhanced glycan binding (Figure 5c,d). Tumor cells are coated with sialic acids at the termini of glycans. To rule out an effect of terminal sialic acids on glycan-protein recognition, we used sialidases to remove sialic acids from the cell surface and detected sialylation using a well-established chemoenzymatic labeling strategy mediated by a specific glycosyltransferase, CgtA. [47] Treatment with sialidases almost abolished the fluorescence signal of sialylation but had no apparent impact on the recognition of GlcNAc by W109mut, indicating that GafD binds GlcNAc regardless of sialic acids (Figure S6a,b, Supporting Information). We also labeled and imaged the terminal GlcNAc moieties on the cell surface using a GalT1 Y289L-mediated chemoenzymatic labeling method coupled with bioorthogonal reactions. [48] The fluorescence signal detected by the chemoenzymatic labeling strategy was comparable to that detected by W109mut (Figure S6c, Supporting Information). Together, these results demonstrate that enhancing CH–π interactions through genetic code expansion is a generalizable approach for improving glycan recognition across diverse proteins.

Discussion
Glycans contain important biological information that is frequently decoded through specific interactions with glycan-binding proteins (lectins). These interactions can trigger cellular signaling pathways, modulate cell-cell communications, and reprogram cell behaviors. [4,9] Recent studies have also highlighted links between aberrant glycan-protein interactions and disease development and progression. [49,50] Given their glycan specificity, lectins are now used as valuable tools to probe glycan structure and image glycans. However, most known glycan-protein interactions have relatively weak affinity, with K_D values in the sub-mM range. [51,52] This low binding affinity has hindered their application for glycan detection and imaging in cells.
Here we develop a versatile approach to increase glycan-protein interactions by genetically incorporating unnatural amino acids (electron-rich Trp derivatives) into the glycan binding site. Single-site incorporation significantly increased the binding affinity, by two- to three-fold, enabling the detection and imaging of glycans with higher sensitivity. Since multivalency is known to enhance avidity between glycans and lectins, [16,20] multivalent presentation of these modified lectins could further improve glycan recognition. Moreover, because the genetic incorporation occurs inside cells, the enhanced recognition is poised for live-cell imaging of glycan localization and trafficking. It may also provide opportunities to modulate cellular functions that depend on glycan recognition.
Current strategies for glycan detection and imaging consist of lectin binding, metabolic labeling, and chemoenzymatic labeling. These strategies vary in specificity, sensitivity, and applications, and are complementary to each other. Notably, while chemoenzymatic labeling has exceptional specificity and sensitivity in glycan detection, its translation to in vivo imaging remains unattainable. Metabolic labeling, on the other hand, has a broad range of in vitro and in vivo applications, but its specificity is unsatisfactory due to the intricate metabolic milieu within living systems. Our strategy, which engineers lectins to possess a higher binding affinity for glycans, not only achieves a detection sensitivity comparable to that of chemoenzymatic labeling but also enables in vivo imaging of glycans. Therefore, it provides a promising tool for glycan imaging and detection both in vitro and in vivo.
CH–π interactions are a prominent feature of glycan-protein recognition, being present not only in glycan-lectin binding but also in transition state stabilization during glycosyl-transfer and glycosyl-hydrolysis reactions. [18,20] In these reactions, the cationic or cation-like anomeric carbon is stabilized by the aromatic π orbitals in the transition state. Thus, our approach of strengthening CH–π contacts and enhancing glycan-protein recognition through genetic incorporation of electron-rich aromatic amino acids may have important implications for designing improved enzyme catalysts for glycan synthesis and for probing reaction mechanisms. [55] The ability to genetically manipulate the π system thus provides an unprecedented opportunity for advancing our understanding of biological recognition. Overall, this work establishes the CH–π interaction as a flexible target for modulating glycan-protein binding and provides a platform to investigate the broader impacts of aromatic residues on glycan-focused enzymes and biomolecular recognition.
Plasmid Construction: The plasmid pBK bearing the chPheRS and the plasmid pNEG carrying the chPheT were provided by Prof. Shixian Lin (Zhejiang University). The protein sequences of PhoSL (A0A384E107-1) and GafD (Q47341) were obtained from the UniProt database, and the gene sequences were synthesized by SunYa. PhoSL or GafD and a GST tag were inserted into pNEG-chPheT to generate pNEG-chPheT-PhoSL-GST or pNEG-chPheT-GafD-GST. Site-directed mutagenesis of PhoSL and GafD was performed by PCR. Primers for the site-directed mutagenesis are listed below.

[Table: Primers | Sequence 5'→3']

Protein Expression and Purification: For expression of wild-type and W28A PhoSL, the corresponding plasmid pNEG was transformed into DH10B cells. At OD600 ≈0.8, 0.2% arabinose and 0.1 mM ZnCl2 were added to the medium to induce protein expression, and the cells were cultured for a further 20 h at 16 °C. For PhoSL variants incorporating UAAs, the plasmid pNEG bearing PhoSL with an amber codon at the site of W28 and the plasmid pBK bearing the corresponding chPheRS were co-transformed into DH10B cells in LB medium supplemented with 50 μg mL⁻¹ kanamycin and 100 μg mL⁻¹ ampicillin. At OD600 ≈0.8, 0.2% arabinose, 0.1 mM ZnCl2, and 2 mM of the corresponding tryptophan derivative were added to the medium to induce protein expression. After culture for 20 h at 16 °C, the cells were harvested by centrifugation at 10 000 × g for 5 min and suspended in lysis buffer (20 mM Tris-HCl, pH 7.4, 150 mM NaCl, and 2 mM β-ME). The cells were disrupted by ultrasonication at 40% power for 10 min (3 s on and 7 s off) on ice and centrifuged at 15 000 × g for 30 min to remove precipitates. The supernatants of the lysates were incubated with GST-tag purification resin (Beyotime) for 2 h at 4 °C and subsequently loaded onto an affinity chromatography column. After washing with lysis buffer, the proteins were eluted with elution buffer (50 mM Tris-HCl, 150 mM NaCl, 10 mM GSH, pH 8.0). Purified proteins were further concentrated using a 10 kDa Amicon Ultra Centrifugal Filter Unit (Millipore, Ireland) and then desalted into MST buffer (PBS buffer, 0.05% Tween-20, pH 7.4). Proteins were then analyzed by 10% SDS-PAGE. The protein concentration was determined by Coomassie staining.
For the purification of GafD, the bacteria were collected, resuspended in 10 mL of PBS, and sonicated thoroughly on ice. The lysate was then diluted to 80 mL with PBS and centrifuged at 14 000 × g for 30 min, and the supernatant was discarded. The pellet was resuspended in PBS containing 25% (w/v) sucrose and centrifuged at 14 000 × g for 30 min, and the supernatant was again discarded. After five rounds of resuspension in PBS containing 25% (w/v) sucrose, the pellet was redissolved in 20 mL of 5 M guanidine hydrochloride (Gu-HCl) with 0.3 M sodium sulfite at room temperature. 2 mL of 2-nitro-5-(sulfothio)-benzoate (NTSB) was added to the reaction until the solution turned pale yellow. The protein was precipitated by the addition of 180 mL H2O and centrifuged at 10 000 × g for 10 min. The pellet was then resuspended in 5 M guanidine hydrochloride (Gu-HCl) at a concentration of 1 mg mL⁻¹. The protein solution was diluted tenfold and dialyzed three times against the folding solution (0.7 M L-arginine, 50 mM Tris-HCl, 5 mM EDTA, 4 mM cysteamine, and 2 M cystamine, pH 8.0) at 4 °C. After dialysis, the precipitated protein was removed by centrifugation and the purified proteins were obtained in the supernatant. The proteins were further concentrated, desalted, and analyzed by 10% SDS-PAGE.
To a solution of compound 6 (500 mg, 0.9 mmol) in CH2Cl2 (2.5 mL) and MeOH (2.5 mL) was added ethylenediamine (600 μL, 9.0 mmol) in one portion. The mixture was stirred at room temperature for 2 h. The solvent was removed by rotary evaporation, and the residue was purified by flash column chromatography on silica gel (DCM/MeOH = 10:1) to give compound 7 (350 mg, 90%) as a yellow oil.
Magnesium powder (240 mg, 10 mmol) and compound 7 (216 mg, 0.5 mmol) were added to a round-bottomed flask under the protection of argon. Anhydrous methanol was added to the flask through a syringe, and the mixture was stirred vigorously at room temperature for 2 h. The solvent was removed by rotary evaporation, and the residue was purified by flash column chromatography on silica gel (DCM/MeOH = 10:1) to give compound 8 (116 mg, 84%) as a yellow oil.
Compound 8 (110 mg, 0.4 mmol) was dissolved in MeOH (10 mL), and NaOH (48 mg, 1.2 mmol) in H2O (10 mL) was added. The mixture was heated under reflux for 8 h. After completion of the reaction, the mixture was cooled to room temperature. Then, the solvent was removed by rotary evaporation, and the crude product was further purified by Bio-Gel-P2 (water) to give compound 1 (97 mg, 92%) as a white crystal.
Detection of Trp Derivative-Incorporated PhoSL Variants by LC-MS: Purified PhoSL-GST was freeze-dried and resuspended in 50 mM NH4HCO3 (pH 8.0). The mixtures were digested by Glu-C with shaking at 37 °C for 4 h, quenched by TFA acidification, desalted, and freeze-dried for further LC-MS analysis. Freeze-dried samples were dissolved in solution A (0.1% formic acid in water) and trapped automatically onto a homemade 150 μm × 20 mm C18 precolumn by a Thermo EASY nLC 1200 system (Thermo Fisher Scientific). The bound peptides were then separated on a homemade 150 μm × 15 cm C18 analytical column over a 30 min gradient using solution A and solution B (0.1% formic acid in 80% acetonitrile). For parallel reaction monitoring (PRM), the Orbitrap Fusion Lumos mass spectrometer (Thermo Fisher Scientific) was configured to collect MS/MS scans corresponding to several precursor targets in the tryptophan 28 (W28)-containing sequence (Table S1, Supporting Information). Full MS was acquired over m/z 300–1400 (120 000 resolution, 50 ms maximum injection time, 5e5 AGC target), and MS/MS was collected at a resolution of 15 000, with a maximum injection time of 30 ms and an AGC target of 5e4. The normalized collision energy was set to 30.
Microscale Thermophoresis (MST): Protein-glycopeptide interactions were analyzed by microscale thermophoresis (MST). Glycopeptides were first labeled with fluorescein isothiocyanate (FITC). FITC-glycopeptide (50 nM) was mixed with a serial dilution of the protein in MST buffer (PBS buffer, 0.05% Tween-20, pH 7.4). The samples were loaded into glass capillaries (NanoTemper Technologies, cat#MO-K022), and MST was performed on an NT.115 Monolith instrument (NanoTemper Technologies, Munich, Germany) using the blue LED at 20% excitation power and the IR laser at 40% power, in three independent replicates at 25 °C. The dissociation constants (K_D) were calculated with the K_D model, assuming a 1:1 stoichiometry per binding partner, in MO Affinity Analysis Software. All data were processed with GraphPad Prism 8 software.
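A 1:1 binding model of the kind referred to above is usually written as the exact mass-action solution for the fraction of labeled molecule that is bound. The sketch below implements that textbook expression and recovers K_D from noise-free synthetic data by a coarse grid search. It is an illustration only: the analysis software's actual fitting routine is not described in the text, a real fit also estimates the bound and unbound signal levels, and all function names and the titration series are assumptions.

```python
import math

def fraction_bound(protein_um, target_um, kd_um):
    """Fraction of labeled target bound at a given total protein concentration.

    Exact solution of the 1:1 mass-action equilibrium (all values in uM).
    """
    total = protein_um + target_um + kd_um
    root = math.sqrt(total * total - 4.0 * protein_um * target_um)
    return (total - root) / (2.0 * target_um)

def fit_kd(protein_series, observed, target_um):
    """Pick the K_D on a log-spaced grid (0.01 to 1000 uM) with the lowest SSE."""
    candidates = [10 ** (i / 100) for i in range(-200, 301)]
    def sse(kd):
        return sum((fraction_bound(p, target_um, kd) - y) ** 2
                   for p, y in zip(protein_series, observed))
    return min(candidates, key=sse)

TARGET_UM = 0.05                                 # 50 nM FITC-glycopeptide
titration = [100 / 2 ** k for k in range(16)]    # hypothetical 16-point series

# Synthetic, noise-free curve generated with the WT PhoSL K_D of 1.52 uM.
synthetic = [fraction_bound(p, TARGET_UM, 1.52) for p in titration]
print(round(fit_kd(titration, synthetic, TARGET_UM), 2))
```

Because the labeled glycopeptide concentration is far below K_D here, the curve is close to the simple hyperbola protein / (protein + K_D); the quadratic form matters mainly for tight binders.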
Neighbor searching was performed every 20 steps in the MD simulations. The PME algorithm was used for electrostatic interactions with a cut-off of 1.2 nm. A reciprocal grid of 64 × 64 × 64 cells with 4th-order B-spline interpolation was used. A single cut-off of 1.2 nm was used for van der Waals interactions. The V-rescale algorithm was used for temperature coupling. Temperature and pressure were kept constant at 300 K. All covalent bonds with hydrogen atoms of the protein and water molecules were constrained by the LINCS algorithm. Each system was minimized for 1000 steps and then equilibrated for 1 ns in an NPT ensemble with position restraints on all heavy atoms of the complex. The hydrogen mass repartitioning technique was employed with a single LINCS iteration (expansion order 4), which allowed an integration time step of 4 fs to be used. The production simulations were performed in the NVT ensemble.
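As a rough guide to how the settings above translate into a GROMACS parameter file, the sketch below collects them into a dictionary keyed by standard .mdp option names and derives the step count from a run length supplied by the caller. It is an incomplete fragment under stated assumptions: the integrator choice is assumed, the thermostat groups and coupling time constants are not given in the text and are left out, and the function names are hypothetical.

```python
def production_mdp(run_ns, dt_fs=4):
    """Build a partial .mdp parameter set for an NVT production run."""
    nsteps = run_ns * 1_000_000 // dt_fs    # ns -> fs, then divide by the step
    return {
        "integrator": "md",                 # assumption: leap-frog integrator
        "dt": dt_fs / 1000,                 # time step in ps (4 fs with HMR)
        "nsteps": nsteps,
        "nstlist": 20,                      # neighbor search every 20 steps
        "coulombtype": "PME",
        "rcoulomb": 1.2,                    # nm
        "fourier-nx": 64,                   # reciprocal grid 64 x 64 x 64
        "fourier-ny": 64,
        "fourier-nz": 64,
        "pme-order": 4,                     # 4th-order B-spline interpolation
        "vdwtype": "cut-off",
        "rvdw": 1.2,                        # nm
        "tcoupl": "V-rescale",
        "ref-t": 300,                       # K; tc-grps and tau-t still needed
        "pcoupl": "no",                     # NVT production run
        "constraints": "h-bonds",
        "constraint-algorithm": "lincs",
        "lincs-iter": 1,
        "lincs-order": 4,
    }

def render_mdp(params):
    """Format the parameters as 'key = value' lines."""
    return "\n".join(f"{key:<22} = {value}" for key, value in params.items())

print(render_mdp(production_mdp(run_ns=5000)))
```

With a 4 fs time step, a 5000 ns trajectory corresponds to 1.25 billion integration steps, which is why the enlarged step enabled by hydrogen mass repartitioning matters for runs of this length.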
All MD simulations were performed using a GPU-accelerated version of GROMACS 2021.5. Two 5000 ns production runs were carried out for the PhoSL systems. For GafD, two 1000 ns runs were performed. Molecular structures were visualized using PyMOL, and protein-glycan interactions were analyzed with GetContacts (https://getcontacts.github.io/) and PLUMED [60] and plotted using R. To specifically analyze CH–π interactions, the coordination number between the hydrogen atoms of glycan CH groups and the centers of the aromatic rings of aromatic residues was calculated. Note that Trp has an additional five-membered ring, whereas Tyr and Phe each have a single six-membered ring. A switching function was used to define the CH–π coordination number:

$$C = \sum_{i}\sum_{j} c_{ij}, \qquad c_{ij} = \frac{1-(r_{ij}/r_0)^{n}}{1-(r_{ij}/r_0)^{m}}$$

where $c_{ij}$ approaches 1 if the contact between hydrogen atom $i$ and ring center $j$ is formed and approaches 0 otherwise, and $r_{ij}$ is the distance between them. The cutoff $r_0$ was set to 0.3 nm. The exponents $n$ and $m$ were set to 6 and 12, respectively.
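The coordination number described above can be sketched in a few lines of Python. This sketch assumes the rational switching function that PLUMED uses by default for its COORDINATION collective variable, with the stated parameters (r0 = 0.3 nm, n = 6, m = 12). The function names and the toy coordinates are hypothetical; in the study, the calculation was performed with PLUMED on the MD trajectories.

```python
import math


def rational_switch(r, r0=0.3, n=6, m=12):
    """Rational switching function s(r) = (1 - (r/r0)^n) / (1 - (r/r0)^m)."""
    x = r / r0
    if math.isclose(x, 1.0, rel_tol=0.0, abs_tol=1e-12):
        return n / m  # analytic limit at r = r0, avoids 0/0
    return (1.0 - x ** n) / (1.0 - x ** m)


def ring_center(ring_atoms):
    """Geometric center of the ring atoms (coordinates in nm)."""
    count = len(ring_atoms)
    return tuple(sum(atom[d] for atom in ring_atoms) / count for d in range(3))


def ch_pi_coordination(hydrogens, ring_centers, r0=0.3, n=6, m=12):
    """Sum the switching function over all hydrogen / ring-center pairs."""
    total = 0.0
    for h in hydrogens:
        for center in ring_centers:
            total += rational_switch(math.dist(h, center), r0, n, m)
    return total


if __name__ == "__main__":
    # Toy geometry: a regular hexagon of radius 0.14 nm in the xy plane.
    ring = [(0.14 * math.cos(k * math.pi / 3), 0.14 * math.sin(k * math.pi / 3), 0.0)
            for k in range(6)]
    hydrogens = [(0.0, 0.0, 0.25), (0.0, 0.0, 0.60)]  # one close, one distant
    print(ch_pi_coordination(hydrogens, [ring_center(ring)]))
```

For Trp, both the six-membered and the five-membered ring centers would be passed in `ring_centers`, so a single residue can contribute contacts from either ring.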
To explore the effects of replacing tryptophan residues with electron-rich derivatives on the electrostatic potential of the indole ring, density functional theory (DFT) calculations were conducted using Gaussian 16 with the B2LYP functional and the 6-311+G(2d,p) basis set. The electrostatic potential surfaces were visualized using GaussView 6.
Construction of FITC- or Biotin-Conjugated PhoSL and GafD: FITC- or biotin-conjugated PhoSL and GafD were produced using a FITC conjugation kit (Sangon Biotech, D601049) and a biotin conjugation kit (Sangon Biotech, D601048), respectively. Briefly, PhoSL or GafD proteins were desalted into PBS and incubated with FITC or activated biotin at 37 °C for 90 min. The conjugated proteins were then recovered at high purity by desalting-column centrifugation.
Western Blotting Analysis: Cells were lysed in RIPA buffer (50 mM Tris (pH 7.4), 150 mM NaCl, 1% NP-40, 0.5% sodium deoxycholate, 0.1% SDS) containing protease inhibitor cocktail (Roche). The protein concentration was determined with a BCA Protein Assay Kit (Beyotime). Approximately 50 μg of protein lysate was resolved on a 10% SDS-PAGE gel, transferred to a nitrocellulose membrane, and incubated with biotin-PhoSL at 4 °C overnight. After three washes with PBST buffer, the membrane was incubated with HRP-labeled streptavidin (1:5000) for 1 h and imaged on a Tanon 5200 chemiluminescence imaging system. Band intensities were quantified using ImageJ.
Detection of Cell-Surface Core Fucose on Living Cells by Fluorescence Microscopy: Fut8-overexpressing HepG2 cells were seeded on coverslips and cultured for 24 h. After three washes with PBS, cells were incubated with FITC-PhoSL in labeling buffer (1% FBS, 10 mM HEPES pH 7.9 in PBS) for 2 h at 37 °C. Cells were washed three times with PBS and fixed with 4% paraformaldehyde. Cell nuclei were then stained with DAPI in PBS for 30 min at 25 °C. Coverslips were washed three times with PBS and mounted on glass slides. The cells were imaged using a laser scanning confocal microscope (FV3000).
Chemoenzymatic Labeling of Sialylated and Terminal GlcNAc-Containing Glycoproteins on the Cell Surface: HepG2 cells were seeded on coverslips and cultured for 24 h. After three washes with reaction buffer (PBS containing 3% FBS), cells were incubated with sialidase (1 U μL⁻¹) in reaction buffer for 30 min at 37 °C to remove sialic acids. Reactions without sialidase were performed in parallel. For labeling of sialylation, cells were incubated with 50 μM UDP-GalNAz, 10 mM Mg2+, and 20 μg mL⁻¹ CgtA in the labeling buffer (1% FBS, 10 mM HEPES pH 7.9 in PBS) at 37 °C for 1 h. For labeling of terminal GlcNAc moieties, cells were incubated with UDP-GalNAz (100 μM) and GalT1 (20 μg mL⁻¹) in the labeling buffer for 2 h at 37 °C. After three washes with the labeling buffer, the cells were incubated with 30 μM DBCO-PEG4-biotin at 25 °C for 1 h. After three further washes with the labeling buffer, cells were incubated with streptavidin-Alexa Fluor 488 in PBS containing 1% BSA for 30 min at 25 °C. Cells were washed three times with PBS and fixed with 4% paraformaldehyde for 10 min. Nuclei were stained with DAPI in PBS for 20 min at 25 °C. Coverslips were washed three times with PBS and mounted on glass slides. The cells were imaged using a laser scanning confocal microscope (FV3000).
Imaging of Core Fucosylation in a Mouse Lung Metastasis Model: Mouse experiments were approved by the Institutional Animal Care and Use Committee of Zhejiang University. Single-cell suspensions of Fut8-overexpressing 4T1 cells (1 × 10⁵ cells in 100 μL PBS) were injected into the tail vein of 6-week-old nude mice. After 6 weeks, FITC-PhoSL (20 mg kg⁻¹) was injected into the tail vein. After 60 min, the mice were placed in a small-animal live imaging system for fluorescence intensity detection. The mice were then euthanized, and the lungs were removed and imaged in a gel imaging system with an excitation wavelength of 488 nm.
Proximity Labeling of Core Fucosylated Proteins and LC-MS Analysis: Fut8-overexpressing HepG2 cells were seeded at 5 × 10⁶ cells per dish in 6 cm culture dishes. After 12 h, cells were washed three times with PBS and incubated with 1 mM biotin, 1.5 mM ATP, and 2 μM TurboID-PhoSL-W28-5MeOW in PBS for 30 min at 37 °C. TurboID-PhoSL-W28A was used as a negative control. Cells were then washed three times with PBS and lysed in RIPA buffer. The protein concentration was determined with a BCA Protein Assay Kit. Approximately 1 mg of protein lysate was incubated with streptavidin beads (Pierce) with rotation at 4 °C overnight. The beads were washed five times with PBS, reduced with 10 mM DTT in 50 mM NH4HCO3 (pH 8.0) at 56 °C for 1 h, and alkylated with 20 mM IAA at 25 °C for 45 min in the dark. Trypsin was added to digest the proteins on the beads overnight at 37 °C, and the digestion was quenched by adding TFA to a final concentration of 0.1%. The digested peptides were desalted using reversed-phase C18 tips, freeze-dried, and resuspended in 100 mM TEAB buffer (pH 8.5) for isotopic dimethyl labeling. Stable isotopic dimethyl labeling was conducted as previously described for quantitative comparison of peptides between the experimental and control groups. [61] Briefly, the experimental groups (200 μL each) were treated with 8 μL of 4% (vol/vol) CD2O (Sigma), while the control groups (200 μL each) were treated with 8 μL of 4% (vol/vol) CH2O (Sigma). Both were then incubated with 8 μL of 0.6 M NaBH3CN (Sigma) on a shaker at room temperature for 1 h, followed by the addition of 32 μL of 1% (vol/vol) ammonia solution. After addition of 16 μL of formic acid, the medium (CD2O + NaBH3CN) and light (CH2O + NaBH3CN) isotopically labeled experimental and control samples were mixed and subjected to StageTip C18 desalting before MS analysis. The FASP-digested and dimethyl-labeled peptides were vacuum-dried and redissolved in 0.1% FA. The samples were separated on a homemade 15 cm reversed-phase column (150 μm i.d.) packed with Ultimate XB-C18 1.9 μm resin (Welch Materials). An Easy nLC 1200 system (Thermo) was used to separate the peptides at a flow rate of 600 nL min⁻¹ with the following gradient: 7-12% B for 6 min, 12-30% B for 51 min, 30-45% B for 10 min, 45-95% B for 1 min, and 95% B for 7 min (solvent A was 0.1% formic acid; solvent B was 0.1% formic acid in 80% acetonitrile). The LC was coupled to an Orbitrap Fusion Tribrid mass spectrometer (Thermo) via a nanoelectrospray ionization source. Full-scan mass spectra were acquired in the Orbitrap (m/z 300-1400, 120 000 resolution, maximum injection time 100 ms, AGC target 5e5) in data-dependent acquisition mode. The top 20 precursors were fragmented by higher-energy collisional dissociation (HCD) at 32% normalized collision energy, and the MS/MS spectra were acquired in the ion trap with an AGC target of 5e3 and a maximum injection time of 35 ms. Ions selected for MS/MS were dynamically excluded for 18 s.
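For orientation, the mass shifts produced by the light and medium dimethyl labels can be computed from monoisotopic atomic masses, as in the short Python sketch below. It assumes the standard reductive dimethylation chemistry, in which each labeled primary amine gains a net C2H4 (light, CH2O + NaBH3CN) or C2D4 (medium, CD2O + NaBH3CN). The helper names are hypothetical and the sketch is not part of the published workflow.

```python
# Monoisotopic masses in Da.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "D": 2.01410178}

# Net elemental gain per labeled primary amine (peptide N-terminus or Lys side chain).
LIGHT_NET = {"C": 2, "H": 4}    # CH2O + NaBH3CN: -NH2 becomes -N(CH3)2
MEDIUM_NET = {"C": 2, "D": 4}   # CD2O + NaBH3CN: -NH2 becomes -N(CHD2)2


def label_mass(net_composition):
    """Mass added per labeled amine for the given net composition."""
    return sum(MONOISOTOPIC[element] * count
               for element, count in net_composition.items())


def pair_spacing_mz(n_sites, charge):
    """m/z spacing between the medium and light forms of one peptide."""
    delta = label_mass(MEDIUM_NET) - label_mass(LIGHT_NET)
    return n_sites * delta / charge


if __name__ == "__main__":
    print(f"light label:  +{label_mass(LIGHT_NET):.4f} Da per amine")
    print(f"medium label: +{label_mass(MEDIUM_NET):.4f} Da per amine")
    # A tryptic peptide ending in Lys carries two labeled amines.
    print(f"spacing for 2 sites at 2+: {pair_spacing_mz(2, 2):.4f} m/z")
```

The roughly 4.025 Da difference per labeled amine is what separates the experimental (medium) and control (light) peptide pairs in the full-scan spectra.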
Analysis of MS Data: The MS raw files were searched using MaxQuant (version 2.4.2.0) against the UniProt database (2022 release, 20 376 entries). Trypsin was set as the digestion enzyme, allowing a maximum of two missed cleavages and a minimum peptide length of six amino acids. Cysteine carbamidomethylation was set as a fixed modification, while methionine oxidation and N-terminal acetylation were allowed as variable modifications. For peptide identification, the mass tolerances for precursor and fragment ions were set to 20 ppm and 0.5 Da, respectively. A threshold of ≤1% was applied to both the peptide and protein false discovery rates (FDR). [64] For filtering, Perseus software and Student's t-test were used, and a 1% FDR was applied. Proteins were considered significant if they met the following criteria: a minimum of two unique peptides in at least two parallel tests, a fold change of five or greater, and p < 0.01 in the experimental groups compared with the control groups. GO and pathway analyses were conducted in R. The quantitative MS data have been deposited in iProX (https://www.iprox.cn/page/PSV023.html;?url=1690361810886oKRL; password: sNnP).
Statistical Analysis: All experiments were performed at least three times. Error bars denote the mean ± SD. Statistical analyses were performed with unpaired two-tailed Student's t-tests. Values of p < 0.05 were considered statistically significant.
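The significance criteria given under Analysis of MS Data amount to a simple three-part filter, sketched below in Python. The record layout, field names, and example entries are hypothetical, and fold changes and p values are assumed to have been computed beforehand (in the study, with Perseus).

```python
from dataclasses import dataclass


@dataclass
class ProteinQuant:
    name: str
    unique_peptides: list   # unique peptide count in each replicate
    fold_change: float      # experimental / control
    p_value: float          # from Student's t-test


def is_hit(protein, min_peptides=2, min_replicates=2, min_fold=5.0, max_p=0.01):
    """Apply the three criteria: peptide support, fold change, and p value."""
    supported = sum(1 for n in protein.unique_peptides if n >= min_peptides)
    return (supported >= min_replicates
            and protein.fold_change >= min_fold
            and protein.p_value < max_p)


if __name__ == "__main__":
    # Hypothetical entries, for illustration only.
    table = [
        ProteinQuant("PROT_A", [3, 2, 4], 8.2, 0.002),
        ProteinQuant("PROT_B", [1, 2, 1], 9.0, 0.001),   # too few supported replicates
        ProteinQuant("PROT_C", [5, 6, 4], 3.1, 0.004),   # fold change below 5
    ]
    print([p.name for p in table if is_hit(p)])
```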

Figure 1. W28 is the most critical residue for glycan recognition by PhoSL. a) The energy-minimized crystal structure of the glycan-binding site of PhoSL in complex with core fucose glycan (PDB: 6FX1), highlighting the CH–π interactions formed between the aromatic residues Y15, F23, and W28 of PhoSL and the CH groups of the glycan. b) Interactions between the core fucose glycan and PhoSL residues as revealed by MD simulations. The lines connecting PhoSL residues and the glycan represent the interactions between them, and their thickness is proportional to the occurrence frequency of the corresponding interaction in MD simulations. c) The CH–π contacts between the aromatic residues Y15, F23, and W28 of PhoSL and the CH groups of the glycan, shown as a function of simulation time in two independent 5000 ns MD simulations. d) The distribution of the number of CH–π contacts formed by each residue. e) The binding affinity of the WT, F23A, and W28A PhoSL toward the core fucose glycan substrate 1 as measured by microscale thermophoresis. Error bars denote the mean ± SD of three independent assays.

Figure 2. Substitution of W28 with electron-rich tryptophan derivatives enhances glycan-protein recognition. a) Illustrations of tryptophan derivatives for genetic incorporation into PhoSL. b) Purification of PhoSL variants containing the tryptophan derivatives. c) The binding affinity of the WT and different variants of PhoSL toward the core fucose glycan substrate 1 as measured by microscale thermophoresis. Error bars denote the means ± SD in

Figure 3. W28-substituted PhoSL increases sensitivity for detecting and imaging core fucose glycans in vitro and in vivo. a) Western blotting and Coomassie analysis of core fucosylated proteins of HepG2 cells by affinity pulldown with the indicated PhoSL variants. b) Immunofluorescence analysis of core fucose glycans on the HepG2 cell surface using the indicated PhoSL variants. The relative values were determined by ImageJ (n = 3 independent assays). Scale bar: 50 μm. c) Fluorescence images of lung metastasis model mice 1 h after injection with FITC-labeled PhoSL. The relative fluorescence intensity was determined by Living Image (n = 3 independent assays). Scale bar: 0.5 cm. d) Representative ex vivo lung images of mice from each group acquired with the fluorescence gel imaging system. Error bars in (a), (b), and (c) denote the mean ± SD. Statistical analyses were performed by unpaired two-tailed Student's t-tests.

Figure 4. Spatial profiling of core fucosylated proteins and interacting partners with W28-substituted PhoSL. a) Labeling and imaging of core fucosylated proteins and their interacting partners on HepG2 cells after incubation with various concentrations of W28A or W28 mut PhoSL-TurboID. Scale bar: 50 μm. b) Coomassie brilliant blue staining of purified W28A or W28 mut PhoSL-TurboID. c) Schematic diagram of proximity labeling and proteomics of core fucosylated proteins. d) Volcano plot for the identification of core fucosylation interactomes. Proteins with a fold change >5 and p value <0.01 were considered core fucosylated proteins or their interacting partners and are highlighted in red. e) Gene ontology analysis of identified glycoproteins categorized by biological process (GOBP), cellular component (GOCC), and molecular function (GOMF). f) Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis of identified glycoproteins. g) Protein-protein interaction (PPI) network analysis of identified glycoproteins using the STRING database.

Figure 5. A similar strategy applies to increasing GafD recognition of GlcNAc-containing glycans. a) Coomassie brilliant blue staining of purified W109A, WT, or W109 mut GafD. b) Thermophoretic analysis of the binding affinity of the indicated GafD variants for the FITC-labeled GlcNAc glycan substrate (n = 3 independent assays). c) Western blotting and Coomassie analysis of GlcNAc-containing glycoproteins of HepG2 cells by affinity pulldown with the indicated GafD variants. d) Immunofluorescence analysis of GlcNAc-containing glycans on the HepG2 cell surface using the indicated GafD variants. The relative values were determined by ImageJ (n = 3 independent assays). Scale bar: 50 μm. Error bars in (b), (c), and (d) denote the mean ± SD. Statistical analyses were performed by unpaired two-tailed Student's t-tests.