Probing Site‐Selective Conjugation Chemistries for the Construction of Homogeneous Synthetic Glycodendriproteins

Abstract Methods that site‐selectively attach multivalent carbohydrate moieties to proteins can be used to generate homogeneous glycodendriproteins as synthetic functional mimics of glycoproteins. Here, we study aspects of the scope and limitations of some common bioconjugation techniques that can give access to well‐defined glycodendriproteins. A diverse reactive platform was designed via use of thiol‐Michael‐type additions, thiol‐ene reactions, and Cu(I)‐mediated azide‐alkyne cycloadditions from recombinant proteins containing the non‐canonical amino acids dehydroalanine, homoallylglycine, homopropargylglycine, and azidohomoalanine.


Introduction
Synthetic glycosylated macromolecules or glycoconjugates, such as glycodendrimers and glyconanoparticles, can functionally mimic aspects of the sugar display of glycoproteins and glycolipids that decorate the outer surface of mammalian cells. For example, the blocking by decoys of carbohydrate-protein (lectin) interactions can represent an attractive anti-infective strategy for targeting pathogens. [1] Among glycoconjugates, synthetic glycoproteins [2] and particularly glycodendriproteins, resulting from the attachment of multivalent, antennary-like carbohydrate epitopes to a precise site of a protein scaffold (Figure 1a), have emerged as a class of mimics of naturally occurring N-linked glycoproteins with additional implications in vaccine design, [3] the development of bacterial/viral aggregation inhibitors, [4-6] ligands for the mannose-6-phosphate (M6PR) [7] and asialoglycoprotein (ASGPR) [8] receptors, and as glycomimetics of insulin, [9] human growth hormone, and the Fc region of human IgG. [10] Whilst multivalent glycoprotein inhibitors of pathogen adhesion can be prepared by the non-selective attachment of dendrimeric glycans to proteins via standard amide, squaramide, or imine/amine formation, [11] recent advances in selective chemical protein modification [12] have allowed the precise attachment of multivalent carbohydrate moieties to predetermined sites of proteins using biologically compatible reactions to generate single, well-defined glycodendriprotein glycoforms (Figure 1b,c). However, despite this progress, such methods remain scarce and have employed diverse linkages, including those derived enzymatically (amides), [9] those that are potentially cleavable/reversible (disulfides [4,5] and oximes [10]), and those that are stable under physiological conditions (e.g., 1,4-triazole linkages derived from Cu(I)-mediated azide-alkyne cycloadditions (CuAAC) with the alkyne tag homopropargylglycine (Hpg) [6] or N6-[(2-propynyloxy)carbonyl]-L-lysine (Lys(PA)) [7]).
The aim of this proof-of-principle study is to comparatively evaluate the efficiency of some common, site-selective protein chemistries that could yield well-defined glycodendriproteins and therefore may prove attractive in the design of putative synthetic protein therapeutics (synthetic biologics). By using comparable, representative tri-antennary, tri-galactosyl (β-D-Gal)3 carbohydrate dendron motifs, each equipped with corresponding reactive handles, we explored the generation of a series of recombinant glycodendriproteins using a 'tag-and-modify' approach with several tag types. [13] These allowed the testing of thiol-Michael-type additions to dehydroalanine (Dha)-tagged proteins, [14] thiol-ene [15] radical additions at homoallylglycine (Hag) sites to generate thioether linkages, [27] and Cu(I)-mediated azide-alkyne cycloadditions to access 1,4-triazole linkages in two orientations, via azidohomoalanine (Aha) and homopropargylglycine (Hpg) tags (Figure 1d). [6,33]

Synthesis of glycodendron reagents
Model tri-β-D-galactosyl-containing (β-D-Gal)3 glycodendrons 1-3 derived from a 3,4,5-tris(2-aminoethoxy)benzoic acid core and equipped with appropriate reactive handles (thiol, propargyl, and azide) were designed and synthesized as simple mimics of the asymmetric carbohydrate display observed in triantennary N-linked glycoproteins (Scheme 1). [16] These structures possess significant rigidity and useful distances between their β-D-Gal tip sugars, features that can prove favourable for multivalent ligand display. [17,18] Indeed, it has been noted that glycodendrimers can mimic the non-reducing termini as well as some secondary interactions using only imperfect structural analogues [19] of the branched carbohydrates found in glycoproteins, without the necessity of presenting the whole, synthetically challenging, natural complex oligosaccharide. [20] In addition, we selected thioglycosides as the glycan motifs in these glycodendrons as these typically confer greater stability under both basic and acidic aqueous conditions, as well as resistance to enzymatic hydrolysis. [21] Moreover, such thioglycoside mimetics, together with other chalcogen derivatives such as selenoglycosides, can maintain the intrinsic binding properties of the glycan towards the corresponding protein receptor (lectin). [22]

Scheme 1. Synthesis of tri-antennary glycodendron reagents 1-3.

We first prepared extended scaffold 4 as a key intermediate. After some preliminary attempts and reaction optimization (Supporting Information (SI), Schemes S1 and S2), 4 was obtained in 75 % yield from 5 and 6 as previously described by Brouwer et al. [17] Boc deprotection using 4 M HCl in dioxane and subsequent treatment of the resulting trihydrochloride salt with chloroacetyl chloride and NaHCO3 using a biphasic 2 : 1 Et2O/H2O solvent system afforded derivative 7 (97 %). 
Boc removal using standard trifluoroacetic acid (TFA) and the use of chloroacetic anhydride as N-acylating reagent led to lower overall yields (Supporting Information, Scheme S3). Next, the incorporation of the non-reducing β-D-galactose moiety (β-D-Gal) was first attempted using O-acetyl protected thioglycoside sodium salt 8 [34] in dry DMF. However, 10 was obtained in only 33 % yield, and despite quantitative subsequent Zemplén deacetylation and methyl ester hydrolysis to 11 (99 %), the reduced overall yield hampered the utilization of this protecting-group-based route. Thus, an alternative protecting-group-free route was explored (Supporting Information, Table S1). Treatment of gallic acid core 7 with β-thiogalactoside sodium salt 9 [35] followed by methyl ester hydrolysis using aqueous NaOH in EtOH allowed the ready preparation of common glycodendron reagent precursor 11 in a superior 74 % yield over two steps. Next, using this divergent intermediate, appropriate reactive handles (thiol, propargyl, and azide) were introduced through amide-coupling protocols (Scheme 1). A reactive thiol was incorporated by treating 11 and cystamine with HATU and DIPEA in dry DMF at 50°C to afford a disulfide intermediate, which was subsequently reduced in situ to 1 (69 %) with PBu3 in water at room temperature for 2 h. Similarly, an alkynyl group was incorporated to obtain reagent 2 (75 %) after stirring a mixture of 11, propargylamine hydrochloride, HATU, and DIPEA in dry DMF at 45°C for 27 h. Finally, a reactive azide was incorporated following a 3-step procedure. As for propargyl 2, amide coupling with N-Boc-ethylenediamine afforded 12 in 67 % yield. Subsequent Boc deprotection with 1 : 2 Me2S/TFA followed by diazo transfer to the resulting primary amine led to azide 3 in 95 % yield over two steps.

Glycodendriprotein construction
The next step of the proposed strategy involved the use of these synthesized tri-antennary glycodendron reagents 1-3 to chemically modify a series of protein substrates bearing appropriate reactive tags: Dha 13 [23-26] (here chemically generated from Cys), Hag 14, [23,27] Aha 15, [28] 16, [25,28] and Hpg 17 [29] (the latter three were all generated through sense-codon reassignment of Met exploiting Met-auxotroph-mediated expression). In order to rapidly scope the generality of our tested methods to access well-defined glycodendriproteins, we used a multivariate selection of prototypical protein scaffolds featuring different residue sites/microenvironments and protein folds, as well as different functional measures of outcome (catalytic activity or structural/self-assembling properties). We first explored thiol-conjugate-addition chemistry, an approach absent from previous protocols for glycodendriprotein generation, using a single Dha protein mutant of the serine protease subtilisin from Bacillus lentus (SBL), a representative three-layer α/β-Rossmann-fold protein with catalytic activity, quantitatively obtained from the corresponding Cys precursor using standard bisalkylation-elimination protocols. [30] The identity, purity, and stability of the resulting glycodendriproteins were established by liquid chromatography electrospray ionization mass spectrometry (LC-ESI-MS) and SDS polyacrylamide gel electrophoresis (SDS-PAGE) (Supporting Information, Figures S1-S16). Thus, after generation from SBL-Cys156, incubation of SBL-Dha156 (13) with 1 in 50 mM sodium phosphate buffer (NaPi) at pH 8.0 afforded pure, synthetic glycodendriprotein 18 in > 95 % conversion after 1.5 h at room temperature as determined by LC-ESI-MS (Scheme 2, left panel). 
Although CuAAC reactions are somewhat more established in the limited examples of glycodendriprotein generation, the use of this alternative method provides potential expansion of scope and also allows potential access to alternative reaction scoping from the same glycodendron (here thiol 1) type.
Next, thiol-ene radical addition/ligation was explored as an alternative/complementary thiol chemistry to the preceding thiol-Michael addition. Here, we explored the use of the homomultimeric, virus-like bacteriophage particle Qβ 14, which self-assembles from 180 monomers, equipped with a Hag tag (Scheme 2, right panel). [23,27] This icosahedral protein platform provides greatly differing dimensions (core diameter ~28 nm), and so potentially differing reactivities. It has also allowed the prior construction of multivalent systems with enhanced function (e.g., viral mimicry [6]). Whilst direct comparison in this different protein scaffold would be inappropriate, use of the same thiol glycodendron 1 in thiol-ene reaction with Qβ-Hag16 (14) was sluggish, and final glycodendriprotein nanoparticle 19 was obtained in only 25 % conversion at pH 6.0 and 53 % conversion at pH 4.0 (500 equiv. of glycodendron reagent 1, after 8 and 28 h at room temperature, respectively); [23] increased equivalents of 1 resulted in only similar conversion levels. Reactions carried out at pH 6.0 provided more homogeneous product (as judged, for example, by MS spectrum signal-to-noise (S/N)) than those at pH 4.0, albeit with lower conversions. Moreover, prolonged reaction times did not substantially improve the conversion. Next, to broaden the scope of reactions tested and protein scaffolds/sites/residue microenvironments, we explored well-established Cu(I)-mediated azide-alkyne cycloadditions (CuAAC) using a variety of protein scaffolds with alkyne and azide tags: SsβG-Aha43 (15), SsβG-Hpg1-Hpg43/-Hpg43 (17 a/b) (as an example of a generic β-galactosidase, in an αβ-fold TIM barrel, that has been previously used to create synthetic glycoprotein probes for both in vitro and in vivo applications) [33] and Np276-Aha61 (16) (in an all-β-helix, β-fold pentapeptide repeat protein scaffold from Nostoc punctiforme, fusion protein 275/276 also known as Npβ) (Schemes 3 and 4). 
This allowed not only site exploration within varied scaffolds with differing secondary structural features (β dominant vs. αβ mixed) but also variation of protein function (e.g., catalytic activity). Triazole-linked products 20 and 21 were generated efficiently in conversions of > 95 % upon incubation with propargyl glycodendron reagent 2 in 50 mM NaPi at pH 8.2 for 1 h at room temperature as determined by LC-ESI-MS (Scheme 3). By contrast, under the same conditions, reaction of azide 3 with SsβG-Hpg1-Hpg43 (17 a) was more sluggish. Notably, when azide 3 was used to modify a mixture of SsβG-Hpg1-Hpg43/-Hpg43 (17 a/b), crude product 22 was generated consistent with regioselective monomodification only at the more accessible position 1 (i.e., reaction only of SsβG-Hpg1-Hpg43 and not of SsβG-Hpg43). Such regioselectivity in the use of CuAAC on proteins is consistent with previous observations (Scheme 4). [29] This apparent dependency of reaction conversion upon protein site location (i.e., regioselectivity) mirrors previous observations [29,31,33] where correlation is observed with a combination of the intrinsic reactivity of the tag-reactant pair and the protein residue accessibility. Residue microenvironment (e.g., charges, polar/hydrophobic interactions, etc.) may well also play a role but has typically proven less important in our hands. As a consequence, different reactive tags at the same site (Hpg43 vs. Aha43) or the same reactive tag at different sites (Hpg1 vs. Hpg43) may react differently. This was previously rationalized according to a heuristic model (termed 'reactive accessibility', RA) to evaluate and predict site reactivity. [33] This model correlates the reactivity observed with predicted measures of protein residue accessibility [32] and is able to, in turn, guide the control of reaction conditions to achieve regioselective modification at a variety of protein sites. 
[31,33] The apparent contrast in reactivity between Aha and Hpg at the same site 43 in protein SsβG reinforces the importance of evaluating CuAAC reaction 'orientations' when choosing the location of the reactive handles in the two partners to be conjugated, and supports the preferential use, in our hands, of Aha as a tag for less accessible sites in proteins. Our observations (here and previously) consistently suggest lower reactivity in Hpg-tagged proteins compared to their Aha-tagged counterparts. [6,29,33]

Conclusion
In summary, a brief survey of the site-selective attachment of simple, multivalent (β-D-Gal)3 dendrons to generate glycodendriproteins as N-linked glycoprotein mimics suggests that although well-defined, highly-valent structures can be generated through other methods, [6] those based on the use of Dha tags (for C-S bond formation) and Aha tags (for triazole formation) may be the most applicable to the ready formation of well-defined constructs. A qualitative comparison of the use of glycodendrons to modify proteins both here and previously [6] suggests an apparent order of utility in these systems as follows, tags-via-linkage: With regard to application, it has been previously shown that the conjugation of glycodendron reagents to proteases, such as SBL used here, enables targeted protein degradation; this has been applied to, for example, bind and degrade bacterial adhesins. [5] As such, not only might glycodendriprotein glycoconjugates allow development of anti-infective therapeutics (by both direct blocking [6] and degradation [5]) but also other, potentially broader, clinical applications that may exploit the selective degradation of other sugar-binding proteins. Work towards this goal is currently under way in our laboratories.

Experimental Section
General remarks: Proton (1H NMR) and carbon (13C NMR) nuclear magnetic resonance spectra were recorded on a Varian Mercury spectrometer (400 MHz for 1H and 100.6 MHz for 13C) or a Bruker AVII500 spectrometer (500 MHz for 1H and 125.8 MHz for 13C). NMR spectra were assigned using COSY, DEPT 135, HSQC, HMBC, and NOESY and are subjective. All chemical shifts are quoted on the δ scale in ppm using the residual solvent as the internal standard (1H NMR: CDCl3 = 7.26, DMSO-d6 [...]. Melting points (mp) were recorded on a Leica Galen III hot stage microscope equipped with a Testo 720 thermocouple probe and are uncorrected. Infrared (IR) spectra were recorded on a Bruker Tensor 27 Fourier Transform (FT) spectrophotometer using thin films on NaCl plates for liquids and oils and KBr discs for solids and crystals. Absorption maxima (νmax) are reported in wavenumbers (cm⁻¹). Elemental analyses (C, H, N, and S) were performed with a Carlo Erba EA 1108 Analyser. Optical rotations were measured on a Perkin-Elmer 241 polarimeter with a path length of 1.0 dm and are reported with implied units of 10⁻¹ deg cm² g⁻¹. Concentrations (c) are given in g/100 mL. Low resolution mass spectra (LRMS) were recorded on a Waters Micromass LCT Premier TOF spectrometer using electrospray ionization (ESI) and high resolution mass spectra (HRMS) were recorded on a Bruker MicroTOF ESI mass spectrometer. Nominal and exact m/z values are reported in Daltons (Da). Thin layer chromatography (TLC) was carried out using Merck aluminium backed sheets coated with 60 F254 silica gel. Visualization of the silica plates was achieved using a UV lamp (λmax = 254 nm) and/or ammonium molybdate (5 % in 2 M H2SO4) and/or potassium permanganate (5 % KMnO4 in 1 M NaOH with 5 % K2CO3). Flash column chromatography was carried out using BDH 40-63 μm silica gel (VWR). Mobile phases are reported in relative composition (e.g., 1 : 2 : 4 H2O/iPrOH/EtOAc v/v/v). 
Anhydrous solvents were purchased from Fluka or Acros. All other solvents were used as supplied (Analytical or HPLC grade), without prior purification. Distilled water was used for chemical reactions and Milli-Q purified water for protein manipulations. All reactions using anhydrous conditions were performed using flame-dried apparatus under an atmosphere of argon or nitrogen. 'Petrol' refers to the fraction of light petroleum ether boiling in the range 40-60°C. Brine refers to a saturated solution of sodium chloride. Anhydrous magnesium sulfate (MgSO4) was used as drying agent after reaction work-up, as indicated.

Protein liquid chromatography-mass spectrometry analysis:
Liquid chromatography-mass spectrometry (LC-MS) was performed on a Micromass LCT (ESI-TOF-MS) coupled to a Waters Alliance 2790 HPLC using a Phenomenex Jupiter C4 column (250 × 4.6 mm × 5 μm). Water (solvent A) and acetonitrile (solvent B), each containing 0.1 % formic acid, were used as the mobile phase at a flow rate of 1.0 mL min⁻¹. The gradient was programmed as follows: 95 % A (5 min isocratic) to 100 % B after 15 min then isocratic for 5 min. The electrospray source was operated with a capillary voltage of 3.2 kV and a cone voltage of 25 V (35 V for β-galactosidase (SsβG)). Nitrogen was used as the nebulizer and desolvation gas at a total flow of 600 L h⁻¹. Spectra were calibrated using a calibration curve constructed from a minimum of 17 matched peaks from the multiply charged ion series of equine myoglobin obtained at a cone voltage of 25 V. A typical analysis of a conjugation reaction by LC-ESI-MS is described as follows. Briefly, integration of the region containing all protein (both starting material and products) in the total ion chromatogram afforded the combined ion series. The deconvoluted total mass spectrum was reconstructed from the ion series using the MaxEnt algorithm preinstalled on MassLynx software (v. 4.0 from Waters) according to the manufacturer's instructions. Identical analyses were carried out for all the conjugation reactions performed in this work.
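To illustrate what a multiply charged ion series looks like before deconvolution, the following Python sketch computes the m/z values expected for protonated ions of a protein of given neutral mass and recovers the charge and mass from two adjacent peaks. This is a simple textbook calculation and not the MaxEnt algorithm used in this work; the function names and the charge range are illustrative assumptions, and the example mass is the calculated mass of 18.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton

def mz_series(neutral_mass, charges):
    """Return {z: m/z} for [M + zH]z+ ions of a protein of given neutral mass."""
    return {z: (neutral_mass + z * PROTON_MASS) / z for z in charges}

def mass_from_adjacent_peaks(mz_low, mz_high):
    """Infer charge and neutral mass from two adjacent peaks of one ion series.

    mz_high carries charge z; mz_low (the next peak at lower m/z) carries z + 1.
    """
    z = round((mz_low - PROTON_MASS) / (mz_high - mz_low))
    mass = z * (mz_high - PROTON_MASS)
    return z, mass

# Illustrative charge range (an assumption, not taken from the source).
series = mz_series(27748.0, range(10, 21))
charge, mass = mass_from_adjacent_peaks(series[16], series[15])
print(f"z = {charge}, M = {mass:.1f} Da")
```

Deconvolution software effectively performs this inference across all peaks of the series at once, which is why a single neutral mass is reported per protein species.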

Methyl 3,4,5-tris[2-(2-chloroacetamido)ethoxy]benzoate (7):
Compound 4 (1.81 g, 2.946 mmol) was dissolved in dry CH2Cl2 (5.5 mL) and 4 M HCl in dioxane (18 mL) was added at room temperature under an atmosphere of argon. The reaction mixture was stirred at the same temperature for 2 h. After complete conversion, the solvent was evaporated and the residue dried under high vacuum to afford methyl 3,4,5-tris(2-aminoethoxy)benzoate trihydrochloride as a white solid (1.23 g, 99 %), which was used in the next step without further purification. A mixture of this intermediate and NaHCO3 (1.48 g, 17.676 mmol) in 2 : 1 Et2O/H2O (8.8 mL) was cooled to 0°C (ice/water) and chloroacetyl chloride (771 μL, 9.721 mmol) was slowly added over a period of 1 h. After complete addition, the mixture was allowed to warm to room temperature. After 7 days of stirring at the same temperature, the reaction mixture was filtered. The precipitate was washed with water, 2 N aqueous HCl, water, and finally with Et2O. The crude product was recrystallized from 1 : 1 ethanol/water and dried under reduced pressure to afford 7 as a white solid (1.53 g, 97 % over two steps).

Chemical protein modification
Gal3-G-S-156SBL (18): Gal3-G-SH 1 (1.0 mg, 0.937 μmol) was added to a solution of SBL-Dha156 (13) (100 μL of 0.25 mg/mL, 0.937 nmol) in 50 mM sodium phosphate buffer (pH 8.0) and the resulting mixture vortexed for 30 seconds at room temperature. After 1.5 h of additional shaking, a 30 μL aliquot was analysed directly by LC-MS and complete conversion to Gal3-G-S-156SBL 18 (calcd. 27748; found, 27749) was observed. Finally, the sample was flash frozen with liquid nitrogen and stored at -20°C. Note: The modified glycodendriprotein retained inherent peptidase activity, as indicated by liberation of p-nitroaniline upon treatment with the chromogenic peptide sucAAPFpNA. [39]

Stability of Gal3-G-S-156SBL (18) in human plasma: A 10 μL aliquot of Gal3-G-S-156SBL (18) (ca. 0.25 mg/mL) in 50 mM sodium phosphate buffer (pH 8.0) was transferred to a 0.5 mL Eppendorf tube. Reconstituted human plasma (0.5 μL, Sigma-Aldrich) was added at room temperature and the resulting mixture vortexed for 30 seconds. After 24 h of additional shaking at 37°C, the reaction was analysed directly by LC-MS and starting protein 18 (calculated mass, 27748; observed mass, 27752) was detected unaltered.
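The agreement between calculated and observed masses quoted above can be expressed as an absolute and a relative deviation. The short Python sketch below does this for the two measurements of 18; the function name is hypothetical, and no acceptance threshold is implied because none is stated in the source.

```python
def mass_error(calculated, observed):
    """Return (absolute deviation in Da, relative deviation in ppm)."""
    delta = observed - calculated
    return delta, delta / calculated * 1e6

# Calculated and observed masses of 18 as quoted in the procedures above.
measurements = [
    ("18 after conjugation", 27748, 27749),
    ("18 after 24 h in human plasma", 27748, 27752),
]

for label, calc, found in measurements:
    delta, ppm = mass_error(calc, found)
    print(f"{label}: {delta:+.0f} Da ({ppm:+.0f} ppm)")
```

Both deviations (1 Da and 4 Da on a protein of about 27.7 kDa) are small relative to the mass added on conjugation, which is the basis for describing the plasma-treated sample as unaltered.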

Gal3-G-S-16Qβ (19):
Gal3-G-SH 1 (4.5 mg, 4.218 μmol) and Vazo44 (0.28 mg, 0.844 μmol) were added to a solution of Qβ-Hag16 (14) (100 μL of 1.19 mg/mL, 8.437 nmol) in 250 mM ammonium acetate buffer (pH 4.0 or 6.0). The reaction mixture was placed in a cuvette and irradiated with a medium pressure 125 W Hg-lamp with borosilicate filter at room temperature for up to 28 h. Small molecules were removed from the reaction mixture aliquot by loading the sample onto a PD10 desalting column (GE Healthcare) previously equilibrated with 10 column volumes of 50 mM sodium phosphate buffer (pH 8.0) and eluting with 1 mL of the same buffer. The collected sample was concentrated to 50 μL on a Vivaspin™ membrane concentrator (10 kDa molecular weight cut off). A virus-like particle aliquot (20 μL) was mixed with 1 M dithiothreitol (DTT) in H2O (10 μL) and incubated at 60°C for 5 min to allow the protein to denature to monomer prior to analysis by LC-MS (m/z for monomer of Gal3-G-S-16Qβ 19: calcd. 15173; found, 15173). Finally, the sample was flash frozen with liquid nitrogen and stored at -20°C.
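The reagent loadings used in the two thiol-based protein modifications can be checked from the quoted amounts. The Python sketch below computes molar equivalents relative to protein, and the molar mass of thiol 1 implied by each quoted mass/amount pair. The helper names are hypothetical. The Qβ amount is taken as quoted (8.437 nmol), which appears to refer to monomer, since 119 μg divided by 8.437 nmol corresponds to roughly 14.1 kDa.

```python
def equivalents(reagent_nmol, protein_nmol):
    """Molar equivalents of a reagent relative to the protein."""
    return reagent_nmol / protein_nmol

def implied_molar_mass(mass_mg, amount_umol):
    """Molar mass (g/mol) implied by a quoted mass and molar amount."""
    return mass_mg / amount_umol * 1000.0

# Protein amounts as quoted in the procedures (nmol).
SBL_NMOL = 0.937   # SBL-Dha156 (13)
QB_NMOL = 8.437    # Qβ-Hag16 (14)

# Reagent amounts are quoted in µmol; multiply by 1000 to obtain nmol.
thiol_equiv_18 = equivalents(0.937 * 1000, SBL_NMOL)
thiol_equiv_19 = equivalents(4.218 * 1000, QB_NMOL)
initiator_equiv_19 = equivalents(0.844 * 1000, QB_NMOL)

print(f"Thiol 1 for 18: {thiol_equiv_18:.0f} equiv.")
print(f"Thiol 1 for 19: {thiol_equiv_19:.0f} equiv.")
print(f"Vazo44 for 19: {initiator_equiv_19:.0f} equiv.")
print(f"Implied M of 1: {implied_molar_mass(1.0, 0.937):.0f} "
      f"and {implied_molar_mass(4.5, 4.218):.0f} g/mol")
```

The value for 19 reproduces the 500 equiv. of glycodendron reagent 1 quoted for the thiol-ene reaction, and the two implied molar masses of 1 agree to within the precision of the quoted masses.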