Deciphering the Non-Equivalence of Serine and Threonine O-Glycosylation Points: Implications for Molecular Recognition of the Tn Antigen by an anti-MUC1 Antibody

The structural features of MUC1-like glycopeptides bearing the Tn antigen (α-O-GalNAc-Ser/Thr) in complex with an anti-MUC1 antibody are reported at atomic resolution. For the α-O-GalNAc-Ser derivative, the glycosidic linkage adopts a high-energy conformation that is barely populated in the free state. This unusual structure (also observed in an α-S-GalNAc-Cys mimic) is stabilized by hydrogen bonds between the peptidic fragment and the sugar. The selection of a particular peptide structure by the antibody is thus propagated to the carbohydrate through carbohydrate/peptide contacts, which force a change in the orientation of the sugar moiety. This seems to be unfeasible in the α-O-GalNAc-Thr glycopeptide owing to the more limited flexibility of the side chain imposed by the methyl group. Our data demonstrate the non-equivalence of Ser and Thr O-glycosylation points in molecular recognition processes. These features provide insight into the occurrence in nature of the APDTRP epitope for anti-MUC1 antibodies.


Bio-layer Interferometry (BLI). Binding assays were performed on an Octet Red instrument (fortéBIO). Ligand immobilization, binding reactions, regeneration and washes were conducted in wells of black polypropylene 96-well microplates. Mucins m1, m1*, m2 and m2* (10 mg/mL) were immobilized on amine-reactive biosensors (AR2G biosensors) in 10 mM NaAc buffer (pH 5.5), using 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide and N-hydroxysuccinimide, for 10 min at 1000 rpm and 25 °C. All biosensors were subsequently treated with a solution of ethanolamine hydrochloride (1 M, pH 8.5), followed by regeneration and wash. Binding analyses were carried out at 25 °C and 1000 rpm in 10 mM sodium phosphate buffer (pH 7.4) containing 150 mM NaCl, with 120 s of association followed by 180 s of dissociation. The surface was thoroughly washed with the running buffer without regeneration solution. Data were analyzed using Data Analysis (fortéBIO), with Savitzky-Golay filtering. Binding was fitted to a 2:1 heterogeneous ligand model, and steady-state analyses were performed to obtain the binding constants (KD).
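As an illustration of the steady-state analysis mentioned above, the following minimal sketch evaluates the equilibrium response expected for a heterogeneous ligand surface, assuming that each ligand population binds the analyte independently with a simple Langmuir isotherm. The function name, the example concentrations and the Rmax/KD values are hypothetical; this is not the algorithm used by the fortéBIO Data Analysis software.

```python
def steady_state_response(conc, sites):
    """Equilibrium BLI response for independent ligand populations.

    conc  : analyte concentration (M)
    sites : list of (r_max, k_d) tuples, one per ligand population;
            r_max in nm (response units), k_d in M.
    Each population contributes a Langmuir term r_max * C / (K_D + C);
    a heterogeneous ligand model sums two such terms.
    """
    if conc < 0:
        raise ValueError("concentration must be non-negative")
    total = 0.0
    for r_max, k_d in sites:
        if k_d <= 0:
            raise ValueError("K_D must be positive")
        total += r_max * conc / (k_d + conc)
    return total


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to illustrate the model.
    example_sites = [(0.8, 2e-6), (0.3, 5e-5)]
    for c in (1e-7, 1e-6, 1e-5, 1e-4):
        print(f"C = {c:.1e} M -> R_eq = {steady_state_response(c, example_sites):.3f} nm")
```

In practice the Rmax and KD values would be obtained by fitting this expression to the plateau responses measured at several analyte concentrations.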
Enzyme-linked immunosorbent assay (ELISA). The ELISA plate (maleic anhydride 96-well plates) was coated with 100 μL/well of a solution of MUC-1 derivatives (0-1000 nmol/well) in phosphate buffer (0.2 M, pH 7.2) and incubated overnight at 25 °C. The unbound sites were then blocked by adding 200 μL/well of blocking buffer (Thermo Scientific SuperBlock Blocking Buffer).
3,3′,5,5′-Tetramethylbenzidine (TMB) was added (90 μL/well) and, after incubation for 10 min, the reaction was terminated by adding 50 μL/well of stop solution (1 M H2SO4). The absorbance of the wells was read immediately at 450 nm using an ELISA plate reader. Average absorbance intensities of three replicates were plotted against mucin concentrations.

Purification and crystallization of the scFv-SM3. The DNA sequence encoding the SM3 scFv was synthesized and codon-optimized by GenScript for expression in Pichia pastoris. A 51-bp DNA linker encoding a short flexible peptide formed by Ser and Gly residues was introduced between the DNA encoding the variable regions VH and VL of SM3.
The resulting plasmid (pUC57_SM3) was then used as a template for amplification of the SM3 scFv construct encoding the amino acid residues of the variable regions VH and VL. This construct was amplified using the forward primer, 5'-CGGAATTCCTCGAGAAGAGAGAAGCAGAAGCACAGGTCCAACTGCAGGAATCAGGAGG-3', containing a XhoI site (shown in italic letters), and the reverse primer, 5'-CGGAATTCCCGCGGGTCGACTTATCCCAAG-3', containing a recognition sequence for SacII (italic) and a stop codon (underlined). Subsequently, the PCR product was digested with XhoI and SacII and cloned into pPICZαA (Invitrogen), resulting in the expression plasmid pPICZαASM3. The plasmid was isolated from the E. coli strain DH5α, linearised with SacI and used to transform the Pichia pastoris strain X33 (Invitrogen) as previously described. Transformants were selected on YPDS plates (1% (w/v) yeast extract, 2% (w/v) peptone, 2% (w/v) dextrose, 1 M sorbitol) supplemented with 100 µg/mL zeocin (InvivoGen). Cells expressing SM3 scFv were grown for 24 h at 30 °C in BMGY medium (1% (w/v) yeast extract, 2% (w/v) peptone, 100 mM potassium phosphate pH 6.0, 1.34% (w/v) yeast nitrogen base and 1% (v/v) glycerol), then centrifuged at 4000 g for 10 min. Cells were resuspended in BMM medium (100 mM potassium phosphate pH 6.0, 1.34% (w/v) yeast nitrogen base and 1% (v/v) methanol) and incubated at 18 °C. The supernatant containing SM3 scFv was collected after 72 h of methanol induction, concentrated to 20-50 mL using a Pellicon XL device (10,000 MWCO, PES membrane; Millipore) and then dialyzed against 25 mM Tris HCl pH 8.5. The SM3 scFv sample was loaded onto a HiTrap QFF column (GE Healthcare) that had been previously equilibrated with 10 column volumes of 25 mM Tris HCl pH 8.5. The protein was eluted with a NaCl gradient (from 0 to 1 M) in the same buffer. The fractions containing the protein were then pooled and concentrated to 2.5 mL using centrifugal filter units of 10,000 MWCO (Millipore).
Subsequently, gel filtration chromatography was carried out using a Superdex 75 XK26/60 column (GE Healthcare, Piscataway, NJ, USA) in 25 mM Tris pH 8.5, 150 mM NaCl.
SM3 scFv was dialysed against 25 mM Tris pH 8.5 and quantified by absorbance at 280 nm using an extinction coefficient of 53650 M⁻¹·cm⁻¹.
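The quantification by absorbance follows the Beer-Lambert law, A = ε·c·l. The sketch below converts an A280 reading into a molar concentration using the extinction coefficient quoted above. The 1 cm path length, the function names and the molar mass passed in the usage example are assumptions made for illustration; the molar mass of the scFv is not stated in this document.

```python
EXTINCTION_COEFF = 53650.0  # M^-1 cm^-1, value quoted in the text


def molar_concentration(a280, path_cm=1.0, epsilon=EXTINCTION_COEFF):
    """Return the molar concentration (M) from A = epsilon * c * l."""
    if path_cm <= 0 or epsilon <= 0:
        raise ValueError("path length and extinction coefficient must be positive")
    return a280 / (epsilon * path_cm)


def mass_concentration_mg_per_ml(a280, molar_mass, path_cm=1.0,
                                 epsilon=EXTINCTION_COEFF):
    """Return mg/mL; molar_mass in g/mol (M * g/mol = g/L = mg/mL)."""
    return molar_concentration(a280, path_cm, epsilon) * molar_mass


if __name__ == "__main__":
    # 25000 g/mol is a placeholder molar mass, not a value from the source.
    reading = 0.80
    print(molar_concentration(reading))
    print(mass_concentration_mg_per_ml(reading, molar_mass=25000.0))
```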

Crystallization. Crystals were grown by sitting drop diffusion at 18 °C. The drops were prepared by mixing 0.5 μL of protein solution containing 15 mg/mL scFv-SM3 and 10 mM of the different peptides with 0.5 μL of the mother liquor. Crystals of scFv-SM3 with glycopeptides 1* and 2* were grown in 20% PEG 5000 monomethyl ether, 0.2 M potassium citrate, 0.1 M MES pH 6.3, and in 20% PEG 3350, 0.2 M disodium hydrogen phosphate, respectively. Crystals of scFv-SM3 with peptide 1 were grown in 20% PEG 3350, 0.2 M diammonium hydrogen citrate pH 5. Finally, crystals with glycopeptide 3* were grown in 20% PEG 20000, 0.1 M bicine pH 9 and 2% 1,4-dioxane. The crystals used in this study were cryoprotected in mother liquor solutions containing 20% ethylene glycol and frozen in a nitrogen gas stream cooled to 100 K.
Structure determination and refinement. Diffraction data for the binary complexes were collected at the Diamond Light Source (Oxford) on beamlines I04 (experiment number MX8035-26) and I02 (experiment number MX10121-2). All data were processed and scaled using the XDS package [S3] and CCP4 [S4] software; relevant statistics are given in Supplementary Table S1. The crystal structures were solved by molecular replacement with Phaser, [S4] using the PDB entry 1SM3 as the template. Initial phases were further improved by cycles of manual model building in Coot [S5] and refinement with REFMAC5. [S6] The final models were validated with PROCHECK; [S7] model statistics are given in Supplementary Table S1. For clarification purposes, the numbering of the amino acids is the same as described for the SM3 Fab. Coordinates and structure factors have been deposited in the Worldwide Protein Data Bank (wwPDB; see Table S1 for the PDB codes).

Unrestrained Molecular Dynamics (MD) simulations on the scFv-SM3 complexes. The starting geometries for the scFv-SM3:1* and scFv-SM3:2* complexes were generated from the X-ray crystal structures solved in this work and modified accordingly. Each complex was immersed in a truncated octahedral box with a 10 Å buffer of TIP3P [S8] water molecules. All subsequent simulations were performed using the AMBER 12 package [S9] and the ff14SB force field, which is an evolution of the Stony Brook modification of the Amber 99 force field (ff99SB). [S10] This force field was combined with GLYCAM 06 parameters [S11] to accurately simulate the corresponding glycopeptides. A two-stage geometry optimization approach was performed. The first stage minimized only the positions of solvent molecules and ions, and the second stage was an unrestrained minimization of all the atoms in the simulation cell. The systems were then heated by increasing the temperature from 0 to 300 K under a constant pressure of 1 atm and periodic boundary conditions. Harmonic restraints of 30 kcal·mol⁻¹ were applied to the solute, and the Andersen temperature coupling scheme [S12] was used to control and equalize the temperature. The time step was kept at 1 fs during the heating stages. Water molecules were treated with the SHAKE algorithm such that the angle between the hydrogen atoms was kept fixed. Long-range electrostatic effects were modeled using the particle-mesh-Ewald method. [S13] An 8 Å cutoff was applied to Lennard-Jones and electrostatic interactions. Each system was equilibrated for 2 ns with a 2 fs time step at constant volume and a temperature of 300 K. Production trajectories were then run for an additional 25 ns under the same simulation conditions.

MD simulations with time-averaged restraints (MD-tar) in explicit water. MD-tar simulations were performed with AMBER 12 (ff14SB force field), [S10] which was combined with GLYCAM 06 parameters. [S11] Distances derived from NOE cross-peaks were included as time-averaged distance restraints. A $\langle r^{-6} \rangle^{-1/6}$ average was used for the distances. Final trajectories were run using an exponential decay constant of 2000 ps and a simulation length of 20 ns in explicit TIP3P water molecules.
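To make the averaging scheme concrete, the sketch below computes an r^-6-weighted distance with an exponential memory function over a series of sampled distances. It is a simplified discrete illustration under stated assumptions (uniform sampling interval, normalised exponential weights); the function and parameter names are hypothetical, and it does not reproduce the AMBER implementation.

```python
import math


def time_averaged_distance(distances, dt_ps, tau_ps=2000.0, power=6):
    """Exponentially weighted <r^-power>^(-1/power) average.

    distances : sampled interatomic distances (Angstrom), oldest first
    dt_ps     : time between consecutive samples (ps)
    tau_ps    : exponential decay constant (ps); 2000 ps is the value
                quoted in the text
    Older samples are down-weighted by exp(-age / tau_ps).
    """
    if not distances:
        raise ValueError("at least one distance is required")
    if any(r <= 0 for r in distances):
        raise ValueError("distances must be positive")
    n = len(distances)
    weighted_sum = 0.0
    weight_total = 0.0
    for k, r in enumerate(distances):
        age = (n - 1 - k) * dt_ps  # time elapsed since sample k
        w = math.exp(-age / tau_ps)
        weighted_sum += w * r ** (-power)
        weight_total += w
    return (weighted_sum / weight_total) ** (-1.0 / power)


if __name__ == "__main__":
    # Hypothetical trace alternating between a short and a long contact.
    trace = [2.5, 4.0] * 500
    print(time_averaged_distance(trace, dt_ps=1.0))
```

Because of the r^-6 weighting, short distances dominate the average, so a proton pair that is close for only part of the trajectory can still satisfy an NOE-derived restraint.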

Compound 1. Following SPPS methodology with the adequately protected amino acids, compound 1 was obtained and purified by semi-preparative HPLC. 1 171.4, 171.9, 173.6, 173.9, 174.6, 176.8 (8 CO

Figure S2. Bio-Layer Interferometry (BLI) curves (in blue) and fitting curves (in red) obtained for m1, m2, m1* and m2* with scFv-SM3, together with the KD constants derived from BLI experiments.

Table S1. Data collection and refinement statistics. Values in parentheses refer to the highest resolution shell. Ramachandran plot statistics were determined with PROCHECK.

For clarification purposes, the numbering and labels of the residues in our PDB files differ slightly from those reported for the PDB entry 1SM3. [S16] In our case, both chains L and H have been merged into one chain (H). Residues from 1000 onwards belong to chain L in the PDB entry 1SM3. The numbering of compounds 1, 1*, 2* and 3* in chain P also differs from that of the 1SM3 entry.

Figure S4. Electron density maps are Fo-Fc syntheses (blue) contoured at 2.0 σ for peptide 1 (a) and glycopeptides 1* (b), 2* (c) and 3* (d). The amino acid residues and the GalNAc moiety are coloured in grey and green, respectively.

Tables S6.
Comparison of the experimental and MD-tar derived distances for glycopeptides 1* and 2*. The experimental distances were semi-quantitatively determined by integrating the volume of the corresponding cross-peaks. All the distances are given in Å.

Figure S8. Ensembles obtained from 20 ns MD-tar simulations in explicit water performed on compound 1*, together with the χ1 distribution for Thr4 and φ/ψ distributions for the peptide backbone and the glycosidic linkages. The peptide backbone is shown as a green ribbon and the sugar moiety is shown in pink. The conformation found for glycopeptide 1* in the crystal structure is shown in grey. The glycosidic linkage displays mainly the typical eclipsed conformation. [S17] The φ/ψ values displayed for the backbone of glycopeptide 1* in the complex with SM3 are shown as red dots.
Figure S9. Ensembles obtained from 20 ns MD-tar simulations in explicit water performed on compound 2*, together with the χ1 distribution for Ser4 and φ/ψ distributions for the peptide backbone and the glycosidic linkages. The peptide backbone is shown as a green ribbon and the sugar moiety is shown in pink. The conformation found in the crystal structure is shown in grey. The glycosidic linkage displays mainly the typical alternate conformation. [S18] Interestingly, a low population (21%) of the conformation observed in the X-ray structure of SM3:2* is also present throughout the MD simulations. The φ/ψ values displayed for the backbone of glycopeptide 2* in the complex with SM3 are shown as red dots.