Expanding the Occurrence of Polysaccharides to the Viral World: The Case of Mimivirus

Abstract The general perception of viruses is that they are small in terms of size and genome, and that they hijack the host machinery to glycosylate their capsid. Giant viruses subvert all these concepts: their particles are not small, and their genome is more complex than that of some bacteria. Regarding glycosylation, this concept has already been challenged by the finding that Chloroviruses have an autonomous glycosylation machinery that produces oligosaccharides similar in size to those of small viruses (6–12 units), albeit different in structure compared to the cellular counterparts. We report herein that Mimivirus possesses a glycocalyx made of two different polysaccharides, now challenging the concept that all viruses coat their capsids with oligosaccharides of discrete size. This discovery contradicts the paradigm that such macromolecules are absent in viruses, blurring the boundaries between giant viruses and the cellular world and opening new avenues in the field of viral glycobiology.

Table S1. NMR chemical shifts of poly_1 and poly_2.
Table S2. Conservation level of the 29 proteins most enriched in the fibrils in the Acanthamoeba-infecting Mimiviridae family.
Table S3. Genome accession numbers used to derive the data in Table S2.

Mimivirus fibrils glycans extraction.
Acanthamoeba polyphaga mimivirus, here named Mimivirus, and its M4 isolate were purified from infected Acanthamoeba castellanii cultures as previously described. [1] Fibrils were isolated from Mimivirus by adapting the conditions reported by Xiao et al. [2] with minor variations. Briefly, about 1.0 × 10^11 viral particles were treated with 50 mM DTT (20 ml) at 100°C for 2 h under stirring (100 rpm). The supernatant containing the fibrils was separated from the viral particles by centrifugation (10 min, 4650 g, room temperature), and an aliquot of both samples was used for TEM analysis. Then, the solid was washed twice with water (1 ml), and the supernatants were pooled, dialyzed against water (membrane cut-off 3500 Da) to remove the DTT, and freeze-dried (9.3 mg).

TEM analysis of the viral particle and fibrils obtained after DTT treatment.
The untreated Mimivirus, the fibrils and the defibrillated virus obtained after DTT treatment were compared by TEM to the M4 isolate. All the samples were fixed in 2.5% glutaraldehyde in water for 1 h at room temperature. After fixing, the samples were centrifuged at 5000 g for 10 min and the pellets were washed twice with water. The structure of the fibrils was visualized by negative staining using methyl cellulose (M6385 Sigma) and 2% uranyl acetate in water.
In detail, 2 µl of each sample was adsorbed for 3 min on a Formvar® (CAS#63450-15-7) coated grid. Then the grid was treated with 2% uranyl acetate in water for 1 min. The excess uranyl acetate was removed by quickly washing the grid with a drop of a methyl cellulose solution (200 µl of 2% uranyl acetate plus 1800 µl of 2% methyl cellulose in water). Next, the grid was stained with this same solution for 1 min 30 s, and the excess methyl cellulose was removed by drying the grid on filter paper. All the samples were observed on a TECNAI G2 TEM operated at 200 kV.

Sugar composition of Mimivirus fibrils and of M4 strain.
Monosaccharide composition analysis as acetylated methyl glycosides was performed on the fibrils (0.5 mg) of Mimivirus or directly on the intact M4 viral particles, according to the protocol reported by De Castro et al. [3] Rha, GlcNAc and other minor components (Figure S1a) were identified by comparison with commercially available standards after proper derivatization. VioNAc and 2OMeVioNAc were identified by applying the fragmentation rules described for these derivatives. [4] In particular, for the 2-O-methylated derivative of VioNAc, the EI-MS spectrum (Figure S1b) contained a fragment at m/z 244 consistent with the oxonium ion of a 6-deoxy-aminosugar with two acetyls and one methyl group as substituents. The fragment at m/z 88 indicated that the methoxyl group was located next to the anomeric methoxyl group, in agreement with our previous NMR data. [5] These composition results are in agreement with the functional pathways for UDP-L-Rha, [6] UDP-D-GlcNAc [7] and UDP-D-Vio4NAc. [5,8] The absolute configuration of the major sugar constituents was assumed to be L for Rha, [6] and D for GlcN [9] and Vio4NAc [5,8] based on the biosynthetic data available for Mimivirus. Linkage analysis of Mimivirus fibrils as partially methylated and acetylated alditols was performed as reported. [3] GC-MS analyses were performed with an Agilent instrument (GC Agilent 6850 coupled to MS Agilent 5973) equipped with a SPB-5 capillary column (Supelco, 30 m × 0.25 mm i.d., flow rate 0.8 mL min−1) and He as the carrier gas. Electron impact mass spectra were recorded with an ionization energy of 70 eV and an ionizing current of 0.2 mA. The temperature program used to analyze all the derivatives was: 150°C for 5 min, 150 → 280°C at 3°C/min, 280°C for 5 min.
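As a small worked example, the length of one GC run can be estimated from the oven program quoted above: an initial 5 min hold, a ramp from 150 to 280°C at 3°C/min, and a final 5 min hold. The sketch below is illustrative only; the function name and the segment encoding are hypothetical, and the estimate ignores oven equilibration and cool-down times.

```python
def gc_run_time_min(segments):
    """Return the total oven-program time in minutes.

    Each segment is either ("hold", minutes) or
    ("ramp", start_c, end_c, rate_c_per_min).
    The encoding is a hypothetical convenience, not an instrument format.
    """
    total = 0.0
    for seg in segments:
        if seg[0] == "hold":
            total += seg[1]
        elif seg[0] == "ramp":
            _, start_c, end_c, rate = seg
            total += abs(end_c - start_c) / rate
        else:
            raise ValueError(f"unknown segment type: {seg[0]}")
    return total


# Oven program from the text: hold, ramp 150 -> 280 degC at 3 degC/min, hold.
program = [("hold", 5.0), ("ramp", 150.0, 280.0, 3.0), ("hold", 5.0)]
run_time = gc_run_time_min(program)
print(f"{run_time:.1f} min")  # 5 + 130/3 + 5 = 53.3 min
```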

NMR data acquisition of fibrils
1D and 2D NMR spectra of the fibrils prior to any purification were measured on a Bruker 600 DRX equipped with a CryoProbe™ at 329 K to improve the quality of the spectra. Acetone was used as internal standard (1H 2.225 ppm, 13C 31.45 ppm) and 2D spectra (COSY, TOCSY, NOESY, HSQC and HMBC) were acquired by using Bruker software (TopSpin 2.0). Homonuclear 1H-1H 2D experiments were recorded using 512 FIDs of 2048 complex points with 24 scans per FID; mixing times of 100 and 200 ms were used for TOCSY and NOESY spectra, respectively. 1H-13C HSQC and HMBC spectra were acquired with 512 FIDs of 2048 complex points, accumulating 50 and 60 scans, respectively. For the samples from ion exchange chromatography, all spectra were measured at 310 K, and reduction of the solvent signal intensity was achieved by presaturation or by measuring a mono-dimensional DOSY spectrum. In this last case, spectra were recorded by setting δ and Δ to 2.4 ms and 100 ms, respectively, and the variable gradient to 45% of its maximum power. Spectra were processed and analysed using Bruker TopSpin 3. 2D NMR spectra of pure poly_1 and poly_2, obtained after extensive proteinase K digestion and ion exchange purification, were measured at 310 K, as indicated for the untreated fibrils.

Separation of Mimivirus polysaccharides by anion exchange chromatography.
Fibrils, either intact or after protease treatment(s) (5-10 mg), were purified by anion exchange chromatography using Q-Sepharose Fast Flow as adsorbent (0.5 ml of gel for 5-10 mg of fibrils), and fractions were collected by increasing the NaCl concentration stepwise (10, 100, 200, 400, 700 and 1000 mM). Each solution was applied for 5 column volumes; eluates with the same ionic strength were pooled and desalted on a Biogel P10 column (20 ml, 12 ml/h) using Milli-Q water as eluent and monitoring the eluate with an on-line refractive index detector (K-2310 Knauer). The polysaccharide material eluted in the void volume of the column. 1H NMR was used to evaluate the ratio between the two polysaccharides by integration of the signals of interest: the methyl of the pyruvic acid at 1.47 ppm was selected as the reporter group of poly_1, together with the signal at 1.24 ppm, which included the methyl groups of rhamnose C of poly_1 and of rhamnose units A and A' and 2OMeVio4NAc units E and E' of poly_2 (for the labels, refer to Table S1). The proportion between the two polysaccharides was determined from these two integrals, relating the signal at 1.24 ppm to poly_2 after subtraction of the contribution of poly_1 (Figure S5).
Each protease digestion was performed by treating the material with proteinase K (Sigma P6556-100 MG, CAS 39450-01-6), using 0.5 mg of the protease for 10 mg of sample in a digestion buffer (100 mM Tris, 50 mM NaCl, 10 mM MgCl2, pH 7.5) at 55°C overnight. The samples were then dialyzed against water (cut-off 3500 Da) and dried. Yields and poly_2/poly_1 ratios for the different purifications are reported in Table 1.
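The formula itself is not legible in this text, so the following is only a minimal sketch of how such a ratio could be computed from the two integrals described above. It rests on assumptions not stated here: each poly_1 repeating unit is taken to contribute one pyruvate methyl at 1.47 ppm and one rhamnose methyl at 1.24 ppm, and each poly_2 repeating unit is taken to contribute two methyls at 1.24 ppm (one rhamnose and one 2OMeVio4NAc). The function name, parameter names and integral values are hypothetical.

```python
def poly2_to_poly1_ratio(i_147, i_124,
                         methyls_poly1_at_124=1,
                         methyls_poly2_at_124=2):
    """Estimate the poly_2/poly_1 molar ratio of repeating units.

    i_147: integral of the pyruvate methyl (reporter of poly_1).
    i_124: integral of the 6-deoxy-sugar methyl region (poly_1 + poly_2).
    The per-repeat methyl counts are ASSUMED stoichiometries.
    """
    if i_147 <= 0:
        raise ValueError("the poly_1 reporter integral must be positive")
    # Remove the poly_1 contribution from the shared 1.24 ppm signal.
    poly2_signal = i_124 - methyls_poly1_at_124 * i_147
    if poly2_signal < 0:
        raise ValueError("1.24 ppm integral is smaller than the poly_1 share")
    # Each methyl carries three protons in both signals, so that factor cancels.
    poly2_units = poly2_signal / methyls_poly2_at_124
    return poly2_units / i_147


# Hypothetical integrals, for illustration only.
print(poly2_to_poly1_ratio(i_147=1.0, i_124=3.0))
```

Keeping the stoichiometries as parameters makes the assumption explicit and easy to change if the repeating units in Table S1 imply different methyl counts.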

Molecular weight determination
The molecular weight was determined by size exclusion chromatography using a TSK gel G5000 PWXL column (30 cm × 7.8 mm ID) eluted with 50 mM ammonium bicarbonate at 0.8 ml/min at room temperature; the eluate was monitored with a refractive index detector on an Agilent 1100 HPLC system. The column was calibrated with dextran standards of known molecular weight (5, 50, 150, 410, and 610 kDa); the logarithm of the molecular weight was fitted versus the elution volume by linear regression, and the molecular weights of the fibrils, poly_1 and poly_2 were evaluated by interpolation of the resulting equation.
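The calibration described above can be sketched as an ordinary least-squares fit of log10(molecular weight) against elution volume, followed by interpolation for an unknown sample. The dextran molecular weights are those listed in the text; the elution volumes are hypothetical placeholders, since the measured values are not given here, and the function names are illustrative.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Dextran standards from the text (kDa).
standards_kda = [5, 50, 150, 410, 610]
# HYPOTHETICAL elution volumes (ml), one per standard, for illustration only.
elution_ml = [10.4, 8.3, 7.3, 6.4, 6.0]

slope, intercept = fit_line(elution_ml, [math.log10(m) for m in standards_kda])


def mw_from_volume(v_ml):
    """Interpolate a molecular weight (kDa) from an elution volume (ml)."""
    if not min(elution_ml) <= v_ml <= max(elution_ml):
        raise ValueError("volume lies outside the calibrated range")
    return 10 ** (intercept + slope * v_ml)


print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"estimated MW at 8.0 ml: {mw_from_volume(8.0):.0f} kDa")
```

Rejecting volumes outside the range of the standards keeps the estimate an interpolation, as in the text, rather than an extrapolation.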

SDS-PAGE analysis
The 12.0% SDS-PAGE (Figures 4b and 4c) used to detect carbohydrate material by alcian blue and silver staining [3] was performed on the fibrils (12 µg), and on poly_1 (1, 4, 12 µg) and poly_2 (1, 4, 12 µg) with the highest purity. The BlueEye Prestained Protein Marker (2 µl, Bioscience) was loaded as molecular weight reference. A similar gel was stained with Coomassie brilliant blue to reveal the protein content; here poly_1 and poly_2 were loaded at the maximum amount (12 µg each).

Mass Spectrometry-based proteomic analysis of Mimivirus fibrils, full virions and defibrillated virions
Extracted proteins were stacked at the top of an SDS-PAGE gel (4-12% NuPAGE, Life Technologies) and stained with Coomassie blue R-250 (Bio-Rad) before in-gel digestion using modified trypsin (Promega, sequencing grade) as previously described. [10] The resulting peptides were analyzed by online nanoliquid chromatography coupled to tandem MS (UltiMate 3000 RSLCnano and Q-Exactive Plus, Thermo Scientific). Peptides were sampled on a 300 µm × 5 mm PepMap C18 precolumn and separated on a 75 µm × 250 mm C18 column (Reprosil-Pur 120 C18-AQ, 1.9 μm, Dr. Maisch) using a 120-min gradient. MS and MS/MS data were acquired using Xcalibur (Thermo Scientific). Peptides and proteins were identified using Mascot (version 2.6.0) through concomitant searches against the Mimivirus database (homemade), a classical contaminant database (homemade) and the corresponding reversed databases. The Proline software [11] was used to filter the results: conservation of rank 1 peptides, peptide score ≥ 25, peptide length ≥ 7, peptide-spectrum-match identification false discovery rate < 1% as calculated on scores by employing the reverse database strategy, and a minimum of 1 specific peptide per identified protein group. Proline was then used to perform a compilation, grouping and spectral counting-based comparison of the protein groups identified in the different samples. Proteins from the contaminant database were discarded from the final list of identified proteins. Extracted spectral counts for each protein in each sample were tested using the ACD tool (URL: www.igs.cnrs-mrs.fr/acdtool/). [12] Only proteins with a p-value below 0.001 were considered as enriched in one of the compared samples.
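The peptide-level criteria listed above (rank 1, score ≥ 25, length ≥ 7) and the reverse-database false discovery rate can be illustrated with a short sketch. This is not Proline's implementation: the `PSM` record, the function names and the example matches are hypothetical, and the decoy/target ratio used here is one common simple FDR estimator that may differ from the calculation Proline performs on scores.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    """Hypothetical peptide-spectrum match record."""
    peptide: str
    score: float
    rank: int
    is_decoy: bool  # True if matched in the reversed database


def passes_filters(psm, min_score=25.0, min_length=7):
    """Apply the rank, score and length criteria quoted in the text."""
    return (psm.rank == 1
            and psm.score >= min_score
            and len(psm.peptide) >= min_length)


def decoy_fdr(psms):
    """Simple target-decoy estimate: decoy hits / target hits."""
    targets = sum(not p.is_decoy for p in psms)
    decoys = sum(p.is_decoy for p in psms)
    return decoys / targets if targets else 0.0


# Made-up matches, for illustration only.
psms = [
    PSM("ACDEFGHIK", 40.0, 1, False),
    PSM("GHIKLMNPQ", 35.0, 1, False),
    PSM("ACDEFG", 50.0, 1, False),     # too short
    PSM("LMNPQRSTV", 20.0, 1, False),  # score too low
    PSM("KIHGFEDCA", 30.0, 2, False),  # not rank 1
    PSM("VTSRQPNML", 26.0, 1, True),   # decoy that passes the filters
]
kept = [p for p in psms if passes_filters(p)]
print(len(kept), decoy_fdr(kept))
```

In practice the score threshold would be raised until the estimated FDR falls below the 1% target stated in the text.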

Conservation of fibrils proteins through three clades of the Mimiviridae family.
tBLASTn (https://blast.ncbi.nlm.nih.gov/Blast.cgi) was used to assess the presence and the level of conservation (Table S2) of the restricted pool of 29 proteins in all fully sequenced genomes of the members of the first three clades of the Mimiviridae family (Table S3). For each protein, the percentages of query coverage and identity are reported.

Determination of putative glycosylation sites for L894/L893, and L488 and their conservation in other members.
NetOGlyc 4.0 or NetNGlyc 1.0, available at the DTU Health Tech site (http://services.healthtech.dtu.dk), were used to identify the potential O- and N-glycosylation sites of L894/L893, L488, R710 and L236. Contrary to the others, L894/L893 and L488 have several glycosylation sites; for this reason, only these two proteins were further compared with their orthologs in the three clades by multiple alignment based on structural information on the Expresso server (http://tcoffee.crg.cat/apps/tcoffee/do:expresso) [13] to define which glycosylation sites were conserved. The resulting multiple alignment was submitted to the ESPript server (http://espript.ibcp.fr/ESPript/ESPript/) [14] to visualize sequence similarities (Figures S7 and S8, for L894/L893 and L488, respectively).

Overlay of HSQC (black/dark grey) and HMBC (light grey) spectra detailing: a) the anomeric and the carbinolic regions; b) the high field region containing the methyl signals; c) the D6 densities drawn by raising the contour levels; and d) the long range correlation connecting the methyl group of the pyruvate to the carboxylic carbon. Letters used for the annotation of the densities follow the system of Table S1. Signals in the dotted box are related to proteins and signals denoted with "*" refer to anomeric protons of minor monosaccharide motifs.

Figure S4. 1H-1H homonuclear spectra of Mimivirus fibrils. a) proton spectrum; b) overlay of TOCSY (black) and COSY (red and cyan) spectra; c) overlay of TOCSY (black) and NOESY (green) spectra. Letters used for the annotation of the densities follow the system of Table S1. *minor anomeric signals whose identity could not be established.

Figure S5. Evaluation of the poly_1 vs poly_2 ratio. The methyl at 1H 1.47 ppm was used as reporter group of poly_1, and the group of signals at 1.24 ppm was related to poly_2 after subtraction of the contribution of poly_1. Integrations of the proton spectra were run for all the fractions obtained by ion exchange chromatography and are reported next to each profile or in Table 1. a) untreated fibrils; b) fibrils digested with proteinase K; c) fibrils digested twice with proteinase K.

Figure S6. NMR data of purified poly_1 and poly_2. HSQC spectra of poly_2 and of poly_1 purified by ion exchange chromatography after the double proteinase K treatment; a) carbinolic region and b) anomeric region of purified poly_2; c) and d) the same for poly_1. Labels used to annotate the densities follow the system of Table S1; signals denoted with "*" refer to anomeric protons of minor monosaccharide motifs.

[15] The 3 underlined are described as occurring in the fibrils by Sobhy et al. [16] The text in gray denotes the 7 proteins that were poorly conserved between the clades. In bold are the best candidates to act as carriers of the polysaccharides (this work). Proteins are sorted according to the ratings calculated from the first proteomic data set (SourceData File_1). The full list of the genome sequences used is in Table S3.

Table S3. NCBI accession numbers of the complete genome sequences used in the tBLASTn search to determine the conservation level (Table S2) of the proteins in the fibrils in the Acanthamoeba-infecting Mimiviruses.