Brassica rapa hairy root based expression system leads to the production of highly homogenous and reproducible profiles of recombinant human alpha‐L‐iduronidase

Summary
The Brassica rapa hairy root based expression platform, a turnip hairy root based expression system, is able to produce complex human glycoproteins such as alpha-L-iduronidase (IDUA) with an activity similar to that of the enzyme produced by Chinese Hamster Ovary (CHO) cells. In this article, particular attention has been paid to the N- and O-glycosylation that characterize the alpha-L-iduronidase produced using this hairy root based system. This analysis showed that the recombinant protein is characterized by highly homogeneous post-translational profiles enabling a strong batch-to-batch reproducibility. Indeed, on each of the 6 N-glycosylation sites of the IDUA, a single N-glycan composed of a core Man3GlcNAc2 carrying one beta(1,2)-xylose and one alpha(1,3)-fucose epitope (M3XFGN2) was identified, highlighting the high homogeneity of the production system. Hydroxylation of proline residues and arabinosylation were identified during O-glycosylation analysis, again with a remarkable reproducibility. This platform is thus positioned as an effective and consistent expression system for the production of complex human therapeutic proteins.


Introduction
To date, various recombinant protein expression systems have been developed. This diversity can be explained by the constraints that each of these systems imposes: inability to produce and/or secrete functional complex proteins (e.g.: bacterial systems), existence of a risk of viral transmission and toxic molecules (e.g.: bacterial systems, mammalian cells), societal rejection (e.g.: GMO plants in fields), or high production costs (e.g.: mammalian cells). Thus, all of the existing production systems present some limitations.
For more than 25 years, transgenic plants have been considered as an alternative system for the production of heterologous recombinant therapeutic proteins. They offer a number of major advantages over the usual production in industrial cell lines, in particular the absence of potential contamination by animal pathogens and the possibility of mass production at low cost. However, the culture of transgenic plants in open fields or in greenhouses is associated with numerous limitations such as social rejection or environmental influence. This has led the scientific community to develop plant alternatives combining the intrinsic advantages of plants with the possibility of confined production. In this context, the hairy root expression system appears as an obvious favourable candidate.
Hairy roots emerge from the wounding site of plantlets after infection by a symbiotic bacterium called Rhizobium rhizogenes. In nature, this phenomenon is beneficial for the infected plants as it enables them to extract more soil nutrients and water. In return, the bacterium builds an environment favourable to its development. These additional, non-adventitious roots are able to grow indefinitely and without geotropism.
Hairy roots have been widely studied and used for the production of specialized/secondary metabolites of industrial and pharmaceutical interest (Georgiev et al., 2007; Giri and Narasu, 2000; Guillon et al., 2006a,b; Srivastava and Srivastava, 2007). Since the 1990s, the production of recombinant proteins has been considered as another promising application of hairy root cultures. The first proof of concept was achieved by producing a mouse monoclonal antibody in hairy roots of tobacco plants (Wongsamuth and Doran, 1997). It was shown that this antibody was secreted and accumulated in the culture medium. Other recombinant proteins have been produced and secreted by tobacco hairy roots, including the green fluorescent protein (Medina-Bolívar and Cramer, 2004), the murine interleukin (Liu et al., 2009), the human acetylcholinesterase (Woods et al., 2008) and the thaumatin sweetener (Pham et al., 2012). In order to improve the hairy root based expression system, a plant species from the Brassicaceae family was specifically selected because it met certain criteria, such as being an edible plant, which avoids any risk of known toxicity, and a high capacity of the transformed roots to secrete heterologous proteins (Huet et al., 2014). In addition, the hairy roots that emerge from Brassica rapa can grow indefinitely, which is not the case, for example, for the hairy roots that emerge from Nicotiana benthamiana (Huet et al., 2014). Using this system, complex recombinant proteins can be produced (Ele Ekouna et al., 2017). In the present article, the in-depth characterization of one such recombinant protein, referred to as 'rIDUA_RLT' (for recombinant alpha-L-iduronidase_Root Lines Technology), is described.
The alpha-L-iduronidase (IDUA) is a complex human glycoprotein whose deficiency leads to the development of mucopolysaccharidosis type I (MPS I), a progressive lysosomal storage disorder. Alpha-L-iduronidase (IDUA; EC 3.2.1.76) is a 71 kDa lysosomal enzyme that hydrolyses the terminal alpha-L-iduronic acid residues of glycosaminoglycans such as dermatan sulphate and heparan sulphate. IDUA is a secreted protein presenting a signal peptide (M1-A23), which is cleaved off in its final secreted form, and six potential N-glycosylation sites as well as hydroxylation. This protein has already been produced in plants (Acosta et al., 2015; He et al., 2012, 2013; Pierce et al., 2017) and in CHO cells (Aldurazyme®, Sanofi, Paris, France; Kakkis et al., 1994). This gave the opportunity to compare the characteristics of the rIDUA_RLT protein produced in hairy root clones with its benchmark produced using a mammalian-based production system or other production systems.
Here, in order to biochemically characterize the protein produced by this hairy root based expression platform, we analysed its activity as well as its post-translational modifications. As most biopharmaceuticals are glycosylated proteins (Walsh and Jefferis, 2006) and as it is well established that the glycosylation of these recombinant proteins is essential for their biological activity, safety, efficacy and immunogenicity (van Beers and Lingg et al., 2012), particular attention was paid to the analysis of both the N- and O-glycosylation profiles of the protein produced using the hairy root based expression system.

Results
Transformed turnip hairy root clones are able to reproducibly secrete an active rIDUA_RLT recombinant protein that can be purified using a customized process
Turnip plantlets were infected with transformed R. rhizogenes containing the plasmid pRD400-SP-IDUA as described previously (Huet et al., 2014). One hundred and fifty-three hairy root clones emerged from the wounding sites. These clones were individualized and cultured first in solid and then in liquid media. At this stage, 39 hairy root clones were selected for their growth capacity through a phenotypic screen (roots 2 cm long or longer). Hairy root clones having stably integrated the human IDUA transgene were selected through a PCR analysis of hIDUA transcripts using specific primers for IDUA and the SEC61 gene, which is constitutively expressed in Brassica species [UniProtKB - P0DI74 (S61G1_ARATH)] (Figure 1a). After this selection step, the ability of the selected hairy root clones to produce and secrete an active recombinant α-L-iduronidase protein into the medium was assessed. As shown in Figure 1b, the productivity observed for clone 84 was outstanding compared with the others. This best-producing clone was finally selected for further analyses based on the activity assay screening. The immunodetection analysis of the culture medium from this selected hairy root clone using an anti-IDUA antibody shows a band at a molecular weight of nearly 80 kDa. Given that the molecular weight of the mature non-glycosylated protein is theoretically expected to be 70.77 kDa, this result confirmed the presence of the protein of interest in the culture medium in a glycosylated form (Figure 1c). This analysis was done using the crude culture medium harvested from several bioreactors cultured from the same hairy root clone (clone 84), showing a remarkable batch-to-batch reproducibility of the production of rIDUA_RLT protein by the selected clone (Figure 1c).
Finally, the identity of the protein of interest was further validated by a proteomic approach combined with MS2 on the crude culture medium. This analysis displayed a 68.5% peptide recovery as compared to the theoretical sequence of the expressed IDUA (Figure 1d).
The rIDUA_RLT protein (Figure 2a) was then purified using a two-step chromatographic protocol. A protein purity of over 96% (Figure 2b), without any soluble aggregate as estimated by SE-HPLC and RP-HPLC, was obtained (Figure 2c; see Data S1).
The N-terminal sequence of rIDUA_RLT corresponds to the expected sequence, demonstrating that the signal peptide was efficiently removed
For the recombinant protein to be secreted into the culture medium, a gene sequence encoding a specific signal peptide already used in previous studies (Ele Ekouna et al., 2017; Huet et al., 2014) was added to the 5′ end of the nucleotide sequence. This signal peptide has to be removed during the secretion step, in the endoplasmic reticulum, as normally observed in eukaryotic cells (Dudek et al., 2015; Kober et al., 2013; Zimmermann et al., 2011). To validate the absence of the signal peptide in the protein secreted by the hairy root based expression platform, N-terminal sequencing using Edman degradation was performed (data not shown). An N-terminal MAXXV pentapeptide was identified, with X being a modified amino acid. Considering that the expected N-terminal pentapeptide is MAPPV, this analysis suggested that cleavage of the signal peptide occurred as expected before Met 24. This demonstrates the ability of the hairy root expression system to recognize and successfully cleave the signal peptide of heterologous proteins.
rIDUA_RLT protein exhibits enzymatic kinetics that are comparable with the ones displayed by the same enzyme produced in CHO cells
The kinetic parameters of the purified rIDUA_RLT protein were measured. Michaelis-Menten kinetics were used to characterize the enzymatic properties of the rIDUA_RLT protein in comparison with the commercially available recombinant protein produced in CHO cells (Aldurazyme®, Sanofi). rIDUA_RLT displays a Vmax of 8.5 µmol/min/mg, a kcat of 11.8 s⁻¹ and a Km of 120 µM, whereas Aldurazyme, analysed in parallel within the same experiment, exhibited a Vmax of 6.8 µmol/min/mg, a kcat of 9.4 s⁻¹ and a Km of 130 µM. These results show that rIDUA_RLT and Aldurazyme have comparable enzymatic activity and affinities towards the same substrate (Figure 3). Since enzymatic activity is usually related to protein quality and glycosylation, it was of interest to biochemically characterize the rIDUA_RLT protein and to focus especially on its glycosylation profile.
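The relation between the reported Vmax and kcat values can be checked with a short calculation. The sketch below converts a specific activity into a turnover number and evaluates the Michaelis-Menten rate law. The molecular mass of 83 kDa is an assumption back-calculated from the reported Vmax/kcat pairs; it is not stated in the text, and the function names are illustrative.

```python
ASSUMED_MASS_KDA = 83.0  # assumption: back-calculated, not given in the article


def kcat_from_vmax(vmax_umol_min_mg, mass_kda):
    """Convert Vmax (umol/min/mg) into kcat (1/s).

    1 umol of a protein of M kDa weighs M mg, so multiplying by M gives
    umol substrate per min per umol enzyme; dividing by 60 gives 1/s.
    """
    return vmax_umol_min_mg * mass_kda / 60.0


def michaelis_menten_rate(s_um, vmax, km_um):
    """Initial rate v = Vmax * [S] / (Km + [S]), with [S] and Km in uM."""
    return vmax * s_um / (km_um + s_um)


for name, vmax, km in [("rIDUA_RLT", 8.5, 120.0), ("Aldurazyme", 6.8, 130.0)]:
    kcat = kcat_from_vmax(vmax, ASSUMED_MASS_KDA)
    efficiency = kcat / (km * 1e-6)  # catalytic efficiency in 1/(M*s)
    print(f"{name}: kcat = {kcat:.1f} 1/s, kcat/Km = {efficiency:.2e} 1/(M*s)")
```

Under this assumed mass, both reported kcat values are recovered to within rounding, and the catalytic efficiencies of the two enzymes are of the same order of magnitude, in line with the comparison made above.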

N-Glycosylation analysis indicates that rIDUA_RLT protein displays a highly homogeneous paucimannosidic profile
The N-glycan profile of rIDUA_RLT was determined by mass spectrometry analysis of oligosaccharides released from the purified recombinant enzyme (Figure 4). The only ion that can be assigned to glycan residues is the ion at m/z 704.81, which corresponds to an N-glycan composed of a core Man3GlcNAc2 carrying one beta(1,2)-xylose and one alpha(1,3)-fucose epitope (M3XFGN2), an N-glycan widely described in plants (Bardor, 2008; Brooks, 2011; Schoberer and Strasser, 2017). The other ions correspond to non-glycan contaminants. A unique paucimannosidic N-glycan was thus detected. These results were also confirmed by western blot against beta(1,2)-xylose and alpha(1,3)-fucose (data not shown). In order to determine the distribution of this oligosaccharide on each N-glycosylation site of the recombinant rIDUA_RLT secreted by the hairy root clones, a glycoproteomic approach combined with nano LC-nanoESI-MS was used as previously described (Vanier et al., 2015). This gave access both to the N-glycan structures and to their site occupancy. For that purpose, LC peaks giving MS2 spectra exhibiting N-glycan diagnostic fragment ions at m/z 204 (N-acetylglucosamine), 163 (mannose) and 366 (Man-GlcNAc) were assigned to glycopeptides (Figure 5a). Using this assignment, and as illustrated in Figure 5a, the M3XFGN2 N-glycan was identified attached to each of the 6 N-glycosylation sites. Sequences of glycopeptides were then confirmed by analysis of the fragmentation patterns obtained by MS2 (Figure 5b). Figure 5 exemplifies the results obtained from the analysis of the rIDUA_RLT produced in one bioreactor batch.
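The diagnostic fragment ions used above follow directly from monosaccharide residue masses. As a minimal sketch, the code below computes the singly protonated oxonium ions and the monoisotopic mass of the free M3XFGN2 glycan from standard monoisotopic residue masses. The helper names are illustrative. The ion at m/z 704.81 is not recomputed here, because the derivatization and charge state of the released glycans are not stated in this section.

```python
# Standard monoisotopic residue masses (Da): sugar minus one water.
RESIDUE_MASS = {
    "Hex": 162.052825,     # mannose
    "HexNAc": 203.079374,  # N-acetylglucosamine
    "Pent": 132.042260,    # xylose (or arabinose)
    "dHex": 146.057910,    # fucose
}
WATER = 18.010565
PROTON = 1.007276


def residue_sum(composition):
    """Sum of residue masses for a composition such as {'Hex': 3}."""
    return sum(RESIDUE_MASS[name] * n for name, n in composition.items())


def oxonium_mz(composition):
    """m/z of the singly protonated oxonium (B-type) fragment ion."""
    return residue_sum(composition) + PROTON


def free_glycan_mass(composition):
    """Monoisotopic mass of the released, underivatized glycan."""
    return residue_sum(composition) + WATER


M3XFGN2 = {"Hex": 3, "HexNAc": 2, "Pent": 1, "dHex": 1}

print(round(oxonium_mz({"HexNAc": 1}), 3))            # GlcNAc ion, nominal 204
print(round(oxonium_mz({"Hex": 1}), 3))               # mannose ion, nominal 163
print(round(oxonium_mz({"Hex": 1, "HexNAc": 1}), 3))  # Man-GlcNAc ion, nominal 366
print(round(free_glycan_mass(M3XFGN2), 3))
```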
A deglycosylation experiment definitively confirmed that all N-glycosylation sites of the rIDUA_RLT protein are occupied. Indeed, non-glycosylated peptides containing an Asn residue were not detected in the rIDUA_RLT, which indicates that the six sites are fully N-glycosylated. Thus, this result indicates that this hairy root based platform is able to efficiently N-glycosylate complex recombinant proteins.
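Full occupancy of the six sites by a single glycan allows a rough estimate of the mass of the glycosylated protein. The sketch below adds six M3XFGN2 glycans to the theoretical 70.77 kDa polypeptide using standard average residue masses. It ignores hydroxylation, arabinosylation and any other modification, and the names are illustrative, so it is only an order-of-magnitude check against the band observed at nearly 80 kDa.

```python
# Standard average residue masses (Da) of the monosaccharides in M3XFGN2.
AVG_RESIDUE_MASS = {
    "Hex": 162.1406,
    "HexNAc": 203.1925,
    "Pent": 132.1146,
    "dHex": 146.1412,
}
M3XFGN2 = {"Hex": 3, "HexNAc": 2, "Pent": 1, "dHex": 1}


def n_glycan_added_mass(composition):
    """Mass added per occupied site.

    The free glycan is the residue sum plus one water, and one water is
    lost on attachment to Asn, so the net addition is the residue sum.
    """
    return sum(AVG_RESIDUE_MASS[name] * n for name, n in composition.items())


def glycoprotein_mass_kda(polypeptide_kda, composition, occupied_sites):
    """Polypeptide mass plus the N-glycans on all occupied sites, in kDa."""
    return polypeptide_kda + occupied_sites * n_glycan_added_mass(composition) / 1000.0


estimate = glycoprotein_mass_kda(70.77, M3XFGN2, 6)
print(f"Estimated mass with six M3XFGN2 glycans: {estimate:.1f} kDa")
```

The estimate of about 77.8 kDa is compatible with the apparent molecular weight seen by immunodetection, bearing in mind that SDS-PAGE migration of glycoproteins is only approximate.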

O-glycosylation analysis of rIDUA_RLT produced in the hairy root based expression platform
Like N-linked glycans, O-linked glycans play an important role in protein stability and function (Kim et al., 2016;Zhang et al., 2013). In order to further characterize the glycosylation profile of the rIDUA_RLT protein produced, an analysis of its O-glycosylation profile was performed.
As mentioned above, the N-terminal sequencing using Edman degradation displayed a MAXXV sequence at the N-terminal end of the rIDUA_RLT protein. It thus appears that the X residues in the rIDUA_RLT protein may correspond to proline residues that underwent post-translational modifications. To investigate which modifications may occur on the proline residues at the N-terminal extremity of the rIDUA_RLT protein, a nano LC-nanoESI-MS analysis of the recombinant rIDUA_RLT was carried out as described above after tryptic digestion of the recombinant protein. The native N-terminal MAPPVAPAEAPHLVHVDAAR tryptic peptide was not found. However, N-terminal peptides were detected with m/z shifts of 16, 32 and 48 mass units. The sequences of these N-terminal peptides were investigated by MS2 sequencing. The MS2 spectrum of the doubly charged ion corresponding to the N-terminal peptide exhibiting an m/z shift of 32 mass units is presented in Figure 6. The fragmentation pattern indicated that the m/z shift is carried by the N-terminal MAPPV amino acid sequence. Considering the absence of detection of proline residues in the N-terminal Edman degradation, we thus conclude that these two proline residues are hydroxylated in the rIDUA_RLT produced in B. rapa hairy roots. Likewise, peptides with m/z shifts of 16 and 48 mass units were assigned to N-terminal sequences containing one and three hydroxyproline residues, respectively. Based on the ion intensity, the relative percentage of the N-terminal peptide bearing one hydroxyproline is about 40%, the peptide bearing two hydroxyproline residues represents almost 60% and the N-terminal peptide containing three hydroxyproline residues represents about 1%.
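The shifts of 16, 32 and 48 mass units correspond to the addition of one, two or three oxygen atoms. For a multiply charged ion, the shift observed on the m/z axis is the mass shift divided by the charge. A minimal sketch of this conversion follows; the function names are illustrative and the values use the standard monoisotopic mass of oxygen.

```python
OXYGEN_MONOISOTOPIC = 15.994915  # Da, one hydroxylation (Pro -> Hyp)


def hydroxylation_mass_shift(n_hydroxyl):
    """Mass added (Da) by n proline hydroxylations."""
    return n_hydroxyl * OXYGEN_MONOISOTOPIC


def observed_mz_shift(n_hydroxyl, charge):
    """Shift seen on the m/z axis for an ion carrying `charge` protons."""
    return hydroxylation_mass_shift(n_hydroxyl) / charge


for n in (1, 2, 3):
    print(
        f"{n} Hyp: +{hydroxylation_mass_shift(n):.3f} Da, "
        f"+{observed_mz_shift(n, 2):.3f} on m/z for a 2+ ion"
    )
```

With this convention, the figures quoted in the text read most naturally as mass shifts: a doubly charged peptide bearing two hydroxyprolines would appear about 16 m/z units above its unmodified counterpart.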
In addition to m/z shifts resulting from the hydroxylation of proline residues, N-terminal MAPPVAPAEAPHLVHVDAAR tryptic peptides with m/z shifts of 132 or 264, representing about 1% of the overall peptide population, were also detected. These peptides may result from the transfer of pentose residues onto hydroxyproline residues, as observed in plants (Schoberer and Strasser, 2017). To determine which pentose may be linked to hydroxyproline residues, the monosaccharide composition of rIDUA_RLT was determined by gas chromatography analysis (Table 1). Xylose, fucose, GlcNAc and mannose were detected, which is consistent with the N-glycosylation profile of this protein as depicted in Figures 4 and 5. In addition, arabinose residues, and no other pentose, were also detected, suggesting that the pentose O-linked to hydroxyproline is indeed arabinose (Table 1). Glucose and galactose are contaminant residues.
A second peptide containing a PPXP sequence (peptide STGFCPPLPHSQADQYVLSWDQQLNLAYVGAVPHR) was identified as being hydroxylated. One to three hydroxylations were detected on this peptide by LC-ESI-MS. In addition to mono-, di- and trihydroxylated forms of the peptide, a tetrahydroxylated variant was also detected by nanoLC-nanoESI-MS.
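The modified forms of the N-terminal tryptic peptide can be enumerated from its sequence. The sketch below computes the monoisotopic mass of MAPPVAPAEAPHLVHVDAAR and of variants carrying hydroxyproline and pentose residues, using standard monoisotopic masses. The rule that a pentose can only sit on a hydroxyproline is an assumption taken from the interpretation given above, and the function names are illustrative.

```python
# Standard monoisotopic residue masses (Da) for the amino acids in the peptide.
AA_RESIDUE = {
    "M": 131.04049, "A": 71.03711, "P": 97.05276, "V": 99.06841,
    "E": 129.04259, "H": 137.05891, "L": 113.08406, "D": 115.02694,
    "R": 156.10111,
}
WATER = 18.010565
OXYGEN = 15.994915    # hydroxylation of one proline
PENTOSE = 132.042260  # one pentose residue, e.g. arabinose

N_TERM_PEPTIDE = "MAPPVAPAEAPHLVHVDAAR"


def peptide_mass(seq):
    """Monoisotopic mass of the unmodified peptide."""
    return sum(AA_RESIDUE[aa] for aa in seq) + WATER


def variant_mass(seq, n_hyp, n_pentose):
    """Mass of a variant with n_hyp hydroxyprolines and n_pentose pentoses."""
    if n_hyp > seq.count("P"):
        raise ValueError("more hydroxylations than proline residues")
    if n_pentose > n_hyp:
        # Assumption: pentoses are only attached to hydroxyproline.
        raise ValueError("more pentoses than hydroxyproline residues")
    return peptide_mass(seq) + n_hyp * OXYGEN + n_pentose * PENTOSE


print(f"prolines available: {N_TERM_PEPTIDE.count('P')}")
for n_hyp, n_pent in [(0, 0), (1, 0), (2, 0), (3, 0), (2, 1), (2, 2)]:
    mass = variant_mass(N_TERM_PEPTIDE, n_hyp, n_pent)
    print(f"{n_hyp} Hyp, {n_pent} Pent: {mass:.3f} Da")
```

Each pentose adds about 132 Da and each hydroxylation about 16 Da, which matches the pattern of shifts reported for this peptide.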

N/O-glycosylation homogeneity and reproducibility of rIDUA_RLT
Using the same analytical strategy, searches for other post-translational modifications, such as phosphorylation of hydroxy amino acids, were unsuccessful. In addition, similar results were obtained on the analysis of the N- and O-glycosylation of the rIDUA_RLT protein produced at 6-month intervals in two independent bioreactor batches (data not shown). On each of the 6 N-glycosylation sites of both rIDUA_RLT proteins produced using the two independent bioreactor batches, a single N-glycan composed of a core Man3GlcNAc2 carrying one beta(1,2)-xylose and one alpha(1,3)-fucose epitope (M3XFGN2) was identified. Moreover, the same hydroxylation and arabinosylation modifications were identified on both rIDUA_RLT proteins during O-glycosylation analysis. These results indicate that expression of rIDUA_RLT in hairy roots is robust and reproducible not only in terms of protein productivity (Figure 1c) but also in terms of N- and O-glycosylation profiles.

Discussion
In the present work, the ability of the B. rapa hairy root based expression platform to produce and secrete a complex recombinant glycoprotein in its active form was demonstrated taking as an example the rIDUA_RLT protein. This highlights the relevance of the platform as an expression system. It was shown that the hairy-root based rIDUA_RLT protein shows enzymatic characteristics similar to the ones of the same recombinant protein produced in CHO (Aldurazyme) despite the differences that exist between both expression systems in terms of post-translational modifications.
The N-glycosylation analysis of rIDUA_RLT showed that the recombinant protein displays a single paucimannose profile on all N-glycosylation sites. Such homogeneity is remarkable because, in most expression systems commonly used for the production of biopharmaceuticals, recombinant proteins carry multiple N-glycan structures resulting from the variability of N-glycan maturation occurring in the Golgi apparatus. This results in recombinant therapeutic proteins exhibiting high heterogeneity, which may affect the batch-to-batch reproducibility (Hossler et al., 2009; Lingg et al., 2012). As an example, Aldurazyme, the rIDUA protein produced in CHO cells, is characterized by a high intra-site heterogeneity of the N-linked glycans, and this was observed in each of the six N-glycosylation sites (Zhao et al., 1997). Indeed, the site-specific glycosylation pattern that characterizes Aldurazyme is: Asn-110, complex type glycans; Asn-190, complex type glycans; Asn-336, bisphosphorylated oligomannosidic glycan (P2Man7GlcNAc2); Asn-372, high mannose type glycans (mainly Man9GlcNAc2, some of which are monoglucosylated); Asn-415, mixed oligomannosidic and complex type glycans; Asn-451, bisphosphorylated oligomannosidic glycan (P2Man7GlcNAc2; Zhao et al., 1997). This contrasts with the single paucimannose profile found in all N-glycosylation sites of rIDUA_RLT. Similarly, the recombinant IDUA proteins produced in plant-based expression systems other than hairy roots are characterized by heterogeneous N-glycosylation profiles, with nevertheless a predominance of high mannose profiles (He et al., 2012, 2013). The absence of such a large heterogeneity in the N-glycosylation profiles of the rIDUA_RLT produced using the B. rapa hairy root system is of particular interest at a regulatory level to increase the reproducibility of the batches that may be used in clinical trials.
A similar observation was made when analysing total endogenous proteins of B. rapa hairy root clones developed using the hairy root platform. As an example, total endogenous proteins from isolated young or old roots collected at different time-points of the culture of hairy root clones expressing the glucocerebrosidase (GCD) recombinant protein still essentially display profiles of paucimannosidic type (see Data S1 and Figure S1) when analysed by mass spectrometry, reinforcing our observation. High-mannose N-glycans were not detected in the oligosaccharide profile. We postulate that B. rapa Golgi α-mannosidases are likely highly efficient in the processing of high-mannose N-glycans arising from Endoplasmic Reticulum biosynthesis steps into mature glycans. These additional observations reinforce the demonstration of the ability of the B. rapa hairy root platform to produce recombinant proteins with a remarkably homogeneous glycosylation profile, never observed in the recombinant proteins produced in CHO (Tekoah et al., 2013; Zhao et al., 1997).
As in mammals, plants can also produce proteins displaying O-glycosylation profiles (Tekoah, 2006). However, the O-glycosylation profiles of the recombinant proteins produced in plant-based expression systems have been only poorly described in the literature (Karnoup et al., 2005; Kim et al., 2016; Schoberer and Strasser, 2017). O-linked glycosylation can be found on amino acids that contain a hydroxyl group (i.e. serine, threonine, [...]) (Schoberer and Strasser, 2017). In our study, hydroxylation and O-glycosylation of proline residues have been highlighted on rIDUA_RLT. Such hydroxylation was already observed in the recombinant IDUA protein produced in CHO (Jung et al., 2013). However, to the best of our knowledge, the O-glycan profiles associated with this protein have not been further investigated, although this mechanism is indeed present in mammals. O-glycosylation has also been described in plants, where it converts the proline residues to hydroxyproline and attaches arabinose residues to recombinant proteins (Karnoup et al., 2005; Pinkhasov et al., 2011). In the monosaccharide composition of rIDUA_RLT, in addition to the xylose of N-glycans, only one pentose, arabinose, was observed, suggesting that the m/z shifts of 132 of hydroxyproline-containing peptides are indeed due to arabinose attachment.
Finally, the rIDUA_RLT presents a glycan profile similar to the one classically observed in proteins produced by other plant-based expression systems, in particular the well-characterized beta(1,2)-xylose and alpha(1,3)-fucose epitopes. These residues have been criticized as being potentially immunogenic (Bardor et al., 2003). However, this observation is still controversial and depends on the studied model. Elelyso, a recombinant GCD produced using carrot cells by Protalix (Carmiel, Israel) and used in the clinic since 2012, is a good example of the non-toxicity of such glycoepitopes. Indeed, Elelyso is characterized by the presence of the plant-specific residues, i.e. beta(1,2)-xylose and alpha(1,3)-fucose. In addition, signals consistent with the presence of arabinose were detected in the monosaccharide composition of the carrot cell-produced GCD (Shaaltiel et al., 2015). Nevertheless, no adverse effect that could be linked to such particular residues was observed in any of the patients treated so far with this therapeutic product, neither during the clinical trials nor since its approval by the regulatory authorities. This plant-based protein appeared as safe as the treatment of patients with the counterpart protein produced in CHO (Cerezyme®/Sanofi) or human fibroblast cells (VPRIV®/Shire, Dublin, Ireland; Shaaltiel and Tekoah, 2016). The glycosylation profile of the recombinant proteins produced using the B. rapa hairy root platform could thus be compatible with a therapeutic use of such proteins.
Finally, thanks to the highly homogeneous paucimannosidic profile of its recombinant proteins, the B. rapa hairy root based expression platform is of particular relevance for the production of proteins of therapeutic interest such as the GCD for the treatment of Gaucher disease or the alpha-galactosidase for the treatment of patients with Fabry disease. Regarding the treatment of other lysosomal disorders, the addition of mannose-6-phosphate (M6P) residues would ideally be required in vivo, as plants are not naturally able to phosphorylate the mannose residues. Several strategies allowing the addition of such residues on plant-based recombinant proteins are described in the literature (He et al., 2012, 2013). The existence of alternate M6P-independent pathways for lysosomal enzyme sorting has also been largely described (Markmann et al., 2015). As an example, based on the results of Kakkis et al. (1994), 20% of the IDUA protein is able to penetrate the targeted cell through an alternative pathway to the well-characterized M6P recognition system. This M6P-independent pathway can be related to the use of alternative routes enabling recombinant proteins to penetrate the targeted cells, such as the use of HIV Tat peptides (Xia et al., 2001; Zhang et al., 2008), insulin growth factor II (LeBowitz et al., 2004), receptor associated protein RAP (Prince et al., 2004), insulin receptor (Boado et al., 2008; Lu et al., 2011), intercellular adhesion molecule 1 (ICAM-1; Hsu et al., 2011, 2012; Muro et al., 2006), transferrin receptor (TfR; Chen et al., 2008; Osborn et al., 2008) or RTB lectin (Acosta et al., 2015).
As an example, a recombinant α-glucosidase (GAA), the enzyme involved in Pompe disease, coupled to polymer nanocarriers coated with an antibody specific to ICAM-1, allowed the efficient internalization and lysosomal transport of GAA into target cells, enhancing substrate degradation (Hsu et al., 2012). In the same way, tagging a galactocerebrosidase with the HIV Tat protein facilitated its uptake into neurons through an M6P-independent pathway. More recently, Sonoda et al. (2018) showed that the recombinant iduronate-2-sulphatase fused with an anti-TfR antibody was taken up by fibroblasts through both the TfR and the M6P pathways. All these examples show that several alternative strategies can be used to enhance the uptake of recombinant lysosomal enzymes, including alpha-L-iduronidase, even if N-glycan phosphorylation is lacking in these recombinant proteins.

Materials
Escherichia coli strain JM101 and R. rhizogenes strain ICPB TR7 were used for cloning and plant transformation, respectively, and B. rapa rapa cv 'Navet des vertus marteau' for hairy root production. Plant tissue culture media, vitamins and sucrose came from Duchefa Biochemie. 4-Methylumbelliferyl-α-L-iduronide (4MU-I) came from Santa Cruz Biotechnology (Dallas, TX). The commercial recombinant IDUA protein used as positive control came from Antibodies-online. The anti-IDUA antibody used in the Western blot analyses came from Antibodies-online. All reagents used to study the post-translational modifications of the IDUA protein were of HPLC grade. Peptide N-Glycosidase A was purchased from Roche (Mannheim, Germany). All reagents used for SDS-PAGE silver staining, 4-Methylumbelliferone (4MU) and Concanavalin A were purchased from Sigma (Saint-Louis, MO). Antibodies directed against the xylose and fucose epitopes were from Agrisera (Vännäs, Sweden).

Molecular cloning
Gene synthesis of the human Iduronidase A (IDUA) (NCBI NP_000194.2) coding sequence was performed by GeneART (ThermoFisher Scientific, Regensburg, Germany), including the Tobacco Mosaic Virus (TMV) omega translational enhancer, the signal peptide (SP) coding sequence from the Arabidopsis At1g69940pme gene and the HindIII and EcoRI restriction sites for easy subcloning into the previously described pJIT163 plasmid (Guerineau, 1995). The expression cassette containing the omega translational enhancer, the SP and the IDUA sequence was cloned into the HindIII/EcoRI restriction sites of the binary plant expression vector pRD400 (Datia et al., 1992), which contains an upstream 35S Cauliflower Mosaic Virus (CaMV) promoter and a downstream CaMV 35S terminator. The pRD400 vector was then introduced into R. rhizogenes bacteria.

Plant transformation and hairy root culture
Turnip plants were transformed as described in Huet et al. (2014). Briefly, after 10 days of growth, the elongated stems originating from the culture of Turnip (B. rapa L. var. rapa cv. des vertus marteau) seeds were infected using the R. rhizogenes prepared as described above. The roots emerging from the infection sites were individualized and placed on medium B5 Gamborg (Gamborg et al., 1968). Isolated root lines were screened according to their growth capacity and their ability to produce the rIDUA_RLT protein by Western blot analysis. The culture was carried out as described previously (Ele Ekouna et al., 2017). For pilot scale production, root clones were cultured in 25 L airlift bioreactors.

Western blot
Samples (crude culture media) were resolved in AnykD mini protean TGX polyacrylamide gels (Bio-Rad). For Western blot analysis, proteins were transferred to nitrocellulose membranes (Bio-Rad, Hercules, California) using the Bio-Rad Turbo Trans-Blot system. The membranes were blocked in 5% fat-free milk (Blotting grade blocker, Bio-Rad) in TBS buffer, incubated with a 1:1000 dilution of the mouse anti-α-L-iduronidase antibody (ABIN603316 from Antibodies-online) followed by a 1:5000 dilution of a goat anti-mouse IgG-HRP antibody (sc-2005 from Santa Cruz Biotechnology). Staining was developed using the Western Clarity ECL revelation kit (170-5060, Bio-Rad).

Silver staining of proteins in polyacrylamide gels
Samples (crude culture media) or purified protein were resolved in AnykD mini protean TGX polyacrylamide gels (Bio-Rad). For silver staining, gels were incubated in 50% ethanol/10% acetic acid for 30 min, then in 30% ethanol/1% acetic acid for 15 min. Gels were washed in ultrapure H2O with shaking, three times for 10 min. Gels were sensitized for 90 s in 0.02% sodium thiosulphate, 6.8% sodium acetate, then rinsed in H2O. Gels were incubated in 0.1% silver nitrate for 10 min. Gels were then incubated in 3% sodium carbonate, 0.0002% sodium thiosulphate and 0.025% formaldehyde with shaking for 1 min. Development was stopped with a 40 mM EDTA-Na2 solution for 5 min.

MS analysis
For nanoLC/MS/MS analysis, samples were prepared as described in Allmann et al. (2014). The peptides were analysed on an Ultimate 3000 RSLC Nano-UHPLC system (Thermo Scientific, Waltham, Massachusetts) coupled to a nanospray Q-Exactive hybrid quadrupole-Orbitrap mass spectrometer (Thermo Scientific). Ten microliters of each peptide extract were loaded onto a 300 µm ID × 5 mm PepMap® C18 pre-column (Thermo Scientific) at a flow rate of 20 µL/min. After 5 min desalting, peptides were separated on a 75 µm ID × 25 cm C18 Acclaim PepMap® RSLC column (Thermo Scientific) with a 4%-40% linear gradient of solvent B (0.1% formic acid in 80% CH3CN) in 108 min. The separation flow rate was set at 300 nL/min. The mass spectrometer was operated in positive ion mode at a needle voltage of 1.8 kV. Data were acquired using Xcalibur 3.1 software in a data-dependent mode. MS scans (m/z 350-1600) were recorded at a resolution of R = 70 000 (@ m/z 200) with an AGC target of 3 × 10⁶ ions collected within 100 ms. Dynamic exclusion was set at 30 s and the top 12 ions were selected for fragmentation in HCD mode. MS/MS scans with a target value of 1 × 10⁵ ions were collected with a maximum fill time of 100 ms and a resolution of R = 17 500. Additionally, only +2 and +3 charged ions were selected for fragmentation. [...] website and the sequence of the recombinant protein. Two missed enzyme cleavages were allowed. Mass tolerances in MS and MS/MS were set at 10 ppm and 0.02 Da, respectively. Peptide validation was performed using the Percolator algorithm (Kall et al., 2007) and only high confidence peptides were retained, corresponding to a 1% False Positive Rate at peptide level.
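Two of the acquisition settings above lend themselves to a quick numerical illustration: the 10 ppm precursor tolerance and the 4%-40% gradient over 108 min. The sketch below converts a ppm tolerance into an absolute m/z window and computes the gradient slope. The function names are illustrative, and the example m/z of 704.81 is borrowed from the N-glycan analysis purely as a convenient value.

```python
def ppm_window(mz, ppm):
    """Return the (low, high) m/z bounds for a symmetric ppm tolerance."""
    delta = mz * ppm * 1e-6
    return mz - delta, mz + delta


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error of an observed ion, in ppm."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def gradient_slope(start_percent_b, end_percent_b, minutes):
    """Slope of a linear gradient in % solvent B per minute."""
    return (end_percent_b - start_percent_b) / minutes


low, high = ppm_window(704.81, 10.0)
print(f"10 ppm window at m/z 704.81: {low:.4f} to {high:.4f}")
print(f"gradient slope: {gradient_slope(4.0, 40.0, 108.0):.3f} % B per min")
```

At this m/z, a 10 ppm tolerance corresponds to roughly plus or minus 0.007 m/z units, which gives a sense of how tightly precursor ions are matched during the database search.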

rIDUA_RLT purification
Culture medium was collected, clarified by centrifugation at 9000 g for 20 min at 4°C and filtered in two successive steps using 0.8-0.45 µm and 0.45-0.2 µm filters (Sartopore 2 Midicap). The fraction was applied to the strong cation exchange resin Eshmuno S (Millipore) equilibrated with 100 mM sodium acetate, 1.5 M urea, pH 5.0, followed by the same buffer without urea. The elution step was performed at 25 mS/cm with 20% (v/v) of 100 mM sodium acetate, 1 M NaCl, pH 5.0, followed by a step at 34 mS/cm with 30% (v/v) of the same buffer. The fractions containing the rIDUA_RLT protein were collected and applied to the hydrophobic interaction chromatography (HIC) resin Toyopearl Phenyl 650 M (Tosoh) equilibrated with 20 mM sodium phosphate, pH 7.0, with and without 3 M NaCl. The first elution step was performed with 30% (v/v) of 20 mM sodium phosphate containing 3 M NaCl, pH 7.0. A gradient from 35% to 100% of 20 mM sodium phosphate, 3 M NaCl, pH 7.0 was then applied in order to recover the rIDUA_RLT protein. The pooled fractions containing rIDUA_RLT were concentrated using a membrane with a 30 kDa cut-off (Sartocon PES cassette, Sartorius, Göttingen, Germany) and submitted to diafiltration against 10 volumes of buffer (92 mM NaH2PO4·2H2O, 8 mM Na2HPO4·12H2O, 150 mM NaCl, pH 6.0) to achieve a protein concentration of 0.58 mg/L. Tween 80 was finally added to a final concentration of 10 mg/L. The material collected from the formulation step was filtered at 0.2 µm using a Minisart PES filter, aliquoted and stored at −20°C. The quality of the purified material and the concentration of the pure enzyme were determined by SDS-PAGE, western blot, SE-HPLC and RP-HPLC.
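The step elutions above are given as volume percentages of a salt-containing buffer. As an illustration, the sketch below converts such a percentage into a nominal NaCl concentration by linear mixing. It assumes that the second buffer contains no NaCl and that volumes are additive; both are simplifications. The function name is hypothetical.

```python
def blend_concentration(percent_b: float, conc_a: float, conc_b: float) -> float:
    """Concentration of a solute after mixing buffers A and B at percent_b (v/v) of B."""
    if not 0.0 <= percent_b <= 100.0:
        raise ValueError("percent_b must be between 0 and 100")
    fraction_b = percent_b / 100.0
    return (1.0 - fraction_b) * conc_a + fraction_b * conc_b


# Cation exchange steps: the salt buffer carries 1 M NaCl (assumption: none in the other buffer).
cex_nacl_molar = {pct: blend_concentration(pct, 0.0, 1.0) for pct in (20, 30)}

# First HIC step: 30% (v/v) of the buffer containing 3 M NaCl.
hic_nacl_molar = blend_concentration(30, 0.0, 3.0)
```

Under these assumptions the two cation exchange steps correspond to about 0.2 M and 0.3 M NaCl, and the first HIC step to about 0.9 M NaCl.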

SE-HPLC analysis
The purity level of the IDUA protein as well as the presence of soluble aggregates were analysed using a SE-HPLC method performed on an Agilent HPLC system (Agilent 1200 Infinity Series) equipped with a refrigerated autosampler, a quaternary pump, a column heater and a DAD detector. A volume of 100 µL of sample was injected onto a Superdex 200 Increase 10/300 column (GE). The mobile phase was PBS, pH 7.2 (NaCl 8 g/L, KCl 0.2 g/L, Na2HPO4·12H2O 3.58 g/L, KH2PO4 0.24 g/L). The flow rate was 0.5 mL/min. The SE-HPLC profile was recorded at 280 nm.
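The mobile phase composition is given in g/L. The short sketch below converts it to millimolar concentrations, which makes it easier to compare with other PBS recipes. The molar masses are standard values and are not taken from the article.

```python
# Standard molar masses in g/mol (not taken from the article).
MOLAR_MASS = {
    "NaCl": 58.44,
    "KCl": 74.55,
    "Na2HPO4.12H2O": 358.14,
    "KH2PO4": 136.09,
}

# Mobile phase composition in g/L, as stated in the text.
PBS_G_PER_L = {
    "NaCl": 8.0,
    "KCl": 0.2,
    "Na2HPO4.12H2O": 3.58,
    "KH2PO4": 0.24,
}


def to_millimolar(grams_per_litre: float, molar_mass: float) -> float:
    """Convert a mass concentration (g/L) into a molar concentration (mM)."""
    return grams_per_litre / molar_mass * 1000.0


pbs_mM = {salt: to_millimolar(g, MOLAR_MASS[salt]) for salt, g in PBS_G_PER_L.items()}
```

The result is close to 137 mM NaCl, 2.7 mM KCl, 10 mM Na2HPO4 and 1.8 mM KH2PO4.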

Determination of the human α-L-iduronidase activity
Culture media from transformed turnip hairy root cultures as well as the purified rIDUA_RLT protein were used to determine the activity of the recombinant protein of interest with the fluorogenic substrate sodium 4-methylumbelliferyl-α-L-iduronide (4MU-I; Santa Cruz Biotechnology) as described in Ou et al. (2014). The 4MU-I substrate was diluted to a working solution of 400 µM with the reaction buffer (0.4 M sodium formate, pH 3.5). Twenty-five µL of sample were added to 25 µL of 400 µM 4MU-I substrate. The mixture was incubated at 37°C for 30 min and 200 µL of glycine carbonate buffer (pH 9.8) was added to quench the reaction. 4-Methylumbelliferone (4MU) (Sigma) was used to prepare the standard calibration curve. Fluorescence was measured using a plate reader (TECAN Infinite M1000, Männedorf, Switzerland) with excitation at 355 nm and emission at 460 nm. IDUA enzyme activity was expressed in units (µmol converted to product per minute) per sample volume (millilitres). The parameters KM, kcat and Vmax were calculated by linear fit on a Lineweaver-Burk plot (Ou et al., 2014).
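The unit definition and the Lineweaver-Burk fit described above can be illustrated with a minimal sketch. It relies on the linearised Michaelis-Menten relation 1/v = (KM/Vmax)(1/[S]) + 1/Vmax. The function names and the synthetic data are hypothetical and do not reproduce measurements from this study.

```python
def activity_units_per_ml(nmol_4mu: float, minutes: float = 30.0,
                          sample_volume_ul: float = 25.0) -> float:
    """IDUA activity in U/mL, where 1 U = 1 µmol of product per minute."""
    micromol = nmol_4mu / 1000.0
    sample_volume_ml = sample_volume_ul / 1000.0
    return micromol / minutes / sample_volume_ml


def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def lineweaver_burk(substrate_conc, velocities):
    """Return (KM, Vmax) from a linear fit of 1/v against 1/[S]."""
    slope, intercept = linear_fit([1.0 / s for s in substrate_conc],
                                  [1.0 / v for v in velocities])
    vmax = 1.0 / intercept   # the intercept equals 1/Vmax
    km = slope * vmax        # the slope equals KM/Vmax
    return km, vmax


def kcat(vmax: float, enzyme_conc: float) -> float:
    """Turnover number; vmax (per unit time) and enzyme_conc share concentration units."""
    return vmax / enzyme_conc


# Synthetic, noise-free Michaelis-Menten data (hypothetical values).
TRUE_KM, TRUE_VMAX = 50.0, 2.0
substrate = [10.0, 25.0, 50.0, 100.0, 200.0]
rates = [TRUE_VMAX * s / (TRUE_KM + s) for s in substrate]
km_fit, vmax_fit = lineweaver_burk(substrate, rates)
```

With real data, the double-reciprocal transformation gives a large weight to the lowest substrate concentrations, so that the fitted parameters are sensitive to noise in these points.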

Determination of the N-Terminal sequence
The rIDUA_RLT produced and purified from hairy root culture medium was separated by 8% SDS-PAGE and then transferred to a PVDF membrane using the ProSorb system from Applied Biosystems. The N-terminal sequence was then determined automatically by Edman degradation with the Procise P494 (Applied Biosystems, Foster City, California).

Analysis of the glycosylation of the purified rIDUA_RLT
Identification of the total N-glycan profile of rIDUA_RLT
rIDUA_RLT was digested with proteases prior to deglycosylation using PNGase A as previously described (Baïet et al., 2011, Mathieu-Rivet et al., 2013). Released N-glycans were then purified over C18 and PGC pre-packed columns (Bakker et al., 2001, Ho et al., 2012). Finally, the purified N-glycans were labelled with procainamide (proc) by reductive amination according to the manufacturer's instructions (LudgerTag Procainamide Glycan Labeling Kit; Ludger Ltd, Culham, Oxfordshire, UK). Procainamide-derivatized N-glycans were then analysed by nanoLC coupled to nanoESI-MS using the nano-LC1200 system coupled to a QTOF 6520 mass spectrometer equipped with a nanospray source and a LC-Chip Cube interface (Agilent Technologies, Santa Clara, California).
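To illustrate how a procainamide-labelled N-glycan is assigned from its m/z value, the sketch below computes the theoretical monoisotopic mass of the M3XFGN2 composition reported in the Summary (three hexoses, two N-acetylhexosamines, one deoxyhexose and one pentose). The residue masses are standard monoisotopic values. The label shift assumes reductive amination with procainamide (C13H21N3O), that is, addition of the label and H2 with loss of H2O. The function names are hypothetical.

```python
RESIDUE_MASS = {            # monoisotopic residue masses in Da
    "Hex": 162.052824,      # e.g. mannose
    "HexNAc": 203.079373,   # e.g. N-acetylglucosamine
    "dHex": 146.057909,     # e.g. fucose
    "Pent": 132.042259,     # e.g. xylose
}
WATER = 18.010565
PROTON = 1.007276
# Assumption: label (235.168462) + H2 (2.015650) - H2O (18.010565).
PROCAINAMIDE_SHIFT = 219.173547


def glycan_mass(composition: dict, labelled: bool = True) -> float:
    """Neutral monoisotopic mass of a glycan, optionally procainamide-labelled."""
    mass = WATER + sum(RESIDUE_MASS[res] * n for res, n in composition.items())
    return mass + PROCAINAMIDE_SHIFT if labelled else mass


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (neutral_mass + charge * PROTON) / charge


M3XFGN2 = {"Hex": 3, "HexNAc": 2, "dHex": 1, "Pent": 1}
labelled_mass = glycan_mass(M3XFGN2)
doubly_charged = mz(labelled_mass, 2)
```

Under these assumptions, the labelled M3XFGN2 would be expected as a doubly protonated ion near m/z 704.81.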

Sample preparation
Purified rIDUA_RLT was separated by NuPAGE Bis-Tris gel electrophoresis. The band corresponding to rIDUA_RLT was excised from the gel and cut into pieces. Gel pieces were washed several times with a mixture of 0.1 M NH4HCO3 pH 8 and 100% CH3CN (v:v). Samples were dried down in a SpeedVac centrifuge (Thermo Fisher). After a reduction step with 0.1 M dithiothreitol (DTT) for 45 min at 56°C and alkylation with 55 mM iodoacetamide (IAA) for 30 min at room temperature in the dark, proteomic-grade trypsin was added (1 µg per protein band; Promega) and the samples were placed at 4°C for 45 min prior to an overnight incubation at 37°C. Where needed, rIDUA_RLT was then subjected to an additional chymotrypsin digestion. After protease digestion, gel pieces were incubated successively in 50% CH3CN, 5% formic acid, 0.1 M NH4HCO3, 100% CH3CN and finally 5% formic acid to extract the resulting peptide and glycopeptide mixture.
Protein identification and site-specific distribution of N- and O-glycans
MS analyses were performed using the nano-LC1200 system coupled to a QTOF 6520 mass spectrometer equipped with a nanospray source and a LC-Chip Cube interface (Agilent Technologies). Briefly, the peptide and glycopeptide mixture was enriched and desalted on a 360 nL RP-C18 trap column and separated on a Polaris (3-µm particle size) C18 column (150 mm long × 75 µm inner diameter; Agilent Technologies). A 33-min linear gradient (3%-75% acetonitrile in 0.1% formic acid) at a flow rate of 320 nL/min was used, and separated peptides were analysed with the QTOF mass spectrometer. Full auto MS scans from 290 to 1700 m/z and auto MS2 scans from 59 to 1700 m/z were recorded. In every cycle, a maximum of 5 precursors sorted by charge state (2+ preferred and singly charged ions excluded) were isolated and fragmented in the collision cell. Collision cell energy was automatically adjusted depending on the m/z. Scan speed was raised based on precursor abundance (target 25 000 counts/spectrum) and precursors were sorted only by abundance. Active exclusion of these precursors was enabled after three spectra within 1.5 min, and the threshold for precursor selection was set to 1000 counts. Glycopeptides were selected in the LC profile by selecting MS2 spectra exhibiting N-glycan diagnostic fragment ions at m/z 204 and 163. MassHunter Qualitative Analysis version B.07 (Agilent Technologies) was used to analyse the spectra.
In order to evaluate the site occupancy, the mixture of IDUA peptides and glycopeptides was deglycosylated using peptide N-glycosidase A (PNGase A; Roche Diagnostics) according to Vanier et al. (2015).
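The selection of glycopeptide spectra through the diagnostic fragment ions at m/z 204 and 163 can be sketched as a simple filter on MS2 peak lists. The exact oxonium ion masses (HexNAc at m/z 204.0867, Hex at m/z 163.0601) are standard values. The 0.02 Da matching window and the function names are assumptions made for this illustration and are not settings reported in this study.

```python
# Oxonium ions used as N-glycan diagnostic fragments (standard monoisotopic m/z).
OXONIUM_IONS = {"HexNAc": 204.0867, "Hex": 163.0601}


def has_ion(peaks, target_mz: float, tol_da: float = 0.02) -> bool:
    """True if any (m/z, intensity) peak with non-zero intensity matches target_mz."""
    return any(abs(peak_mz - target_mz) <= tol_da and intensity > 0.0
               for peak_mz, intensity in peaks)


def is_glycopeptide_spectrum(peaks, tol_da: float = 0.02) -> bool:
    """True if the MS2 spectrum contains both diagnostic oxonium ions."""
    return all(has_ion(peaks, ion_mz, tol_da) for ion_mz in OXONIUM_IONS.values())


# Hypothetical peak lists given as (m/z, intensity) pairs.
glyco_spectrum = [(163.0604, 1500.0), (204.0871, 5200.0), (366.1400, 800.0)]
peptide_spectrum = [(175.1190, 900.0), (204.1343, 400.0), (712.3500, 300.0)]
selected = [s for s in (glyco_spectrum, peptide_spectrum) if is_glycopeptide_spectrum(s)]
```

In this example only the first peak list is retained, because the second one contains a peptide fragment close to, but outside, the window around m/z 204.0867.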

Monosaccharide composition using gas chromatography
The monosaccharide composition was determined by gas chromatography using 100 nmol of inositol as internal standard. rIDUA_RLT was hydrolysed for 2 h in 2 M TFA (trifluoroacetic acid) at 110°C. Freeze-dried samples were methanolysed with dry 1 M methanolic-HCl (Supelco) for 16 h at 80°C, then dried under a stream of nitrogen and washed twice with methanol. The re-N-acetylation step was performed in methanol/pyridine/acetic anhydride 4:1:1 (Sigma) for 1 h at 110°C. The resulting methyl sugars were treated with HMDS (hexamethyldisilazane)/TMCS (trimethylchlorosilane)/pyridine solution (3:1:9, Supelco) for 20 min at 110°C. Trimethylsilylated monosaccharides were dried and dissolved in 1 mL of cyclohexane. One µL of sample was separated by gas chromatography on a 0.25 mm × 25 m silica capillary column of CP-Sil 5 CB with helium as carrier gas and detected with a flame ionization detector. Sugar standards (L-arabinose, L-fucose, D-galactose, D-mannose, D-galacturonic acid, L-rhamnose, D-glucose, D-GlcNAc, and D-xylose) were processed in parallel to the samples. Sample peaks were assigned by comparing their retention times with those of the standards.
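Quantification against an internal standard, as used above with 100 nmol of inositol, can be sketched as follows. The response factors are assumed to be derived from the standards processed in parallel, and the peak areas of a given sugar are assumed to be summed over its chromatographic peaks beforehand. All numerical values and function names are hypothetical.

```python
def response_factor(area_std: float, nmol_std: float,
                    area_is: float, nmol_is: float) -> float:
    """Detector response of a sugar relative to the internal standard (standard run)."""
    return (area_std / nmol_std) / (area_is / nmol_is)


def quantify(sample_areas: dict, area_is: float, factors: dict,
             nmol_is: float = 100.0) -> dict:
    """Amount (nmol) of each monosaccharide in the sample."""
    return {sugar: area / area_is * nmol_is / factors[sugar]
            for sugar, area in sample_areas.items()}


def molar_percent(nmol: dict) -> dict:
    """Molar percentage of each monosaccharide."""
    total = sum(nmol.values())
    return {sugar: 100.0 * amount / total for sugar, amount in nmol.items()}


# Hypothetical peak areas and response factors.
areas = {"Man": 300.0, "GlcNAc": 200.0, "Xyl": 100.0, "Fuc": 100.0}
factors = {"Man": 1.0, "GlcNAc": 1.0, "Xyl": 1.0, "Fuc": 1.0}
amounts = quantify(areas, area_is=1000.0, factors=factors)
composition = molar_percent(amounts)
```

With equal response factors, the hypothetical areas above translate directly into a 3:2:1:1 molar ratio.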

Supporting information
Additional supporting information may be found online in the Supporting Information section at the end of the article.
Figure S1 ESI mass spectrum of N-glycans isolated from endogenous proteins of old roots collected at day 24.
Data S1 Methodological details for the analysis of the glycosylation of endogenous protein as well as the analysis of the purity of the protein of interest.