Design of a stable human acid‐β‐glucosidase: towards improved Gaucher disease therapy and mutation classification

Acid-β-glucosidase (GCase, EC 3.2.1.45), the lysosomal enzyme which hydrolyzes the simple glycosphingolipid, glucosylceramide (GlcCer), is encoded by the GBA1 gene. Biallelic mutations in GBA1 cause the human inherited metabolic disorder, Gaucher disease (GD), in which GlcCer accumulates, while heterozygous GBA1 mutations are the highest genetic risk factor for Parkinson's disease (PD). Recombinant GCase (e.g., Cerezyme®) is produced for use in enzyme replacement therapy for GD and is largely successful in relieving disease symptoms, except for the neurological symptoms observed in a subset of patients. As a first step toward developing an alternative to the recombinant human enzymes used to treat GD, we applied the PROSS stability-design algorithm to generate GCase variants with enhanced stability. One of the designs, containing 55 mutations compared to wild-type human GCase, exhibits improved secretion and thermal stability. Furthermore, the design has higher enzymatic activity than the clinically used human enzyme when incorporated into an AAV vector, resulting in a larger decrease in the accumulation of lipid substrates in cultured cells. Based on stability-design calculations, we also developed a machine learning-based approach to distinguish benign from deleterious (i.e., disease-causing) GBA1 mutations. This approach gave remarkably accurate predictions of the enzymatic activity of single-nucleotide polymorphisms in the GBA1 gene that are not currently associated with GD or PD. This latter approach could be applied to other diseases to determine risk factors in patients carrying rare mutations.

Two approved treatments for GD are currently available, namely enzyme replacement therapy (ERT) and substrate reduction therapy (SRT). Patients treated by ERT receive periodic intravenous infusions of a recombinantly expressed GCase, of which Cerezyme® is the most widely used, while SRT uses inhibitors of GlcCer synthesis, thereby reducing its accumulation. However, neither ERT nor SRT can currently be used to treat the neuronopathic forms of GD (nGD) [6,7]. As with other neurological diseases, gene therapy offers an attractive option for the treatment of nGD. Gene delivery mediated by adeno-associated viruses (AAVs) [8] has the advantages of low immunogenicity, high efficiency, and the possibility of targeting specific tissues or cell types, including neurons [9,10]. AAV gene therapy is safe and efficient in mouse models of lysosomal storage disorders (LSDs), including GD [11], and has been used in preclinical trials on human patients with LSDs (see Ref. [10] and references therein).
GCase comprises 497 amino acids and contains two disulfide bridges and five glycosylation sites, four of which are usually occupied [12]. Despite the success of ERT using recombinant GCase for the treatment of type 1 GD, no attempts have been made to optimize treatment strategies using, for instance, more stable forms of GCase or of other enzymes used in ERT in other LSDs. If such stabilized enzymes were available, they might remain active for longer times, reducing infusion frequency and enhancing therapeutic outcomes and economic benefit.
Due to the marginal stability of many proteins [13], protein engineering is frequently used to improve protein stability, although not, so far, for the enzymes used in ERT or in gene therapy for rare metabolic diseases. One approach to stabilize proteins is the use of computer-based algorithms, such as PROSS, which combine atomistic Rosetta design calculations and phylogenetic sequence analysis to design stable variants [14,15]. PROSS has been successfully applied to many proteins, including those that have several disulfide bonds and glycosylation sites [14,16-18]. PROSS designs often exhibit higher recombinant expression levels and increased thermal stability while maintaining activity.
Protein destabilization or loss of expression caused by missense mutations can lead to a range of human diseases [19], and the ability to predict the pathogenicity of missense mutations is highly desirable. A number of in silico tools for predicting functional and structural consequences of missense mutations are available, with most using sequence and conservation-based methods, protein sequence and structure, or supervised learning methods [20]. Nevertheless, such predictions often disagree, raising questions about their reliability. By way of example, results obtained with seven available in silico algorithms using a dataset of 97 nonsynonymous single-nucleotide polymorphisms (nsSNPs) in GBA1 [21] suggested that 22 should result in GD. However, the limitations of this study can be appreciated since only six of the algorithms recognized L444P, and only three identified N370S, as disease-causing mutations, even though they are the two most prominent mutations associated with GD. A more useful approach might be to train an algorithm based on mutations with known pathologies, that is, benign and disease-causing, such as was done successfully for the MLH1 variant in Lynch syndrome [22]. For such an approach to be effective, a sufficient number of known mutations, both benign and disease-causing, should be available.
In the present study, we use the PROSS algorithm to generate a more stable form of GCase. Notably, one of the GCase designs is secreted at a higher level, and upon transduction into neuroblastoma cells using an AAV vector, results in more effective clearance of GlcCer compared with WT GCase. Based on these results, we hypothesized that PROSS could enrich data from clinical studies to train an algorithm to predict the clinical severity of various mutations. We verified these predictions experimentally and by analysis of published clinical data. We conclude that the PROSS-designed GCase may help improve the efficacy of ERT or of gene therapy (at least in the brain, which is an immune-privileged site). Furthermore, predictions of the clinical outcome of additional GBA1 mutations could be used for diagnostic purposes, with particular relevance to novel GBA1 mutations in Parkinson's disease (PD), in which GBA1 mutations are the highest genetic risk factor [23,24].

A stabilized GCase design for recombinant expression
We used PROSS to design GCase variants based on its crystal structure (PDB: 3gxi [25]), while not permitting design calculations within the active site pocket. Designs dGCase1, dGCase2, and dGCase3 containing 35, 45, and 55 amino acid substitutions, respectively, were expressed in mammalian HEK293T cells, along with WT human GCase (plasmids are shown in Fig. 1). The proteins were purified from growth media by one-step affinity chromatography using the Twin-Strep tag. Designs were screened for enzymatic activity and secretion (Fig. 2A). In contrast to WT GCase, all three designs were secreted, and their enzymatic activity increased with the number of mutations. Additionally, dGCase3 exhibited activity when expressed and purified in Escherichia coli. By contrast, WT GCase did not display any enzymatic activity even though it could be expressed in E. coli (Fig. 2B). We concluded that increasing GCase stability led to correct protein folding independent of glycosylation (which does not occur in E. coli).
Design dGCase3 carries 55 amino acid mutations compared with WT GCase. The mutations are distributed across the entire protein, except for the active site, which was excluded from the design calculations (Fig. 2C,E). To approximate the impact of the mutations on protein structure, ALPHAFOLD2 [26,27] was used to predict the structure of dGCase3. Alignment with the crystal structures of human GCase (PDB: 3gxi and 1ogs) [25,28] revealed very close agreement (< 0.5 Å root mean square deviation; Fig. 3). This modeling suggests that no major structural rearrangements occur in dGCase3, in agreement with the fact that it retains catalytic activity.
Purified dGCase3 was assessed for in vitro enzyme activity and thermal stability and compared to purified recombinant GCase (r-GCase, purchased from Biotest) and to Cerezyme®. Cerezyme® and dGCase3 displayed very similar kinetic properties. By contrast, r-GCase exhibited an ~5-fold lower kcat/KM, mainly due to a lower reaction rate (kcat; Table 1). The melting temperature (Tm) of dGCase3 was 67.9 ± 0.7°C, which is ~17°C and ~12°C higher than that of Cerezyme® and r-GCase, respectively (Table 1, Fig. 2D). dGCase3 exhibits almost the same KM value as Cerezyme®, confirming that the active site is intact despite the 55 designed mutations.
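The kinetic parameters compared above are obtained by fitting initial rates to the Michaelis-Menten equation, v = Vmax[S]/(KM + [S]). The source does not state which fitting procedure was used; the sketch below assumes a Hanes-Woolf linearization ([S]/v against [S]) purely for illustration. The substrate range follows the 0.2-4 mM p-NP-Glc range given in the methods, whereas the rates, Vmax, KM, and enzyme concentration are synthetic values, not measurements from this study.

```python
def fit_michaelis_menten(substrate_mM, rates):
    """Estimate (Vmax, KM) by Hanes-Woolf linearization.

    S/v = S/Vmax + KM/Vmax, so a straight-line fit of S/v against S
    gives slope = 1/Vmax and intercept = KM/Vmax.
    """
    xs = list(substrate_mM)
    ys = [s / v for s, v in zip(substrate_mM, rates)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


def catalytic_efficiency(vmax, km, enzyme_conc):
    """Return kcat/KM, with kcat = Vmax / total enzyme concentration."""
    kcat = vmax / enzyme_conc
    return kcat / km


# Synthetic, noise-free rates generated from assumed Vmax = 2.0 and KM = 1.5 mM.
substrate = [0.2, 0.5, 1.0, 2.0, 4.0]
rates = [2.0 * s / (1.5 + s) for s in substrate]
vmax, km = fit_michaelis_menten(substrate, rates)
efficiency = catalytic_efficiency(vmax, km, enzyme_conc=0.01)  # hypothetical enzyme concentration
```

With real, noisy data a direct nonlinear least-squares fit is usually preferred, since linearizations weight the measurement errors unevenly.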
One of the designed mutations, N370D, which is present in all of the PROSS designs, impacts the same position as one of the most common GD-causing mutations, N370S [29]. Whereas the clinical mutation to Ser results in low enzymatic activity, the Asp mutation maintains both expression and activity levels comparable to those of WT GCase (Fig. 4). This observation demonstrates that even positions associated with disease-causing mutations can be optimized using a judicious choice of mutation.
Expression of dGCase3 using AAV exhibits high GCase activity
Two therapeutic regimes are currently attracting significant interest for neurological forms of LSDs, namely the use of small compounds for SRT that cross the blood-brain barrier [30], and gene therapy using vectors injected directly into the brain [10]. Since the immune response is attenuated in the central nervous system [31], designed proteins could in principle be used in the brain with minimal risk of an immunological response. Therefore, human WT GCase and dGCase3 were cloned into an AAVrh10 (adeno-associated virus, serotype rh10) vector and used to transduce GBA−/− neuroblastoma cells in culture.
Nondifferentiated SH-SY5Y GBA−/− cells (Fig. 5) displayed increased GCase activity upon transduction with both vectors in a dose-dependent manner. The activity of cells transduced with AAV-dGCase3 was two- to threefold higher than that of cells transduced with AAV-WT GCase (Fig. 5A). The AAV-dGCase3-transduced cells [5 × 10⁵ vg per cell (viral genomes per cell)] exhibited the same levels of GCase activity as SH-SY5Y GBA+/+ cells (Fig. 5A). Likewise, when SH-SY5Y GBA−/− cells were differentiated (to allow them to survive longer in culture) and transduced with 5 × 10⁵ vg per cell of the AAV vectors, the AAV-dGCase3-transduced cells exhibited significantly higher activity 12 and 15 days post-transduction than cells transduced with AAV-WT GCase (Fig. 5B). SH-SY5Y GBA+/+ cells contained ~400 pmol·mg⁻¹ protein of GlcCer, with GlcCer levels elevated ~20-fold (~7000 pmol·mg⁻¹ protein) in SH-SY5Y GBA−/− cells. Fifteen days post-transduction, AAV-WT GCase-transduced GBA−/− cells showed a reduction of GlcCer levels to ~950 pmol·mg⁻¹. An even more effective reduction was obtained upon transduction with AAV-dGCase3 (~450 pmol·mg⁻¹), decreasing GlcCer levels close to those of WT cells (Fig. 5C; Table 2 gives levels of individual GlcCer species with different N-acyl chain lengths). Similar results were obtained for the extent of reduction of GlcSph, with GlcSph levels ~125 and 20 times lower following AAV-dGCase3 and AAV-WT GCase transduction, respectively, compared with GBA−/− cells (Table 2). Together, our results suggest that dGCase3 may be a suitable candidate for nGD gene therapy since it is more active in cell culture and clears more GlcCer than the WT enzyme when using the same dose of AAV.
A machine learning classifier predicts the severity of GBA1 mutations
We hypothesized that PROSS, through the stabilizing mutations that it predicts, could augment the limited clinical data on benign mutations and lead to improved discrimination of disease-causing mutations. As a striking example of the paucity of clinical data on benign mutations, only three missense mutations in GBA1 have been classified to date as benign or likely benign (https://www.ncbi.nlm.nih.gov/variation/view/). The 226 GD-causing missense mutations (Table S1) and the 55 PROSS-designed mutations in dGCase3 (Table S2) were combined to train an algorithm to predict the clinical effect of unknown GBA1 SNPs (Table S3). The analysis is based on the premise that PROSS mutations are individually neutral or stabilizing and do not impact enzyme activity. Three parameters were calculated for each mutation: (a) the change in conservation score between the WT amino acid and the mutated amino acid (ΔPSSM; based on the position-specific scoring matrix computed by PROSS); (b) the change in protein energy due to the mutation (ΔΔG; also computed by PROSS); and (c) the exposure of the amino acid position to solvent (calculated by the Stride webserver).
The best separation between mutations introduced by PROSS and the disease-causing mutations was obtained using ΔPSSM, followed by ΔΔG (Fig. 6A). Solvent exposure did not show a significant separation between the GD-causing and PROSS mutations. Next, we used ΔPSSM and ΔΔG to train a linear support-vector machine to predict whether a particular GBA1 mutation is likely benign or deleterious (Fig. 6B). Out of 281 mutations (226 GD-causing and 55 PROSS), only five mutations were misclassified (A476D, F216Y, H255Q, H451R, and S345F), with three of them very close to the separation line (more details about individual mutations are provided in Table S1).
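A linear support-vector machine on two features amounts to finding a line w·x + b = 0 that separates the two classes with the widest margin. The sketch below is a minimal illustration, assuming hinge-loss sub-gradient descent as the training procedure and a handful of made-up, already normalized (ΔPSSM, ΔΔG) points; it is not the authors' implementation, and the function names, hyperparameters, and data are hypothetical.

```python
def train_linear_svm(points, labels, lam=0.01, epochs=200, lr=0.1):
    """Fit w, b for a 2-feature linear SVM by hinge-loss sub-gradient descent.

    labels are +1 (treated as benign, e.g. PROSS-designed) or -1 (deleterious).
    """
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):
            margin = y * (w[0] * x1 + w[1] * x2 + b)
            # L2 regularization shrinks the weights on every step.
            w[0] -= lr * lam * w[0]
            w[1] -= lr * lam * w[1]
            # Points inside the margin (or misclassified) pull the line toward them.
            if margin < 1:
                w[0] += lr * y * x1
                w[1] += lr * y * x2
                b += lr * y
    return w, b


def classify(point, w, b):
    """Return +1 (benign side) or -1 (deleterious side) of the separation line."""
    return 1 if w[0] * point[0] + w[1] * point[1] + b >= 0 else -1


# Made-up normalized (dPSSM, ddG) values: low values on the benign side,
# high values (loss of conservation, destabilization) on the deleterious side.
points = [(-1.0, -0.5), (-1.5, -1.0), (-0.8, -1.2), (-1.2, -0.2),
          (1.0, 0.8), (1.5, 0.3), (0.7, 1.4), (1.2, 1.0)]
labels = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_linear_svm(points, labels)
predictions = [classify(p, w, b) for p in points]
```

In practice an established library implementation of a linear SVM would normally be used; the hand-written loop only makes the geometry of the separation line explicit.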
A set of SNPs in GBA1 has been documented (Table S3), although none of them have been detected in GD patients to date. Using the trained PRAMP (PRoss-based Algorithm for Mutation Prediction) classifier, we analyzed this set and separated the SNPs into putatively deleterious and benign mutations (Fig. 6C). In addition, each SNP was assigned a score (PRAMP score), determined by its distance from the separation line, with benign and harmful SNPs assigned a positive and negative score, respectively. Twenty-eight clones of GCase, bearing individual SNPs spanning the PRAMP score range, were expressed in HEK293T GBA−/− cells, and their in vitro activity was determined (Fig. 6D, Table S3). A clear correlation between the PRAMP score and GCase activity was obtained, as seen by the Spearman coefficient of 0.8. Thus, even though GD can be caused by factors other than defective enzymatic activity (such as defective lysosomal trafficking [32]), the PRAMP score developed herein gives a remarkably good correlation with enzymatic activity.
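The two quantities used in this paragraph can be written down compactly: a score defined as the signed distance of a point from a separation line, and a Spearman rank correlation between scores and measured activities. The sketch below is illustrative only; the function names are hypothetical, the line parameters w and b are assumed to come from an already trained linear classifier, and no values from the study are reproduced.

```python
import math


def signed_distance(point, w, b):
    """Signed distance of a 2-feature point from the line w.x + b = 0.

    Positive values fall on the side the classifier treats as benign,
    negative values on the deleterious side.
    """
    norm = math.hypot(w[0], w[1])
    return (w[0] * point[0] + w[1] * point[1] + b) / norm


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman coefficient: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (c - my) for a, c in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((c - my) ** 2 for c in ry))
    return cov / (sx * sy)
```

Because Spearman's coefficient depends only on ranks, it captures any monotonic relationship between score and activity, not just a linear one.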
Many other prediction tools have been developed to attempt to distinguish between disease-causing and benign mutations [20]. We compared the results of our algorithm to REVEL (https://sites.google.com/site/revelgenomics/), a missense mutation classifier that is based on an ensemble of 13 individual in silico tools [33]. Precomputed REVEL scores, in a range of 0 to 1 with 1 being the most severe, for the same GBA1 SNP missense mutations also correlated with enzymatic activity (Fig. 6E), but with a somewhat lower Spearman coefficient (−0.7) than obtained for PRAMP. Together, these results demonstrate that the PRAMP algorithm can accurately classify missense mutations. The observation that our stability-based analysis is at least as powerful as the much more sophisticated scheme employed by REVEL highlights the importance of stability and expression in understanding the effect of mutations in GBA1.
Data are means ± SD from at least three independent experiments. Statistical significance was determined using the Student's t-test. *P < 0.05, **P < 0.01, ***P < 0.005. Further data relating to lipid levels are given in Table 2.


Prediction of clinical phenotypes using the PRAMP score
Nearly 300 GD-causing mutations have been documented in GBA1, including ~230 missense mutations (Table S1 and [34]). Limited genotype-phenotype correlation is available, with a few exceptions. Thus, homozygosity for N370S always results in type 1 GD [35] and homozygosity for L444P invariably leads to nGD [36], although there is significant clinical variation even among patients homozygous for these well-characterized mutations. Predicting disease course is particularly problematic in compound heterozygotes. We attempted to predict the clinical severity of known GD mutations using the PRAMP score. The scores of the GD-causing mutations (Table S1) have a median value of −2.49, with some as low as −7.5. As shown previously, mutations with lower scores have lower enzyme activity and likely correspond to mutations that cause a more severe form of the disease. Indeed, a clear trend of a decreased PRAMP score correlating with increased disease severity is observed (Fig. 7A, Table 3). In particular, homozygous mutations [34] causing type 1 GD have a significantly higher PRAMP score than those causing GD type 2 (P < 0.01; Fig. 7A). ΔPSSM was the most important parameter for separating disease-causing from benign mutations (Fig. 6A). Even so, the mild N370S and severe L444P mutations have the same ΔPSSM (7), and their distinct severity is reflected in their ΔΔG values (ΔΔG(N370S) = −11, ΔΔG(L444P) = 24). Our atomistic calculations performed with PROSS are consistent with studies showing that GCase with an N370S mutation gives a stable protein with reduced enzyme activity [37], whereas the L444P mutation leads to protein structure destabilization which results in ER-assisted degradation [38,39]. We next tested whether a similar approach could be used to predict the clinical severity exhibited by patients with different mutations in each allele, that is, compound heterozygotes.
Normally, N370S in one allele, even if the second mutation is a more severe mutation (e.g., L444P), results in a disease closer to that observed with homozygous N370S than with homozygous L444P, suggesting that N370S can protect against the more severe (neurological) disease associated with L444P [36]. This being the case, a geometric rather than an arithmetic average was used to calculate the score of compound heterozygotes, since it is weighted in favor of the milder allele (lower score). PRAMP scores for compound heterozygous and the few homozygous mutations for which clinical data are available were taken from [40,41]. As for homozygous mutations, the PRAMP score for compound heterozygous mutations also decreases with disease severity, yielding significant differences between mutations related to GD type 1 and GD type 2 (P < 0.005) and GD types 1 and 3 (P < 0.05; Fig. 7C). The same mutation sets were also assessed using REVEL (Fig. 7B,D). Although a trend of higher REVEL scores, that is, classified as more harmful, was observed for mutations related to more severe disease, the only significantly distinct REVEL scores were obtained for GD type 2 and GD type 1 for compound heterozygous mutations (P < 0.05). Moreover, a correlation between the age of disease onset and the mutation score was observed (Table 4), yielding Spearman coefficients of 0.94 and −0.77 for PRAMP and REVEL scores, respectively. Taken together, our results indicate that the clinical outcome of GBA1 mutations can be predicted to a large extent by the impact of the mutation on protein stability and expression. Comparison of the outcome of the PRAMP algorithm with the REVEL classifier showed similar trends, but better performance of our prediction algorithm, as documented by higher correlation coefficients and significantly distinct PRAMP scores between the individual GD types.
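The effect of choosing a geometric over an arithmetic average can be seen in a small worked example. The text does not specify how the geometric average is taken when both allele scores are negative; the sketch below assumes it is applied to the magnitudes of the two scores and the negative sign is then restored. The function name and the two example scores are hypothetical and are not PRAMP scores from this study.

```python
import math


def compound_heterozygote_score(score_a, score_b):
    """Geometric average of two negative allele scores (sign restored).

    Assumption: both alleles carry deleterious (negative) scores, and the
    geometric mean is taken over their magnitudes.
    """
    if score_a >= 0 or score_b >= 0:
        raise ValueError("this sketch only handles two negative allele scores")
    return -math.sqrt(score_a * score_b)


# Hypothetical allele scores: one mild (-1.0) and one severe (-4.0).
mild, severe = -1.0, -4.0
geometric = compound_heterozygote_score(mild, severe)   # -2.0
arithmetic = (mild + severe) / 2.0                      # -2.5
```

Under this assumption the geometric average (−2.0) lies closer to the milder allele than the arithmetic average (−2.5), which mirrors the clinical observation that a mild allele tempers the effect of a severe one.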

Discussion
Our study makes two important contributions based on stability-design calculations. First, a GCase design comprising 55 mutations exhibits several potential advantages relative to the WT human enzyme for possible use in gene therapy, since the design exhibits higher in vitro GCase activity and better performance upon AAV transduction in terms of enzymatic activity and GlcCer/GlcSph clearance. Second, by assuming that all PROSS-designed mutations are benign, we augmented clinical data to generate a classifier of the effect of mutations. The PRAMP classifier correctly predicted the functional characteristics of SNPs that have not been assigned disease status and demonstrated promise in predicting disease severity. Taken together, these results suggest that this simple predictor may provide a novel diagnostic tool, although clearly other factors, such as genetic and environmental factors, are also likely to play a role in disease severity.
In terms of gene therapy, dGCase variants could be expressed via viral vectors, which is an attractive approach for overcoming the neurological symptoms in nGD patients [10]. While the higher stability is likely to be of great advantage, the finding that dGCase3 has similar kinetic parameters to Cerezyme®, along with its ability to clear GlcCer better than WT GCase, suggests that using the designed GCase may indeed be of great value in gene therapy approaches. Vector dose-dependent immune responses and toxicity have been observed in several gene therapy trials. The optimization of transgene product levels and activity may permit reduction of the vector dose required to achieve therapeutic efficacy [42]. Our results also suggest that stability design can be successfully applied to other proteins that cause LSDs. Although LSDs are individually rare, taken together they are found in ~1 : 5000 individuals [43], and most LSDs are caused by missense mutations, as in the case of GD. While our stability-based PRAMP classifier is attractive, other factors need to be considered when predicting disease severity, such as genetic background [44] and environmental factors [45]. For instance, patients homozygous for L444P present with a quite different clinical course depending on their genetic background [46], even though all have nGD. This suggests that the PRAMP score could be used to predict the type of GD, that is, type 1, 2, or 3, but may need to be combined with other factors in order to distinguish subtle differences in the clinical course of each type of disease in individual patients. One area that has not yet received attention is patients who are compound heterozygotes.
The PRAMP score predicted for compound heterozygotes in the current study gave a reasonable fit with the relatively limited clinical data available, supporting the possibility that the PRAMP score could be used to predict the clinical course of GD in compound heterozygotes. Such predictions could guide treatment regimes.
The ability to predict disease severity based on a classifier that distinguishes between stabilizing and disease-causing mutations not only paves the way to redefining genotype-phenotype correlations but also has exciting implications for understanding protein structure and function. This is exemplified in the mutations found at N370, with N370S the most common mutation leading to type 1 GD, whereas N370D is a stabilizing mutation identified by PROSS. Remarkably, 30 of the mutations in dGCase3 impact positions in which disease-causing mutations or predicted disease-causing SNPs have been identified. Thus,
Fig. 6. SNP classification predicts functional outcome. Changes introduced into the GCase sequence by PROSS, together with GD-causing mutations, were used to construct a PRAMP algorithm to identify harmful and nonharmful mutations. (A) Separation of PROSS (blue) and GD (red) mutations according to one of the calculated parameters. Histograms of ΔPSSM, ΔΔG, and solvent exposure fraction were calculated for each mutation. (B) Classification of the mutation training sets (PROSS, blue, and GD-causing, red) by PRAMP according to their ΔPSSM and ΔΔG scores (ΔPSSM and ΔΔG scores were normalized); the separation line is depicted in black. The y-axis is in Rosetta energy units. (C) Prediction of the algorithm using the SNP data (black dots; see Table S3). The separation line is depicted in black; the fields with potentially harmful and benign mutations are colored in orange and blue, respectively. The y-axis is in Rosetta energy units. (D, E) Enzymatic activity of WT GCase, together with that of the individual GCase point missense mutants expressed in HEK293T GBA−/− cells. GCase activity was assayed on cell homogenates. Mutations with positive (benign) and negative (harmful) PRAMP scores are shown in blue and orange, respectively. The dashed lines indicate the level of enzyme activity of WT GCase. GCase activity was correlated with (D) PRAMP scores or (E) REVEL scores.
Values of the PRAMP score decrease with mutation severity, whereas values of the REVEL score increase with disease severity. Data are means ± SD, n = 3. Further details are given in Table S3.
Tables S4 and S5. Statistical significance was calculated by ANOVA and post hoc Tukey's test, *P < 0.05, **P < 0.01, ***P < 0.005.

Protein expression in E. coli
WT GCase and dGCase3 were cloned into a pET28-bdSUMO [48] vector (Fig. 1A) containing an N-terminal His-tag for purification and expressed in E. coli. Cells were grown at 30°C until OD600 reached 0.6-0.8, followed by induction of protein expression (by 200 µM IPTG) at 15°C for ~18 h. Proteins were isolated from E. coli lysates using Ni²⁺-chelating chromatography followed by release from the column using SUMO protease [49]. Protein purity was assessed on 10% Tris-glycine SDS/PAGE gels stained with Coomassie blue. GCase variants were identified by western blotting using anti-His antibodies.

Table 4. Correlation of PRAMP score with the age of onset of GD. Genotypes and the average age of onset (years) (from Ref. [40]) are shown along with the PRAMP and REVEL scores. Spearman coefficients are 0.94 and −0.77 for PRAMP and REVEL scores, respectively. This mutation is usually classified as mild, but it has been found in several patients with type 3 GD [36,59].

Protein expression in HEK293T cells
Genes coding for WT GCase and dGCase1, dGCase2, and dGCase3 were cloned into a pcDNA 3.1 vector, together with an N-terminal Twin-Strep isolation tag (Fig. 1B). Proteins were targeted extracellularly using the N-terminal R-PTP-S secretion signal (MGILPSPGMPALLSLVSLLSVLLMGCVA) [50]. For protein expression, HEK293T cells were grown in 10 cm culture dishes and transiently transfected using the polyethyleneimine (PEI) reagent with 10 µg of plasmid per dish (DNA : PEI ratio was 1 : 1.5 w/w). Growth media were collected 36-48 h post-transfection.

Purification of WT GCase and dGCase from the extracellular medium
GCase was isolated from the medium using a Strep-Tactin®XT 4Flow high-capacity resin. Medium was transferred to 250 mL tubes and centrifuged at 10 000 g (4°C, 20 min) to remove detached and dead cells. The medium was then transferred into 50 mL Falcon tubes and 200 µL of affinity resin suspension in Tris buffer (150 mM NaCl/50 mM Tris, pH 7.4) was added. Tubes were placed on a rotator at 4°C overnight. The tubes were then centrifuged (4000 g, 4°C, 20 min) and the medium aspirated. The resin was washed with an excess of Tris buffer by three centrifugation steps (4000 g, 4°C, 20 min). GCase was released from the Strep-TactinXT resin using five consecutive elution steps with 50 mM biotin. Biotin was dissolved in sodium citrate buffer (40 mM trisodium citrate, 15 mM disodium hydrogen citrate, 187 mM D-mannitol, and 0.1% (v/v) Tween 80, pH 6.1). The eluted protein was stored in sodium citrate buffer. Protein purity was assessed on 10% Tris-glycine SDS/PAGE gels stained with Coomassie blue. GCase was identified by western blotting using anti-GCase and anti-StrepMAB antibodies. Protein preparations were assayed for enzymatic activity and subjected to differential scanning fluorimetry. The kinetic and spectroscopic data were compared with the corresponding data for Cerezyme® and for recombinant WT GCase (r-GCase).

Differential scanning fluorimetry
Differential scanning fluorimetry (DSF) was performed using a NanoDSF Prometheus NT.48 instrument (NanoTemper Technologies GmbH, Munich, Germany). Samples were heated at 1°C·min⁻¹ over a 20-95°C temperature range. The fluorescence emission of tyrosine and tryptophan was recorded at 330 and 350 nm, respectively. Data were analyzed using PR.THERMCONTROL v2.1.1 software (NanoTemper Technologies GmbH). The melting temperature (Tm) was defined as the inflection point of the fluorescence intensity (FI) ratio curve, where R(FI) = FI350nm/FI330nm.
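Defining Tm as the inflection point of the FI350nm/FI330nm ratio curve is equivalent, for a single unfolding transition, to locating the temperature at which the first derivative of the ratio is largest in magnitude. The sketch below illustrates that idea with a central-difference derivative on a synthetic sigmoidal curve; it is not the instrument software's algorithm, and the curve parameters (midpoint 60°C, baseline 0.7, amplitude 0.3) are arbitrary illustrative values.

```python
import math


def melting_temperature(temps, ratios):
    """Return the temperature where |d(ratio)/dT| is largest.

    Uses a central difference at each interior point; assumes one
    unfolding transition and a curve that is not dominated by noise.
    """
    best_temp = None
    best_slope = 0.0
    for i in range(1, len(temps) - 1):
        slope = (ratios[i + 1] - ratios[i - 1]) / (temps[i + 1] - temps[i - 1])
        if best_temp is None or abs(slope) > abs(best_slope):
            best_temp, best_slope = temps[i], slope
    return best_temp


# Synthetic two-state unfolding curve sampled every 1 degree from 20 to 95.
temps = [float(t) for t in range(20, 96)]
ratios = [0.7 + 0.3 / (1.0 + math.exp(-(t - 60.0) / 2.0)) for t in temps]
tm = melting_temperature(temps, ratios)
```

Real traces are sampled far more densely and are usually smoothed before differentiation, because numerical derivatives amplify noise.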

Enzyme activity
Enzymatic activity was determined using a fluorescently labeled substrate of GCase, C6-NBD-GlcCer, as described [51,52]. The assay was performed using 0.1 µg of pure protein or 7 µg of cell homogenate in a final volume of 20 µL McIlvaine buffer, pH 4.2. The reaction was run at 37°C for 5 min and terminated by the addition of 1.5 mL of chloroform-methanol (1 : 2, v/v) prior to lipid extraction. For the kinetic study, the activity of purified GCase was determined using p-nitrophenyl-β-D-glucopyranoside (p-NP-Glc) [37]. An aliquot of the enzyme was incubated with 0.2-4 mM p-NP-Glc, pH 5.9, at 25°C, for 60 min. The reaction was terminated by 50-fold dilution into 1 M glycine buffer, pH 10.0, and absorbance of the p-nitrophenol was measured at 405 nm using an Agilent Cary 3500 spectrophotometer (Agilent Technologies, Santa Clara, CA, USA). Data were analyzed using the Michaelis-Menten equation.

SH-SY5Y and HEK293T GBA−/− cells
HEK293T GBA−/− and SH-SY5Y GBA−/− cells were produced by CRISPR/Cas9 genome editing [53]. A guide sequence (CATAGCGGCTGAAGGTACCA) was chosen to optimize for both off- and on-targeting using the MIT CRISPR design tool [54] and the sgRNAdesigner Rule set 1 [55], respectively. The guide sequence was cloned into a pSpCas9(BB)-2A-GFP vector and transfected into cells. Isolation of clonal cell populations was performed 24 h after transfection by FACS sorting. Single cells were sorted using GFP fluorescence into 96-well plates containing 100 µL medium in each well. After 1-3 weeks, viable colonies were transferred to 24-well plates and collected for verification of GBA1 knock-out by western blotting. Endogenous GCase activity and GlcCer levels were determined in cell homogenates (Fig. S1).

AAV vector preparation
Vectors were generated at the translational vector core (CPV) of the University Hospital of Nantes by packaging AAV2-based recombinant genomes containing DNA sequences encoding WT GCase or dGCase3 under the control of a ubiquitous CAG promoter (Fig. 1C) into AAVrh10 capsids using helper virus-free transfection of HEK293 cells. The vectors were purified using an optimized CsCl gradient-based purification protocol [56]. Viral protein purity and identity were verified by SDS/PAGE silver staining, and vector titers quantified by qPCR with primers targeting the flanking sequence of ITR2.

Transduction of SH-SY5Y cells using AAV
Nondifferentiated SH-SY5Y cells were seeded in 6-well plates (300 000 cells per well in 2 mL culture medium) (Day 1). On Day 2, 0.5 mL of medium was replaced by the same volume of transduction medium containing the vector at 0.5 × 10⁵, 1 × 10⁵, and 5 × 10⁵ vg per cell. On Day 3, 0.5 mL of fresh cell culture medium was added. Cells were collected on Day 5 using trypsin/EDTA and pelleted by centrifugation (5 min, 1000 g, 4°C). Pellets were used immediately or stored at −80°C.
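The vector doses above translate into total vector genomes per well by simple multiplication. The short sketch below works through that arithmetic; it assumes the dose is calculated on the seeded cell number (300 000 per well), which the text does not state explicitly, and the function name is hypothetical.

```python
def vector_genomes_per_well(cells_per_well, vg_per_cell):
    """Total vector genomes (vg) needed for one well at a given dose."""
    return cells_per_well * vg_per_cell


# Assumption: doses are based on the number of cells seeded on Day 1.
CELLS_PER_WELL = 300_000
DOSES_VG_PER_CELL = [0.5e5, 1e5, 5e5]

totals = {
    dose: vector_genomes_per_well(CELLS_PER_WELL, dose)
    for dose in DOSES_VG_PER_CELL
}

for dose, total in totals.items():
    print(f"{dose:.1e} vg per cell -> {total:.1e} vg per well")
```

Under this assumption, the three doses correspond to 1.5 × 10¹⁰, 3 × 10¹⁰, and 1.5 × 10¹¹ vg per well.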

Cell transfection using GCase missense mutants
Single-point missense mutations were introduced into the WT GCase sequence in pcDNA 3.1 plasmids. HEK293T GBA−/− cells were cultured in 6-well plates and transfected with 2 µg of plasmid per well using polyethyleneimine reagent. Cells were collected 36-48 h post-transfection. Enzymatic activity was measured as described under 'Enzyme activity'.

Lipidomic analysis
Cell homogenates were prepared as described in previous sections except that cell pellets were lysed in double-distilled water with a protease inhibitor cocktail (1 : 100). Quantitative analysis of sphingolipids in cell homogenates was performed by liquid chromatography-tandem mass spectrometry [58].

PRAMP algorithm
All mutations used are listed in the Supplementary Dataset and in Tables S1 (GD-causing), S2 (PROSS) and S3 (SNPs). A comprehensive list of GD-causing missense mutations was created via literature review. To generate the list of SNPs (that have not been detected in GD patients), variants of GBA1 were downloaded from the NCBI Variation Viewer (https://www.ncbi.nlm.nih.gov/variation/view/) human genome version GRCh38.12. The list was filtered prior to download with the molecular consequence 'missense variant'. The list was manually annotated to ensure that all protein-coding changes were included in the dataset and that duplicates and synonymous mutations were removed. Mutations with no documented clinical significance or with uncertain significance were selected.
ΔPSSM was calculated by subtracting the PSSM score of the mutated amino acid from that of human GCase. The PSSM table was extracted from the PROSS stability-design calculations, as were the ΔΔG calculations. The solvent exposure fraction was calculated using the STRIDE webserver (http://webclu.bio.wzw.tum.de/stride/). All the parameters can be found in the Supplementary dataset (10.17632/pkcjn539b5.1). The PRAMP algorithm was implemented in a custom-written Python script using the LinearSVC function (scikit-learn).
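To make the workflow above concrete, the following is a minimal sketch of a linear support-vector classifier trained on the three per-mutation features named in the text (ΔPSSM, ΔΔG, and solvent exposure fraction). It is not the authors' script: the toy feature values, the class labels, the hyperparameters, and the use of the decision-function value as a PRAMP-like score are all assumptions made for illustration. Only `LinearSVC` and its `fit`, `predict`, and `decision_function` methods are taken from scikit-learn.

```python
from sklearn.svm import LinearSVC


def delta_pssm(pssm_wt, pssm_mut):
    """ΔPSSM as defined in the text: wild-type score minus mutant score."""
    return pssm_wt - pssm_mut


# Toy rows: (PSSM wt, PSSM mutant, ΔΔG, solvent exposure fraction, label).
# All numbers are invented; label 1 = deleterious, 0 = benign.
TOY_MUTATIONS = [
    (6, -3, 5.2, 0.05, 1),
    (5, -4, 4.1, 0.10, 1),
    (7, -2, 6.0, 0.00, 1),
    (4, -3, 3.8, 0.15, 1),
    (1, 0, 0.3, 0.80, 0),
    (0, 1, -0.2, 0.90, 0),
    (2, 1, 0.5, 0.70, 0),
    (1, 2, 0.1, 0.85, 0),
]

# One feature vector per mutation: [ΔPSSM, ΔΔG, exposure].
X = [[delta_pssm(wt, mut), ddg, exposure]
     for wt, mut, ddg, exposure, _ in TOY_MUTATIONS]
y = [label for *_, label in TOY_MUTATIONS]

# Hyperparameters are illustrative defaults, not values from the paper.
clf = LinearSVC(C=1.0, max_iter=10_000, random_state=0)
clf.fit(X, y)

# Signed distance from the separating hyperplane for each mutation,
# used here as a stand-in for a continuous pathogenicity score.
scores = clf.decision_function(X)
predicted = list(clf.predict(X))
```

A linear model keeps the contribution of each feature directly readable from `clf.coef_`, which is one plausible reason for preferring it on a small training set.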
REVEL scores were downloaded from the website: https://sites.google.com/site/revelgenomics/downloads and parsed manually to create a file with only the GBA locus for comparison to PRAMP.
The PRAMP score for compound heterozygous mutations was calculated as the negative geometric average of the two individual PRAMP scores, and the REVEL score for compound heterozygous mutations was calculated as the geometric average of the two individual REVEL scores.
Statistical significance was evaluated either by Student's t-test or by analysis of variance (ANOVA) followed by post hoc pairwise comparisons using Tukey's honest significant difference test. Correlations were evaluated using the Spearman coefficient.
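The compound heterozygous scoring described above can be sketched as follows. The text does not give an explicit formula, so reading "negative geometric average" as −√(s₁·s₂) is an assumption, as are the function names and the choice to raise an error when the two scores have opposite signs (a case the text does not address).

```python
import math


def geometric_mean(score_1, score_2):
    """Geometric mean of two scores; undefined if their product is negative."""
    product = score_1 * score_2
    if product < 0:
        raise ValueError("geometric mean undefined for scores of opposite sign")
    return math.sqrt(product)


def compound_pramp(score_1, score_2):
    """Assumed form: negative geometric mean of the two allele scores."""
    return -geometric_mean(score_1, score_2)


def compound_revel(score_1, score_2):
    """Geometric mean of the two allele scores."""
    return geometric_mean(score_1, score_2)


# Invented example scores for the two alleles of one genotype.
pramp_pair = compound_pramp(-1.0, -4.0)
revel_pair = compound_revel(0.4, 0.9)
```

Unlike an arithmetic mean, a geometric mean is pulled toward the smaller of the two values, so a genotype carrying one mild allele scores closer to that allele.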
for Table S1 and Chen Yaacobi for creating the GBA−/− cell lines. Research in the Futerman laboratory was supported by a Sponsored Research Agreement between the Weizmann Institute of Science (via Yeda, its technology transfer office) and Lysogene. Research in the Fleishman laboratory was supported by a Consolidator Award from the European Research Council (815379), the Israel Science Foundation (1844), the Dr Barry Sherman Institute for Medicinal Chemistry and by a charitable donation in memory of Sam Switzer. SP was partially supported by the Czech Academy of Sciences (Czech/Israel scientific program). AHF is the Joseph Meyerhoff Professor of Biochemistry at the Weizmann Institute of Science.

Author contributions
JLS, IS, SJF, and AHF designed the research. SP, RK, YA, YP, TU, SA, OD, AT, and RT performed the biochemical experiments. OK, RL-S, and AG performed the computational work. MH and RL contributed new reagents. SP, YA, OK, RL-S, and SBD analyzed data. SP wrote the manuscript. AHF wrote the manuscript and obtained funding. All authors discussed data and edited the manuscript.

Conflict of interest
Yeda Research & Development, on behalf of the Weizmann Institute of Science, has filed patent applications corresponding to PCT/IL2021/050357 on the acid-β-glucosidase designs, naming AHF, IS, JLS, SJF, AG, SP, YA, and OK as inventors. SJF and AG are named inventors on stability-design patents corresponding to PCT/IL2016/050812. MH and RL are employees and shareholders of Lysogene.

Peer review
The peer review history for this article is available at https://publons.com/publon/10.1111/febs.16758.

Data availability statement
The experimental data that support the findings of this study are included in this article and supplementary material (Fig. S1 and Tables S1-S5). The structural data used within this study are openly available in the wwPDB (PDB 10.2210/pdb3GXI/pdb). The data used for the construction of the PRAMP algorithm are openly available in Mendeley data at 10.17632/pkcjn539b5.1. The list of single-nucleotide polymorphism data is openly available at NCBI Variation Viewer at https://www.ncbi.nlm.nih.gov/variation/view/.

Supporting information
Additional supporting information may be found online in the Supporting Information section at the end of the article.
Fig. S1. Characterization of HEK293T and SH-SY5Y GBA−/− cells.
Table S1. Comprehensive list of missense mutations which cause Gaucher disease.
Table S2. PROSS mutations and their PRAMP scores.
Table S3. List of SNPs in GCase which have not been shown to cause Gaucher disease.
Table S4. Homozygous GD mutations used to generate Fig. 7A,B.
Table S5. Compound heterozygous and homozygous GD mutations used to generate Fig. 7C,D.