QTY code designed antibodies for aggregation prevention: A structural bioinformatic and computational study

Therapeutic monoclonal antibodies are the most rapidly growing class of molecular medicine, and they are beneficial to the treatment of a broad spectrum of human diseases. However, the aggregation of antibodies during manufacture, distribution, and storage poses significant challenges, potentially compromising efficacy and inducing adverse immune responses. We previously conceived the QTY (glutamine, threonine, tyrosine) code, a simple tool for enhancing protein water‐solubility by systematically pairwise replacing hydrophobic residues L (leucine), V (valine)/I (isoleucine), and F (phenylalanine). The QTY code offers a promising alternative to traditional methods of controlling aggregation in integral transmembrane proteins. In this study, we designed variants of four antibodies by applying the QTY code, changing only the β‐sheets. Through structure‐based aggregation analysis, we found that these QTY antibody variants demonstrated significantly decreased aggregation propensity compared to their wild‐type counterparts. Our molecular dynamics simulations showed that the design by QTY code is capable of maintaining the antigen‐binding affinity and structural stability. Our structural informatic and computational study suggests that the QTY code offers significant potential in mitigating antibody aggregation.


| INTRODUCTION
More than 100 antibody-based therapeutics had been approved for the treatment of a diverse array of human diseases by 2022, and this number is still growing rapidly. 1 Monoclonal antibodies (mAbs) have been widely employed to treat autoimmune diseases, 2,3 chronic inflammatory diseases, 4 various types of cancers, 5,6 ophthalmic diseases, 7 and more. Representative examples include: Infliximab, a mAb against tumor necrosis factor-alpha (TNFα), extensively employed in the treatment of rheumatoid arthritis 8; Trastuzumab, targeting the human epidermal growth factor receptor 2 (HER2), widely utilized in the management of early-stage breast cancer 6; and Ranibizumab, targeting vascular endothelial growth factor (VEGF-A), widely employed in the treatment of neovascular age-related macular degeneration. 7
One of the most challenging issues in the manufacture, distribution, and storage of mAbs is their potential to aggregate, 9 which significantly limits the shelf-life of mAbs at the point of care. Aggregation can not only compromise their efficacy but also induce adverse immune responses and even cause severe hypersensitivity responses, 10 such as anaphylaxis. 11
A conventional solution to this problem is to optimize the environmental factors that affect the aggregation rate during the manufacture, formulation, and storage of mAbs, 10 such as temperature, protein concentration, pH, oxygen, shear forces, and ionic strength. 12-14 However, these methods tend to be costly and time-consuming, and they complicate medical delivery. 15 An alternative approach is to redesign the aggregation-prone hydrophobic patches to reduce the aggregation propensity of the antibodies. 16 Numerous studies have demonstrated the effectiveness of this approach in reducing aggregation propensity while maintaining bioactivity. 15,17-20 Nonetheless, such methodology is case-by-case and dependent on the available structural inputs, such as crystal and cryo-EM structures. There is no simple and universal method that can address the aggregation of diverse antibodies.
We previously reported a simple and general tool for enhancing protein water-solubility, the QTY (glutamine, threonine, tyrosine) code, for directly converting hydrophobic α-helices to hydrophilic α-helices in several integral membrane protein chemokine receptors. 21 The QTY code is based on two key molecular facts: (1) Several amino acids share strikingly similar structures: Leu (L) versus Gln (Q)/Asn (N), Ile (I)/Val (V) versus Thr (T), and Phe (F) versus Tyr (Y), despite the stark contrast between the hydrophobicity of L, I, V, and F and the hydrophilicity of Q, T, and Y. (2) The secondary structure propensity of Q, T, and Y is close to that of L, I/V, and F, respectively. 22-25 The water-solubilization design by QTY code produced detergent-free transmembrane receptors that retained ligand-binding affinity 21-23,25 and high thermostability. Structural informatic studies using the highly accurate machine learning-based structure prediction approach AlphaFold2 26 (AF2) showed significant superposition between the native proteins and their QTY variants. 27-30 This general QTY code could have far-reaching implications for designing protein water-solubility, opening the door to designing proteins for a wide range of applications, including therapeutic mAbs and protein- and peptide-based biologics that can treat a wide range of diseases.
The molecular structures of antibodies predominantly comprise β-sheets, typically more than 50% of the entire protein sequence. 31 The inherent inclination of β-strands or sheets to form unfavorable inter-molecular interactions can often result in the destabilization of soluble structures and the formation of aggregates without a precisely controlled register. 32 We asked whether the QTY code could also be applied to β-sheets, since the β-sheet propensity of Q, T, and Y is also close to that of L, I/V, and F, respectively. 33 Therefore, it is promising to design antibody variants using the QTY code to replace hydrophobic residues in the β-sheets of the antibody structure with hydrophilic ones, thus reducing their propensity for aggregation.
Here, we report a study of designed QTY variants, changing only the β-sheets, of four antibodies with available full-length crystal structures: (i) IgG1 b12, 34 (ii) Pembrolizumab, 35 (iii) anti-NPRA IgG4, 36 and (iv) Mab 61.1.3. 37 We generated full-length AF2 structure models of the QTY variants of the four antibodies. Their structures all superposed well with those of the wild-type antibodies, for both the antigen-binding fragment (Fab) regions and the crystallizable fragment (Fc) regions. We then evaluated the aggregation propensity of the wild-type and QTY antibodies by analyzing their three-dimensional structures and found that the QTY variants showed significantly decreased aggregation propensity compared to the wild-type antibodies. Using molecular dynamics simulations, we showed that the QTY variant IgG1 b12 QTY interacted with the antigen in a similar manner and with a binding affinity close to that of wild-type IgG1 b12. We observed that the interactions of the polar residues introduced by the QTY code mimicked the packing of the original hydrophobic residues in the β-sheets. Our study suggests the great potential of the QTY code in mitigating aggregation of therapeutic antibodies.

| Design of QTY variants and characteristics calculation
The sequences of the selected antibodies were obtained from the PDB (https://www.rcsb.org). The PDB entry numbers are listed in Table 1 and Table S1. The secondary structure information was extracted from the PDB files using PyMOL (version 2.0, Schrödinger, LLC). All residues comprising β-sheets were regarded as candidates for replacement in the QTY design. The molecular weight and pI values of the proteins (combining the heavy and light chains) were calculated with the Expasy web server (https://web.expasy.org/compute_pi/).
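The replacement rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names, the toy fragment, and the hand-written β-sheet mask are assumptions made for the example, and in practice the mask would come from the secondary structure assignment of the PDB file.

```python
from typing import Sequence

# QTY code: each hydrophobic residue is paired with a polar residue of similar shape.
QTY_MAP = {"L": "Q", "I": "T", "V": "T", "F": "Y"}


def apply_qty(sequence: str, in_beta_sheet: Sequence[bool]) -> str:
    """Replace L/I/V/F with Q/T/T/Y, but only at positions flagged as β-sheet."""
    if len(sequence) != len(in_beta_sheet):
        raise ValueError("sequence and β-sheet mask must have the same length")
    return "".join(
        QTY_MAP.get(residue, residue) if flagged else residue
        for residue, flagged in zip(sequence, in_beta_sheet)
    )


def substitution_fraction(wild_type: str, variant: str) -> float:
    """Fraction of positions that differ between two equal-length sequences."""
    changed = sum(a != b for a, b in zip(wild_type, variant))
    return changed / len(wild_type)


# Toy fragment and mask (both hypothetical): only positions 2-4 are treated as β-strand.
wild_type = "AVLQSSGLY"
mask = [False, True, True, True, False, False, False, False, False]
variant = apply_qty(wild_type, mask)
print(variant)  # ATQQSSGLY
```

Note that the leucine outside the flagged strand is left untouched, which mirrors the choice to change only β-sheet residues.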

| Structure prediction by AF2 and structural analysis
The structure prediction of all proteins was performed using AF2 via the ColabFold 38 pipeline, applying mostly default parameters (num_relax = 5, template mode: none, msa_mode: MMseqs2, max_recycles = 12). The predicted structures with the highest model rank were used for subsequent analysis. PyMOL was used for structural visualization and analysis. The structures of the wild-type antibodies and their QTY variants were superposed with the PyMOL "align" command, with the number of outlier-rejection cycles chosen to minimize the number of rejected atoms. The monomeric Fab and Fc regions were aligned separately because of the flexibility of the hinge region.
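For readers unfamiliar with the reported metric, the following sketch shows how a root-mean-square deviation is computed for two sets of matched atoms. It assumes the coordinates are already superposed and paired one-to-one; the fitting and outlier rejection performed by PyMOL's "align" are deliberately not reproduced here, and the coordinates are made up.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(reference: Sequence[Coord], mobile: Sequence[Coord]) -> float:
    """RMSD (same unit as the input, e.g. Å) between paired, pre-superposed atoms."""
    if len(reference) != len(mobile) or not reference:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xr - xm) ** 2 + (yr - ym) ** 2 + (zr - zm) ** 2
        for (xr, yr, zr), (xm, ym, zm) in zip(reference, mobile)
    )
    return math.sqrt(squared / len(reference))


# Hypothetical coordinates: every atom is displaced by exactly 1 Å along z.
wild_type_atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
variant_atoms = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(wild_type_atoms, variant_atoms))  # 1.0
```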

| Aggregation propensity evaluation and visualization
The structure-based protein aggregation prediction web server Aggrescan3D (A3D) version 2.0 39 (http://biocomp.chem.uw.edu.pl/A3D2/) was used to evaluate the aggregation propensity and identify aggregation-prone regions of the antibodies. The monomeric Fab regions and the dimeric Fc regions (without glycans) were submitted and analyzed separately. The settings were: stability calculations, yes; dynamic mode, yes; mutate residues, no; distance of aggregation analysis, 10 Å. To calculate the average A3D score of each protein, the scores of all 13 models (the dynamic models and the static model) were used. For Fc regions, the sites interacting with glycans were excluded from the average score calculation. p values were calculated by the unpaired two-tailed Student's t-test in GraphPad Prism (https://www.graphpad.com). For visualization of the aggregation-prone patches, representative structure snapshots were exported directly from the A3D server.
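The statistics described above can be illustrated with a small standard-library sketch. It computes the mean, the standard error of the mean, and the pooled-variance (Student's) unpaired t statistic for two groups of per-model average scores. The function names and the example numbers are assumptions for illustration only; the study itself used GraphPad Prism, and converting t to a two-tailed p value additionally requires the t distribution (for example via `scipy.stats.ttest_ind`).

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def sem(values: Sequence[float]) -> float:
    """Standard error of the mean, using the sample variance."""
    return math.sqrt(variance(values) / len(values))


def student_t_unpaired(x: Sequence[float], y: Sequence[float]) -> Tuple[float, int]:
    """Student's unpaired t statistic (pooled variance) and its degrees of freedom."""
    nx, ny = len(x), len(y)
    dof = nx + ny - 2
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / dof
    t = (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))
    return t, dof


# Made-up per-model average scores (NOT data from the study), one value per model.
wild_type_scores = [-0.70, -0.72, -0.68, -0.71, -0.69]
variant_scores = [-0.80, -0.83, -0.79, -0.81, -0.78]

t_stat, dof = student_t_unpaired(wild_type_scores, variant_scores)
print(f"WT mean {mean(wild_type_scores):.3f} +/- {sem(wild_type_scores):.3f}")
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```

A positive t here means the wild-type scores are higher, that is, more aggregation-prone on the A3D scale.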

| MD simulations and analysis
The simulations were performed using GROMACS 2022.3 (https://manual.gromacs.org/2022.3/). For the wild-type IgG1 b12-gp120 complex, the available complex crystal structure (PDB: 2NY7) was used as the input structure. For the IgG1 b12 QTY-gp120 complex, we first aligned the AF2 structure of IgG1 b12 QTY (Fab region) with IgG1 b12 of 2NY7, then manually removed residue clashes by adjusting side-chain orientations in PyMOL, and exported IgG1 b12 QTY complexed with gp120 as the input structure. The topology files were built with the LEaP program of AmberTools22. 40 The ff14SB Amber force field 41 and the TIP3P water model were selected. To neutralize the system and reach a salt concentration of 150 mM, 141 Cl− ions and 128 Na+ ions were added. The distance between the protein and the edge of the box was set to 12 Å. The system energy was minimized using the steepest descent method until it converged to 1000 kJ/mol/nm. Electrostatics were treated with Particle Mesh Ewald, and the cutoff for both Coulomb and van der Waals interactions was 1.2 nm; a 2 fs time step was used during the equilibration stage. The modified Berendsen thermostat was used, with the system coupled to a 300 K bath. The Parrinello-Rahman barostat was used with isotropic coupling. Bonds involving hydrogen atoms were constrained using the LINCS algorithm. Finally, a production run of 50 ns was performed.
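As a rough plausibility check of the ion counts quoted above, the sketch below estimates how many salt ion pairs correspond to a target concentration in a given volume and then adds counter-ions for neutralization. Several things here are assumptions and not taken from the study: the solvent volume (1417 nm³, chosen only so the example reproduces the reported numbers), the net solute charge of +13 (inferred from the difference between 141 Cl− and 128 Na+), and the simple volume-based formula itself, which may differ from how the setup tools actually place ions.

```python
from typing import Tuple

AVOGADRO = 6.02214076e23   # particles per mole
LITRES_PER_NM3 = 1e-24     # 1 nm^3 expressed in litres


def salt_ion_counts(volume_nm3: float, conc_molar: float, net_charge: int) -> Tuple[int, int]:
    """Return (n_Na, n_Cl) for a monovalent salt plus neutralizing counter-ions.

    volume_nm3 : solvent volume (assumed value, see lead-in)
    conc_molar : target salt concentration in mol/L
    net_charge : net charge of the solute in elementary charges
    """
    pairs = round(conc_molar * AVOGADRO * volume_nm3 * LITRES_PER_NM3)
    n_na = pairs + max(-net_charge, 0)  # extra cations neutralize a negative solute
    n_cl = pairs + max(net_charge, 0)   # extra anions neutralize a positive solute
    return n_na, n_cl


# Hypothetical inputs: 1417 nm^3 of solvent, 150 mM salt, solute charge +13.
print(salt_ion_counts(1417.0, 0.150, 13))  # (128, 141)
```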

| Design of QTY variants of antibodies
We selected four antibodies as the subjects of design (Table 1), for which intact crystal structures are available, including both Fab regions and Fc regions. In this study, we focus only on applying the QTY code to the design of β-sheets, since all antibodies have a high content of β-sheets. For the residues comprising β-sheets in each antibody, because of the striking structural similarity of the pairwise amino acids, namely, L:Q, I/V:T, and F:Y, we replaced all leucine (L) with glutamine (Q); isoleucine (I) and valine (V) with threonine (T); and phenylalanine (F) with tyrosine (Y) (Figure 1, Figures S2-S5). We aligned the sequences of the QTY variants with those of the wild-type antibodies. On average, about one-third of the residues in β-sheets were substituted, and 14%-17% of all residues were substituted, similarly for heavy chains (HCs) and light chains (LCs). We preliminarily analyzed the basic characteristics of the designed QTY variants. Despite the high variation rate introduced by the QTY code, each QTY variant shows a molecular weight (MW) and isoelectric point (pI) close to those of the wild-type antibody, which follows from the rationale of the QTY code. The QTY code uses residues of comparable size for substitution and does not introduce any additional charge, so as to preserve the intrinsic properties of the proteins.
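The claim that QTY substitution barely changes the molecular weight can be checked with a back-of-the-envelope calculation. The sketch below uses standard average residue masses (in Da) to sum the mass shift over a string of residues that are to be substituted; the function name and the example input are hypothetical, and the calculation says nothing about pI beyond the fact that none of the seven residues involved carries a charge at neutral pH.

```python
# Standard average residue masses in Da (amino acid minus one water).
RESIDUE_MASS = {
    "L": 113.1594, "I": 113.1594, "V": 99.1326, "F": 147.1766,
    "Q": 128.1307, "T": 101.1051, "Y": 163.1760,
}
QTY_MAP = {"L": "Q", "I": "T", "V": "T", "F": "Y"}


def qty_mass_shift(substituted_residues: str) -> float:
    """Total mass change (Da) when every L/I/V/F in the input is QTY-substituted.

    The input should contain only the residues selected for replacement
    (for example the β-sheet residues); other letters are ignored.
    """
    return sum(
        RESIDUE_MASS[QTY_MAP[res]] - RESIDUE_MASS[res]
        for res in substituted_residues
        if res in QTY_MAP
    )


# Per-substitution shifts: each is at most about 16 Da.
for hydrophobic, polar in QTY_MAP.items():
    delta = RESIDUE_MASS[polar] - RESIDUE_MASS[hydrophobic]
    print(f"{hydrophobic} -> {polar}: {delta:+.4f} Da")

# Hypothetical example: one of each substitution.
print(f"{qty_mass_shift('LIVF'):.4f}")  # 20.8889
```

Because the I to T shift is negative while the others are positive, the per-residue changes partly cancel over a whole chain.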

| Structure prediction of QTY variants and superposition with wild-type antibodies
To study the molecular structural properties of the QTY variants, we generated structure models of the full-length heterotetramers (~1300 amino acids) using AF2. Based on the predicted Local Distance Difference Test (pLDDT) scores of all residues in the predicted models, both the Fab and Fc regions showed good prediction quality (Figure S1). The hinge regions showed poor quality, which was expected because of their intrinsically high flexibility. 43 The AF2 models of the four QTY variants all showed a manner of heterotetramer assembly similar to that of native antibodies (Figure S1). We then superposed the structures of the QTY variants with the crystal structures of the corresponding wild-type antibodies. Because of the flexibility of the hinge regions, we performed the superposition of the Fab regions and Fc regions separately (Figure 2). All four QTY variants showed good structural similarity with the wild-type antibodies, for both Fab and Fc regions. The root-mean-square deviation (RMSD) values of all superpositions are below 2.6 Å (Table 2). These results imply that, despite the substantial substitution of hydrophobic residues with hydrophilic residues, the QTY code preserved the overall structures of the antibodies.

TABLE 1 The basic information of the selected antibodies.

| Aggregation propensity evaluation of wildtype/QTY antibodies
To assess whether and how our design might ameliorate the aggregation of the antibodies, we compared the aggregation propensity of the wild-type antibodies with that of their QTY variants by analyzing their three-dimensional structures using A3D 2.0. According to the specific conformational context of each residue, A3D calculates a structurally corrected aggregation value (A3D score). By averaging the A3D scores of all residues in the input structure, we obtained an "average A3D score" to indicate the aggregation propensity of this structure. Moreover, the dynamic mode of A3D conducts fast simulations of protein flexibility and generates 12 dynamic models, sampling multiple conformations instead of only the static model. For each wild-type antibody and its QTY variant, we compared the average A3D scores, taking all 13 models (12 dynamic models and the static model) into account. For the Fab region, all four antibodies showed significantly lower aggregation propensity after the water-solubilizing design by QTY code (QTY design) (Figure 3A-D), especially IgG1 b12 (b12) and Mab 61.1.3 (with p values < .01).
We then analyzed the aggregation-prone regions of the structural models by A3D (Figure 4). Generally, comparing the QTY variants with the wild-type antibodies, we observed lower A3D scores for most residues, indicating decreased overall aggregation propensity.
From these structural models, we observed a conserved aggregation-prone patch (in contrast to individual aggregation-prone residues) in the Fab regions shared by all four antibodies, which was previously identified as a hydrophobic region, 17 or, in terms of sequence, as an aggregation-prone motif. 44 In this motif, hydrophobic residues were changed into polar residues ("ATQQSSGLY"). This could be explained by the high hydrophobicity resulting from the two successive, highly exposed hydrophobic residues ("VL") and by the water-solubilizing effect of the surrounding QTY substitutions.
For the Fc region, the residues interacting with glycans were excluded from the average score calculation, because AF2 cannot predict the glycan structure. All the QTY variants showed significantly lower aggregation propensity compared with the wild-type antibodies (three with p value < .001 and one < .01; Figure 5). In the analysis of aggregation-prone regions, we similarly identified a conserved aggregation-prone patch, also previously identified as a hydrophobic region, 17 or, in terms of sequence, as an aggregation-prone motif, 44 "VLDSDGSFF" (or "IMDTDGSYF" in Mab 61.1.3), which was largely eliminated by QTY design (Figure 6). In this motif, four hydrophobic residues were changed into polar residues (one in Mab 61.1.3).
It is worth noting that these two conserved aggregation-prone patches are not reported to be engaged in the formation of common complexes with other proteins. 17 Thus, the design of these regions should have negligible effects on the functionality of the antibodies.
Some therapeutic mAbs are explicitly reported to be aggregation-prone. To show the potential of the QTY code for ameliorating the aggregation of these antibodies, we selected another four therapeutic mAbs with available structures of the Fab region: Bevacizumab, 15 Rituximab, 45 Infliximab, 8 and Trastuzumab 46 (Table S1). Using the same QTY code protocol, we designed the QTY variants of these antibodies and generated their AF2 structure models. The aggregation analysis using A3D showed that QTY design remarkably mitigated the aggregation propensity (two with p value < .001 and the other two < .01; Figure 3E-H). The average A3D scores of the Fab regions of these four antibodies (approximately −0.6) were overall higher than those of IgG1 b12, Pembrolizumab, anti-NPRA IgG4, and Mab 61.1.3 (approximately −0.7), supporting the aggregation-prone nature of these antibodies and the ability of QTY design to reduce their aggregation.
Our results suggest that QTY design is capable of ameliorating the aggregation propensity of the antibodies.

FIGURE 3 Aggregation propensity of the Fab regions of the antibodies was lowered after QTY design. For each protein, the average A3D scores of 13 structure models (the dynamic models and the static model) were calculated by A3D 2.0, indicating the aggregation propensity. Low A3D scores indicate low aggregation propensity. For example, the average A3D score of IgG1 b12 is around −0.7, which indicates a higher aggregation propensity compared to IgG1 b12 QTY, with an average A3D score around −0.8. (A-H) Wild-type (WT, in green) IgG1 b12, Pembrolizumab, anti-NPRA IgG4, Mab 61.1.3, Bevacizumab, Rituximab, Infliximab, and Trastuzumab versus their QTY variants (in blue), respectively. Data are presented as the mean ± standard error of the mean (SEM). p values were calculated by the Student's t-test. *p < .05, **p < .01, ***p < .001.

FIGURE 4 Visualization and comparison of the aggregation-prone regions of the Fab regions of native antibodies and their QTY variants. The structure snapshots are from the output structure models of the A3D 2.0 dynamic mode (http://biocomp.chem.uw.edu.pl/A3D2/). (A-D) Wild-type IgG1 b12, Pembrolizumab, anti-NPRA IgG4, and Mab 61.1.3 versus their QTY variants, respectively. In the color scale, deep blue indicates the most soluble residues and deep red indicates the most aggregation-prone residues. As indicated by dashed lines, the upper portion is the heavy chain (HC) and the lower portion is the light chain (LC). The yellow dashed circles indicate the conserved aggregation-prone patches, the sequences of which are shown above the structures with black backgrounds.

FIGURE 5 Aggregation propensity of the Fc regions of the antibodies was lowered after QTY design. For each protein, the average A3D scores of 13 structure models (the dynamic models and the static model) were calculated by A3D 2.0, indicating the aggregation propensity (low A3D scores mean low aggregation propensity). For example, the average A3D score of IgG1 b12 is around −0.9, which indicates a higher aggregation propensity compared to IgG1 b12 QTY, with an average A3D score around −1.0. The residues interacting with glycans were excluded from the average score calculation. (A-D) Wild-type (WT, in green) IgG1 b12, Pembrolizumab, anti-NPRA IgG4, and Mab 61.1.3 versus their QTY variants (in blue), respectively. Data are presented as the mean ± SEM. p values were calculated by the Student's t-test. *p < .05, **p < .01, ***p < .001.

FIGURE 6 Visualization and comparison of the aggregation-prone regions of the Fc regions of native antibodies and their QTY variants. The structure snapshots are from the output structure models of the A3D 2.0 dynamic mode (http://biocomp.chem.uw.edu.pl/A3D2/). (A-D) Wild-type (WT, in green) IgG1 b12, Pembrolizumab, anti-NPRA IgG4, and Mab 61.1.3 versus their QTY variants, respectively. In the color scale, dark blue indicates the most soluble residues and dark red indicates the most aggregation-prone residues. The yellow dashed circles indicate the conserved aggregation-prone patches, the sequences of which are shown above the structures with black backgrounds.

| Molecular dynamics simulations of the wildtype/QTY antibody-antigen complex
To examine the effects of QTY design on the functionality and structural stability of the antibody, we conducted 50-ns all-atom molecular dynamics (MD) simulations of the Fab region of b12 and of b12 QTY, each complexed with the HIV CD4-binding domain of gp120 (~730 amino acids in total). 48,49 During the 50-ns simulations, both the b12 and b12 QTY complexes showed good structural stability (Figure 7A), with comparable backbone RMSD values: 1.72 ± 0.36 Å for b12 and 1.95 ± 0.29 Å for b12 QTY. The low standard deviation of the RMSD in the b12 QTY simulation also indicates little structural deviation along the simulation.
To compare the binding affinity of b12 and b12 QTY with gp120, we used the Molecular Mechanics Poisson-Boltzmann Surface Area (MMPBSA) algorithm 42 to estimate the binding free energy. During the 50-ns simulation, the binding free energy of the b12 QTY-gp120 complex (−64.7 ± 15.2 kcal/mol) was very close to that of wild-type b12 (−64.3 ± 7.2 kcal/mol; Figure 7B). These values are in good agreement with the value from previous MD simulations of the b12-gp120 complex. 49 Next, we analyzed the contributions of individual residues to binding at the wild-type/QTY b12-gp120 interface.
In the b12-gp120 complex simulation, we found a region that contributed the most to the binding (Figure 7C), which was mainly composed of three bulky residues: Y98 and W100 in complementarity-determining region (CDR) H3 of b12, and R419 in gp120. These three residues were previously reported to be the main contributors to the binding. 49 In the b12 QTY-gp120 complex simulation, likewise, this region was the predominant contributor to the binding (Figure 7D).
Previous experimental data from an antigen-binding activity test in a study of engineering antibodies against aggregation showed that the activity was lost in the variants with mutations in the CDR regions, whereas the activity was retained in the variants with mutations elsewhere. 17 Consistently, our QTY design, changing only β-sheets, circumvented these regions and minimized the effects on the functionality.
We previously observed the formation of significant numbers of hydrogen bonds (H-bonds) in the interior of the helical bundle of a QTY variant of an α-helical membrane protein by MD simulations. 22 These H-bonds would not exist in the native hydrophobic core areas with L, I/V, and F hydrophobic contacts. In this study, in the context of β-sheet design, we asked how the polar residues introduced by QTY design (simplified as "QTY residues") were engaged in the stabilization. We observed that the interactions of these residues in b12 QTY mimicked the packing of the original hydrophobic residues in b12 (Figure 8A,B,E,F). In the cases shown in Figure 8, between the layers of β-sheets, the QTY residues tightly interacted with each other by forming inter-residue H-bonds, similar to the van der Waals contacts of the original hydrophobic residues (Figure 8C,D,G,H).
In summary, these results indicate that QTY design targeting β-sheets could preserve the antigen-binding activity and maintain the structural stability.

FIGURE 7 MD simulations of wild-type/QTY b12-gp120 complex. (A) RMSD curve of wild-type (in green)/QTY (in blue) b12-gp120 complex. (B) Binding free energy of wild-type/QTY b12-gp120 complex. Data are presented as the mean ± standard deviation. (C,D) The major residues (binding energy < −2 kcal/mol) contributing to binding in the interface of wild-type (C)/QTY (D) b12-gp120 complex. The binding energy contribution of each residue is denoted by the color map below the figure. The antibody part (regarded as the receptor, R) is in red and gp120 (regarded as the ligand, L) is in yellow. For example, R_Y98 indicates Tyr at position 98 in the receptor, and L_R419 indicates Arg at position 419 in the ligand. The residue numbering is based on the sequence of PDB: 2NY7. The structure snapshots are from the gmx_mmpbsa_ana tool.

FIGURE 8 The interactions of polar residues in IgG1 b12 QTY mimicked the packing of the original hydrophobic residues in IgG1 b12. V121, V144, and V210 in the heavy chain of native IgG1 b12 versus T121, T144, and T210 in the heavy chain of IgG1 b12 QTY, shown as spheres, (A) versus (B), and sticks, (C) versus (D). L136, V146, and V196 in the light chain of native IgG1 b12 versus Q136, T146, and T196 in the light chain of IgG1 b12 QTY, shown as spheres, (E) versus (F), and sticks, (G) versus (H). Highlighted residues are shown as spheres. The hydrogen bonds are shown as yellow dashed lines. The residue numbering is based on the sequence of PDB: 1HZH. H, heavy chain; L, light chain. The snapshots were taken from the simulation medoids.

| CONCLUSIONS
Protein aggregation is the predominant problem during the storage of mAbs for therapeutic use. 17 The QTY code, as a simple and robust tool for water-solubilizing design, has been demonstrated to be capable of designing detergent-free, functional membrane proteins. In this study, our structural informatic and computational results showed that, compared to the wild-type antibodies, the antibody variants designed by QTY code exhibited lower aggregation propensity, with comparable antigen-binding activity and good structural stability.
We believe that QTY design targeting β-sheets performs as well as the previously reported QTY design targeting α-helices. Our study suggests that the QTY code is a promising and useful tool for mitigating the aggregation of therapeutic mAbs, and hopefully of other aggregation-prone proteins.

Shuguang Zhang: Conceptualization; writing - review and editing; investigation; supervision; project administration; writing - original draft.