Enhancing the promiscuity of a member of the Caspase protease family by rational design

Abstract The N-terminal cleavage of fusion tags to restore the native N-terminus of recombinant proteins is a challenging task, and to date, protocols need to be optimized for each protein individually. In this work, we present a novel protease that was designed in silico for enhanced promiscuity toward different N-terminal amino acids. Two mutations of active-site residues of human Caspase-2 were identified that increase the recognition of branched amino acids, which bind only poorly to the unmutated protease. These mutations were identified by sequence and structural comparisons of Caspase-2 and Caspase-3, and their effect was additionally predicted using free-energy calculations. The two mutants proposed in the in-silico studies were expressed, and in-vitro experiments confirmed the simulation results. Both mutants showed enhanced activities not only toward branched amino acids but also toward smaller, unbranched amino acids. We believe that the created mutants constitute an important step toward generalized procedures to restore the original N-termini of recombinant fusion proteins.

To improve the purification process and lower costs, affinity tags can be used. [6][7][8] These are linked to the recombinant protein, resulting in a so-called fusion protein with modified chromatographic properties. However, fusion tags can influence the structure and characteristics of the protein and may even cause immune reactions when used for in-vivo treatment. 9 Therefore, after successful purification, the native N- or C-terminus has to be restored when the recombinant protein is intended for therapeutic applications. For this purpose, fusion proteins are usually equipped with a specific cleavage site that can be targeted for chemical or enzymatic hydrolysis. [10][11][12][13][14] To this day, there is no universal procedure to cleave fusion tags from a wide variety of proteins. One reason is that N- and C-terminal protein sites vary in their physical and chemical properties. Hence, several procedures exist for tag removal. Endoproteases are a very common tool. Serine proteases such as factor Xa, 15 α-thrombin, 16 or enterokinase 17 have high turnover rates but are considered to have rather low specificity. 18 Viral proteases such as tobacco etch virus protease (TEV) 19 or human rhinovirus 3C protease, 20 in contrast, are more specific but have lower turnover numbers. 18,21 In recent years, self-cleaving enzymes such as inteins 22,23 have attracted interest. 24 However, several disadvantages still prevent the large-scale use of inteins: premature cleavage, 21 slow cleavage kinetics, 25,26 the need for a high effective concentration, 23 and a high expression load due to their large size. 27 This list of available systems shows that cleavage protocols are far from universal and have to be optimized specifically for individual proteins or tags. Therefore, a standardized enzymatic or chemical procedure for processing a large number of recombinant fusion proteins is highly desirable. 
Such a procedure needs to be highly unspecific toward the terminus of the protein that is bound to the tag, but at the same time specific to the cleavage site of the tag, to minimize off-target effects that could render the recombinant protein unusable for pharmaceutical applications. This could be a protease that is highly specific in recognizing potential binding sites in one direction from the site of cleavage but rather promiscuous in the other direction.
This problem can best be discussed using the Schechter and Berger protease nomenclature (Figure 1). 28 The advantage of this scheme is that a selectivity measure can be assigned to each binding site individually. Such selectivity measures can be obtained by statistical analysis of substrate data derived from experimental screening methods, as available in the MEROPS database. 29 In the current work, human Caspase-2 was engineered to serve as biochemical scissors that generate natural N-termini of recombinant fusion proteins. Caspases (cysteine-dependent aspartate-directed proteases) are a family of protease enzymes and are the most important inducers of apoptosis in animals. 30 They are named according to their specific mode of hydrolysis: a cysteine is first activated by a histidine and then nucleophilically attacks the carbonyl C-atom of the protein backbone following an aspartate residue in the target protein sequence (Figure 2). Twelve different members of the caspase family have been described in humans. 31,32 Caspases are found in all animals and are even related to analogous plant and fungal proteases called metacaspases. 33 Caspase-2 was selected as the starting point for further design since it specifically recognizes and binds a pentapeptide in the N-terminal direction from the site of cleavage (P1-P5), rather than a tetrapeptide (P1-P4) as bound by other caspases. 34 Catalytic studies have found that Caspase-2 has a more than 1000-fold lower kcat/KM value toward tetrapeptides than Caspase-1 and Caspase-3. 35 However, pentapeptides are cleaved by Caspase-2 with a tenfold higher efficiency than tetrapeptides, indicating that this caspase has a more extended binding pocket. 36 In summary, Caspase-2 is a less efficient catalyst than other caspases (such as Caspase-3), in exchange for higher specificity. 
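To make the Schechter and Berger labels concrete, the following minimal Python sketch assigns P5 to P1 and P1' to P4' to the residues flanking a scissile bond, using the GSG linker and VDVAD recognition site of the protein-based substrates as an example. The helper `schechter_berger_labels` and its arguments are hypothetical illustrations, not part of the authors' workflow.

```python
def schechter_berger_labels(
    substrate: str, scissile: int, n_unprimed: int = 5, n_primed: int = 4
) -> dict[str, str]:
    """Map Schechter-Berger positions to residues (hypothetical helper).

    `scissile` is the index of the P1' residue, so the hydrolysed bond lies
    between substrate[scissile - 1] (P1) and substrate[scissile] (P1').
    Positions that fall outside the sequence are omitted.
    """
    labels: dict[str, str] = {}
    # Unprimed sites count outward toward the N-terminus: P1, P2, ...
    for k in range(1, n_unprimed + 1):
        i = scissile - k
        if i >= 0:
            labels[f"P{k}"] = substrate[i]
    # Primed sites count outward toward the C-terminus: P1', P2', ...
    for k in range(1, n_primed + 1):
        i = scissile + k - 1
        if i < len(substrate):
            labels[f"P{k}'"] = substrate[i]
    return labels


# GSG linker + VDVAD recognition site + Gly at P1' (index 8)
print(schechter_berger_labels("GSGVDVADG", scissile=8))
```

For `GSGVDVADG` with `scissile=8`, the sketch labels V-D-V-A-D as P5 to P1 and Gly as P1'; the P2' to P4' entries are simply absent because the example sequence ends there.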
Unfortunately, in Caspase-2 the binding pockets in the C-terminal direction from the site of cleavage are not entirely unspecific. In particular, the S1' site contributes substantially to substrate recognition, as it selectively recognizes glycine, serine and alanine in preference to the other amino acids. 32 The P1' site corresponds to the N-terminal amino acid of the recombinant protein, so promiscuity at this site is of major importance. Thus, the motivation of this work was to modify the active site such that the S1' site becomes more promiscuous toward different amino acids, while keeping the selectivity of the S1 to S5 subsites as high as possible.
Changing the specificity of proteases has been researched extensively in recent decades. Successes were achieved in various disciplines, with directed evolution being a particularly successful approach owing to the availability of high-throughput assay systems. 37 Directed evolution has two major advantages: it can be carried out successfully without any information about the protein structure, and it can identify effective mutations at positions distant from the actual binding site. Great successes have, however, also been achieved by rationally redesigning individual binding pockets through altering polarity and packing. [38][39][40] In the current work, a rational design approach is described. First, the promiscuities across the members of the human caspase family were analyzed by comparing statistical cleavage data. Structural analysis together with free-energy calculations was then performed to screen the mutants computationally. For alchemical perturbations involving net-charge changes of the system (as encountered when perturbing a noncharged amino acid into an amino acid with a charged side chain), the charging free energies were corrected for methodology-dependent errors calculated with continuum electrostatics models. [41][42][43] As a result, two protein mutants were proposed that were calculated to bind substrates bearing isoleucine at the P1' site more favorably, and that were thus expected to bind apolar or polar branched amino acids better.

Figure 1. The Schechter and Berger protease nomenclature for protease-substrate binding. 28 The protease subpockets are denoted with an S, while the substrate binding sites are denoted with a P. The protease subsites and the substrate binding sites in the C-terminal direction from the site of cleavage are marked with a prime symbol. The site of hydrolysis is the linkage between P1 and P1', as indicated by the scissors.
To validate the computational predictions and to test the effect of the proposed mutations, the influence of the P1' residue on cleavage, relative to P1' Gly, was measured experimentally for both the mutants and the unmutated protein.
The kinetics and binding were studied in more detail for selected substrates by determining Michaelis-Menten kinetic parameters.
Förster resonance energy transfer (FRET) can be exploited to determine the kinetics of a protease-catalyzed reaction. 34 A peptide carrying a fluorophore such as 2-aminobenzoyl (Abz) and a quencher such as 2,4-dinitrophenyl (Dnp) exhibits only a low level of fluorescence. Upon cleavage of the peptide by an appropriate protease, fluorophore and quencher are separated, and the fluorescence signal increases in proportion to the concentration of cleaved peptide. This allows for online monitoring of the cleavage reaction; the initial slopes of the progress curves recorded at different substrate concentrations can then be fitted to the Michaelis-Menten equation.
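The analysis described above can be sketched in plain Python: a least-squares slope over the first few points of each progress curve gives the initial rate, and the rates are then fitted to v = Vmax[S]/(KM + [S]). This is a minimal sketch under stated assumptions: the fluorescence signal is assumed to be already converted into product concentration (the calibration is not shown), the number of points treated as "initial" (`n_points`) and the KM search bracket are hypothetical choices, and the fitting strategy (closed-form Vmax for a given KM plus golden-section search over log KM) is one of several valid least-squares approaches, not necessarily the one used by the authors.

```python
import math


def initial_slope(times, product, n_points=5):
    """Ordinary least-squares slope of the first n_points of a progress curve."""
    t, y = times[:n_points], product[:n_points]
    n = len(t)
    t_mean, y_mean = sum(t) / n, sum(y) / n
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    return sxy / sxx


def fit_michaelis_menten(s, v, km_lo=1e-3, km_hi=1e3, iters=200):
    """Least-squares fit of v = Vmax * S / (KM + S); returns (Vmax, KM).

    For a fixed KM the model is linear in Vmax, so Vmax has a closed-form
    least-squares solution; KM itself is found by golden-section search
    on log(KM) inside [km_lo, km_hi] (assumes a single minimum there).
    """

    def vmax_and_sse(km):
        x = [si / (km + si) for si in s]
        vmax = sum(vi * xi for vi, xi in zip(v, x)) / sum(xi * xi for xi in x)
        sse = sum((vi - vmax * xi) ** 2 for vi, xi in zip(v, x))
        return vmax, sse

    a, b = math.log(km_lo), math.log(km_hi)
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if vmax_and_sse(math.exp(c))[1] < vmax_and_sse(math.exp(d))[1]:
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    km = math.exp(0.5 * (a + b))
    return vmax_and_sse(km)[0], km


# Noise-free synthetic rates (illustrative only): Vmax = 2.0, KM = 5.0
conc = [0.5, 1.0, 2.0, 5.0, 10.0, 20.0, 50.0]
rates = [2.0 * c / (5.0 + c) for c in conc]
vmax, km = fit_michaelis_menten(conc, rates)
print(f"Vmax = {vmax:.3f}, KM = {km:.3f}")
```

Fixing KM and solving for Vmax analytically reduces the fit to a one-dimensional search, which keeps the sketch dependency-free; with real, noisy data a dedicated nonlinear least-squares routine that also reports confidence intervals would be preferable.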

| Statistical analysis
Data for a statistical analysis of known substrates of Caspase-1, -2, -3, -6, -7, and -8 was downloaded from the MEROPS database. 29 The data was normalized by the natural occurrence of the amino acids in humans (as found in the UniProt Knowledgebase 44 ). For comparisons between different subsites and various proteins, statistical cleavage data was quantitatively expressed in terms of subsite-specific cleavage entropies 45 :

$$S_i = -\sum_{a} p_{a,i} \log_{20} p_{a,i} \qquad (1)$$

where $p_{a,i}$ denotes the (normalized) probability of occurrence of amino acid $a$ at subsite $i$. The logarithm to base 20 ensures that $S_i = 0$ for a perfectly specific pocket $i$ (only allowing for one amino acid to bind) and $S_i = 1$ for a uniform probability distribution over all amino acids.
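Equation (1) can be evaluated in a few lines of Python. This sketch assumes that the normalization by natural amino-acid occurrence is done by dividing each observed count by the background frequency of that amino acid and renormalizing; the exact normalization used in the cited cleavage-entropy work may differ, and the function name `cleavage_entropy` is hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical amino acids


def cleavage_entropy(counts, background):
    """Subsite cleavage entropy S_i = -sum_a p_a,i * log20(p_a,i).

    counts:     observed occurrences of each amino acid at subsite i
    background: natural frequency of each amino acid (used for normalization)
    Returns 0 for a perfectly specific subsite and 1 for a uniform one.
    """
    # Correct raw counts for natural abundance (assumed normalization scheme).
    weights = [counts.get(a, 0) / background[a] for a in AMINO_ACIDS]
    total = sum(weights)
    entropy = 0.0
    for w in weights:
        if w > 0:  # 0 * log(0) is taken as 0
            p = w / total
            entropy -= p * math.log(p, 20)
    return entropy


uniform_bg = {a: 1 / 20 for a in AMINO_ACIDS}
print(cleavage_entropy({"D": 120}, uniform_bg))                    # Asp only
print(cleavage_entropy({a: 10 for a in AMINO_ACIDS}, uniform_bg))  # uniform
```

The first call mimics an Asp-only S1 pocket and gives 0; the second, a perfectly unspecific pocket, gives 1 up to floating-point rounding, matching the limits stated for Equation (1).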

| Sequence analysis
To evaluate whether Caspase-3 could serve as a model protein for engineering the Caspase-2 active site, the sequences of the two proteins were aligned using the Needleman-Wunsch algorithm. 46 This was done for the amino acid sequence of the entire protein and for the amino acid sequences of the active-site regions only. The relevant regions were identified by structural visualization of the two proteins using the VMD program. 47 Roughly 35 amino acids, in contiguous sequence stretches, that were at most 1 nm away from the P1' residue were selected.
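The global alignment step can be illustrated with a compact Needleman-Wunsch implementation followed by a percent-identity calculation. This is a sketch under assumptions: the simple match/mismatch/gap scores below are placeholders (the scoring matrix and gap penalties used in the study are not stated here, and a substitution matrix such as BLOSUM62 with affine gaps is common for proteins), identity is computed over the full alignment length (one of several conventions), and the function names are hypothetical. Reported identities such as 26.2% and 50.0% depend on these choices.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of sequences a and b with a linear gap penalty."""
    n, m = len(a), len(b)

    def pair(x, y):
        return match if x == y else mismatch

    # score[i][j]: best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )
    # Traceback from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a, aln_b):
    """Identical, non-gap columns as a percentage of the alignment length."""
    same = sum(x == y and x != "-" for x, y in zip(aln_a, aln_b))
    return 100.0 * same / len(aln_a)


aln_a, aln_b = needleman_wunsch("ACGT", "AGT")
print(aln_a, aln_b, percent_identity(aln_a, aln_b))
```

For the toy pair `ACGT`/`AGT`, the sketch produces the alignment `ACGT`/`A-GT` with 75% identity.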

| MD simulations
The Caspase-2 crystal structure in complex with the inhibitor N-acetyl-L-leucyl-L-α-aspartyl-L-α-glutamyl-L-seryl-L-aspartic aldehyde […] Interactions between proteins, substrates and other constituents of the studied systems were described with the GROMOS 54A8 parameter set. 53 In order to focus on configurations that are relevant in the actual substrate cleavage process, the sampling of the protein-substrate complex was initially based on a description of the tetrahedral intermediate state (Figure 2). 54 […] Figure S5 and Data S1.
Hydrogen atoms were added to the starting structure according to geometric criteria, and the energy was minimized using the steepest-descent algorithm. For all simulations, water was treated explicitly using the three-site simple point charge (SPC) model. 58 Simulations were carried out under periodic boundary conditions (PBC) in rectangular computational boxes with at least 0.8 nm between any protein atom and the nearest box wall. The equations of motion were integrated using the leap-frog scheme. 59 Bond lengths were constrained using the SHAKE algorithm. 60
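The leap-frog scheme named above updates velocities at half steps and positions at full steps. The sketch below integrates a single one-dimensional harmonic oscillator (a toy system in arbitrary units, chosen purely for illustration) to show the update order; it omits bond constraints such as SHAKE, thermostats, PBC, and everything else a production MD engine adds, and the function name `leapfrog` is hypothetical.

```python
import math


def leapfrog(x0, v_minus_half, force, mass, dt, n_steps):
    """Leap-frog integration for one degree of freedom.

    v(t + dt/2) = v(t - dt/2) + F(x(t)) / m * dt
    x(t + dt)   = x(t) + v(t + dt/2) * dt
    Returns the positions at full steps and the last half-step velocity.
    """
    x, v_half = x0, v_minus_half
    positions = [x]
    for _ in range(n_steps):
        v_half += force(x) / mass * dt  # kick: velocity to the next half step
        x += v_half * dt                # drift: position to the next full step
        positions.append(x)
    return positions, v_half


# Harmonic oscillator with k = m = 1 (period 2*pi), released at rest from x = 1.
k, m, dt = 1.0, 1.0, 1e-3
x0 = 1.0
v_start = 0.5 * dt * k * x0 / m  # v(-dt/2) chosen so that v(0) = 0
steps = round(2 * math.pi / dt)
traj, _ = leapfrog(x0, v_start, lambda x: -k * x, m, dt, steps)
print(traj[-1])
```

After one period the oscillator returns close to its starting position, and the amplitude does not drift, which reflects the time-reversible, symplectic character that makes leap-frog a common choice for MD.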

| Free-energy calculations
All changes in protein-substrate binding free energies were calculated along thermodynamic cycles in which the mutations were modelled alchemically twice: alchemical changes of the protein were modelled both in the protein-substrate complex (bound structure) and in the protein without a substrate (apo structure) (Figure 3); alchemical changes of the substrate were modelled in the protein-substrate complexes of the mutant and of the unmutated Caspase-2. In total, three sets of free-energy calculations were performed. The main free-energy calculations were performed with the unbound (apo) protein and the covalently bound protein-substrate complex in the tetrahedral intermediate state. These free energies correspond to the free-energy differences between the intermediate state and the unbound state, neglecting the actual binding process. To also capture the latter, additional calculations were performed with the substrate noncovalently bound, that is, according to Figure 2a. These calculations were only performed for mutations that were found to be favorable in the calculations of the tetrahedral intermediate. A final set of free-energy calculations was performed on the P1-P5 sites of the substrate rather than on the protein itself. Here, the amino acids of these five sites were individually perturbed into Ala (keeping the other four substrate binding sites unperturbed).
These free-energy calculations were performed in the fully mutated protein and in the unmutated protein to assess possible alterations in the selectivity of the S1 to S5 subsites.

Figure 3. Thermodynamic cycle to model protein mutations. The horizontal arrows represent alchemical mutations; the vertical arrows represent the binding free energies. The differences for the physical processes, $\Delta\Delta G_{\mathrm{bind}} = \Delta G_{\mathrm{bind}}^{\mathrm{mutant}} - \Delta G_{\mathrm{bind}}^{\mathrm{wt}}$ and $\Delta\Delta G_{\mathrm{TI}} = \Delta G_{\mathrm{TI}}^{\mathrm{mutant}} - \Delta G_{\mathrm{TI}}^{\mathrm{wt}}$, were calculated from the alchemical free-energy estimates as $\Delta\Delta G_{\mathrm{bind}} = \Delta G_{\mathrm{mut}}^{\mathrm{bound}} - \Delta G_{\mathrm{mut}}^{\mathrm{apo}}$ and $\Delta\Delta G_{\mathrm{TI}} = \Delta G_{\mathrm{mut}}^{\mathrm{TI}} - \Delta G_{\mathrm{mut}}^{\mathrm{apo}}$. Analogously, the difference for the catalytic step was calculated as $\Delta\Delta G^{\ddagger} = \Delta G_{\mathrm{mut}}^{\mathrm{TI}} - \Delta G_{\mathrm{mut}}^{\mathrm{bound}}$.
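The cycle closure in Figure 3 reduces to simple differences between the alchemical legs. The sketch below encodes that arithmetic; the class and field names are hypothetical, and the numbers in the usage line are toy values, not results from this study.

```python
from dataclasses import dataclass


@dataclass
class AlchemicalLegs:
    """Free energies (kJ/mol) of the same wt -> mutant perturbation in three states."""

    apo: float    # protein without substrate
    bound: float  # noncovalent protein-substrate complex
    ti: float     # covalent tetrahedral intermediate (TI)


def cycle_differences(legs: AlchemicalLegs) -> dict[str, float]:
    """Close the thermodynamic cycles (positive = mutation disfavors the state)."""
    ddg_bind = legs.bound - legs.apo     # = dG_bind(mutant) - dG_bind(wt)
    ddg_ti = legs.ti - legs.apo          # = dG_TI(mutant) - dG_TI(wt)
    ddg_barrier = legs.ti - legs.bound   # catalytic step: ddg_ti - ddg_bind
    return {"ddG_bind": ddg_bind, "ddG_TI": ddg_ti, "ddG_double_dagger": ddg_barrier}


# Toy values for illustration only.
print(cycle_differences(AlchemicalLegs(apo=10.0, bound=12.5, ti=15.0)))
```

Because the apo leg cancels, the catalytic-step difference depends only on the bound and tetrahedral-intermediate legs, which is why it can equivalently be written as the difference between ΔΔG_TI and ΔΔG_bind.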
Changes in binding free energies ($\Delta G_{A \rightarrow B}$) were calculated with the thermodynamic integration (TI) approach 63 along progressive perturbations using a λ-dependent Hamiltonian of the system, according to

$$\Delta G_{A \rightarrow B} = \int_{0}^{1} \left\langle \frac{\partial H(\lambda)}{\partial \lambda} \right\rangle_{\lambda} \, d\lambda$$

where λ denotes the scaling parameter of the TI procedure and $\langle \cdots \rangle_{\lambda}$ denotes ensemble averaging over configurations sampled at a given value of λ. The derivative $\partial H(\lambda)/\partial \lambda$ was written out every 40 fs during the simulation. Initially, 11 equidistant λ-values were simulated for 10 ns each; the thermodynamic integration profiles were then refined by prolonging the simulations up to 50 ns to reduce the error estimates. From the simulated λ-points (max. 11 per perturbation), further λ-points were predicted using the extended thermodynamic integration procedure, 64 yielding 101 λ-points in total. Every mutation was performed twice in the dimeric protein structures, once in each of the two active sites. Standard deviations of the free-energy differences were calculated over the three independent simulations and the two active sites.
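Numerically, the TI integral is a quadrature over the sampled λ-points. This minimal sketch uses the plain trapezoidal rule on the simulated points and a sample standard deviation over independent estimates; it does not reproduce the extended TI procedure that predicts additional λ-points, and the function names and the linear ⟨∂H/∂λ⟩ profile are hypothetical illustrations.

```python
import statistics


def ti_free_energy(lambdas, mean_dhdl):
    """Trapezoidal estimate of dG(A->B) = integral_0^1 <dH/dlambda>_lambda dlambda."""
    points = sorted(zip(lambdas, mean_dhdl))
    total = 0.0
    for (l0, y0), (l1, y1) in zip(points, points[1:]):
        total += 0.5 * (y0 + y1) * (l1 - l0)
    return total


def mean_and_sd(estimates):
    """Mean and sample standard deviation over independent estimates
    (for example, replicate simulations and the two active sites)."""
    return statistics.mean(estimates), statistics.stdev(estimates)


# 11 equidistant lambda points with an illustrative linear <dH/dl> profile.
lams = [i / 10 for i in range(11)]
dhdl = [4.0 + 6.0 * lam for lam in lams]
print(ti_free_energy(lams, dhdl))  # exact integral of 4 + 6*lambda is 7
```

The trapezoidal rule is exact for a linear profile; for curved profiles, denser λ-spacing (or the extended TI prediction cited above) reduces the integration error.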

| Corrections for charging free energies
Charging free energies, as calculated for a perturbation of a noncharged amino acid into a charged amino acid (or vice versa), are typically very sensitive to the employed simulation methodology. 41,[65][66][67] Because electrostatic energies under PBC are usually calculated using non-Coulombic interaction functions (e.g., lattice summation 68,69 or the BW method 62 ), the calculated electrostatic potentials deviate from the "real" potentials. If only partial charges are perturbed and the (total) net charge of a set of atoms (e.g., a charge group or a molecule) stays the same, these errors, which stem from the differences in the electrostatic potentials, mostly cancel. Perturbing a noncharged group of atoms into a charged one (or vice versa), however, resembles the solvation of an ion, that is, transferring it from vacuum into solvent. In this situation, the inaccurate electrostatic potentials directly affect the calculated charging free energies. Hence, these quantities must be corrected ex post in order to achieve methodological independence. In short, the "correct" charging free energies can be calculated using continuum electrostatics methods and analytical models. These corrections must account for (a) the deviation of the solvent polarization around the charged group of atoms, due to the use of a microscopic system in combination with cutoff truncation and a reaction-field correction, relative to the "correct" polarization in a macroscopic, nonperiodic and fully Coulombic environment; (b) the deviation of the solvent-generated electric potential in a microscopic box under PBC, relative to the "correct" potential under fully Coulombic, macroscopic and nonperiodic conditions; (c) the inaccurate electrostatic interactions between the charged group of atoms and other solute atoms due to the use of cutoff truncation in combination with a reaction-field correction; and (d) an inaccurate dielectric permittivity of the employed solvent model. 42,70 Note that the correction terms (a), (b), and (c) were called ΔG_pol, ΔG_psum, and ΔG_dir in Ref. [42], while correction term (d) was not listed explicitly there, since this term is typically relatively small and was included in the ΔG_pol correction. 43
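As a rough illustration of correction term (d) only, the Born model of a spherical ion gives an analytical estimate of how an inaccurate solvent permittivity shifts a charging free energy. This is an assumption-laden simplification, not the continuum-electrostatics correction scheme of the cited references: the ion radius and the model permittivity below are hypothetical placeholders, the function name is hypothetical, and only the experimental permittivity of water near 298 K (about 78.4) and the Coulomb constant in MD units are literature values.

```python
# Coulomb constant 1/(4*pi*eps0) in MD units: kJ mol^-1 nm e^-2
COULOMB_KJ_NM = 138.935458


def born_permittivity_correction(charge, radius_nm, eps_model, eps_exp):
    """Born-model estimate of a term-(d)-type correction (kJ/mol).

    Born charging free energy: dG(eps) = -(k q^2 / (2 R)) * (1 - 1/eps),
    with k = 1/(4 pi eps0). The correction is dG(eps_exp) - dG(eps_model).
    """
    prefactor = COULOMB_KJ_NM * charge**2 / (2.0 * radius_nm)
    return prefactor * (1.0 / eps_exp - 1.0 / eps_model)


# Hypothetical inputs: unit charge, 0.2 nm radius, model permittivity 66.
print(round(born_permittivity_correction(-1, 0.2, 66.0, 78.4), 2))
```

For these placeholder inputs, the estimate is below 1 kJ/mol in magnitude, in line with the statement that term (d) is typically relatively small; real corrections would use the actual solute geometry and the permittivity appropriate for the water model.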

| Protein expression
Wild-type Caspase-2 has to be activated by autocleavage. 71 To express a variant of Caspase-2 that is fully active without proteolytic cleavage, an uncleavable, circularly permuted variant of Caspase-2 was generated. [72][73][74] Circularly permuted Caspase-2 was used for all experimental tests; for the sake of simplicity, we use the term Casp-2 for the circularly permuted version throughout this work.

| In-vitro protein-based cleavage assays
To build a substrate for in-vitro protein-based cleavage experiments, Human Ubiquitin-conjugating Enzyme E2 L3 was fused at its N-terminus to a tag containing an N-terminal His tag, a GSG linker, and a VDVAD recognition site for Caspase-2. 73 This resulted in the substrate sequence 6H-GSG-VDVAD-X-E2, where X can be any of the 20 canonical amino acids, allowing the influence of the P1' site on cleavage relative to P1' Gly to be tested in both the mutants and the unmutated protease.
A crucial goal in the design of the Caspase-2 mutant was to minimize the risk of off-target cleavage, that is, to maintain the specificity of the S1 to S5 binding pockets at a level comparable to that of the unmutated protein. To assess changes in the binding specificity of the S1 to S5 pockets upon mutation, the influence of the mutations on binding was also assessed toward a variety of substrates with alternative recognition sequences and compared to the unmutated Casp-2.
Three alternative sequences were used, each N-terminally linked to E2: (a) the sequence 6H-GSG-DEVD-G-E2. DEVD-G is the preferred recognition sequence and site of cleavage of Caspase-3. Since Caspase-3 lacks an S5 subpocket, 34 […] Figure S1. All experiments were performed in triplicate. Relative cleavage was defined as the time needed to cleave 50% of the substrate with amino acid X at the P1' site, normalized by the time needed to cleave 50% of a substrate with Gly at the P1' site. Casp-2 (0.01 mg/mL) cleaved 50% of the substrate VDVAD-G-E2 (1 mg/mL) at 25 °C in Caspase-2 assay buffer within 1 min. These conditions were defined as the standard activity to which all other reactions were compared. This normalization was needed to make the results independent of protein purity and activity, which may vary with expression and storage conditions.
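The normalization defined above can be sketched as follows: the time to 50% cleavage is read off a sampled time course, and the result for a P1' variant is divided by that for P1' Gly. This is a minimal sketch under assumptions: how the cleaved fraction was quantified is not described in this section, linear interpolation between sampling times is an assumed choice, and the function names and example time course are hypothetical.

```python
def time_to_half_cleavage(times, fraction_cleaved):
    """First time at which the cleaved fraction reaches 0.5 (linear interpolation)."""
    points = list(zip(times, fraction_cleaved))
    if points and points[0][1] >= 0.5:
        return points[0][0]
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 < 0.5 <= f1:
            return t0 + (0.5 - f0) * (t1 - t0) / (f1 - f0)
    raise ValueError("50% cleavage not reached within the sampled time window")


def relative_cleavage_percent(t50_x, t50_gly):
    """Cleavage time of a P1' variant relative to P1' Gly (100% = same as Gly)."""
    return 100.0 * t50_x / t50_gly


# Hypothetical time course (minutes, cleaved fraction) for a P1' variant.
t50 = time_to_half_cleavage([0, 1, 2, 4], [0.0, 0.2, 0.4, 0.8])
print(t50, relative_cleavage_percent(t50, 1.0))
```

Raising an error when 50% cleavage is never reached mirrors situations such as P1' Pro, where cleavage was too slow to report a meaningful value.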

| In-vitro peptide-based cleavage assays
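In-vitro peptide-based cleavage assays were used to determine Michaelis-Menten parameters. These assays were executed for a lim[…]

$$v = \frac{V_{\max}\,[S]}{K_M + [S]}$$

with $v$ being the initial slope, $V_{\max}$ the maximum rate, $K_M$ the Michaelis constant, and $[S]$ the substrate concentration.

| RESULTS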

| Statistical analysis
Analysis of cleavage data indicates that the S1' subsite of Caspase-2, which favors small, unbranched or polar amino acids such as Gly, Ser, and Ala over other amino acids (see Table 1), is the second-most specific subsite after S1, which binds Asp exclusively. Evaluating the cleavage entropies of the caspase family members Caspase-1, Caspase-3, Caspase-6, and Caspase-7 reveals the differences within this set of related proteases (Figure 4). In Caspase-2, the S1' subsite shows a relatively low cleavage entropy compared to the subsites S2', S3', and S4' (Table S1). The S1' binding sites of the other caspases show enhanced promiscuity compared to Caspase-2, while their S2', S3', and S4' sites remain unchanged in terms of cleavage entropies. These findings make all of these proteins interesting templates for optimizing the Caspase-2 S1' binding site. Caspase-3 shows the highest cleavage entropy at the S1' site among its family members, while having overall lower cleavage entropies at the sites S4 to S2.
The Caspase-3 S1 site, on the other hand, shows the same specificity as the S1 site in Caspase-2 (Table S1), as it selectively recognizes Asp residues. For this reason, we chose Caspase-3 as a template to further optimize the S1' pocket promiscuity of Caspase-2.

| Sequence and structural analysis
Aligning the Caspase-2 and Caspase-3 sequences gave an overall sequence identity of only 26.2%. However, the active sites of these family members are more conserved: restricting the comparison to the active-site sequences of both proteins gave a sequence identity of 50.0% (Figure S2), which further supports our choice of Caspase-3 as a template for enhancing the Caspase-2 P1' site promiscuity.
Structural analysis of Caspase-2 shows that, unlike for the other substrate binding sites, there is no explicit cavity to host the P1' side chain. The fact that the S1' binding site only tolerates a limited set of small amino acids is potentially related to the lack of a real S1' binding pocket. This makes the problem at hand, that is, optimizing the P1' promiscuity, rather complex, since no amino acids directly linked to the S1' pocket can be identified and considered as potential targets for mutation and optimization. Additionally, any mutation that has an impact on P1' […] was already bound to the protein in the crystal structure used. The sequence from P1' to P4', on the other hand, was missing in the crystal structure and was chosen such that the P1' site bears one of the least preferred residues according to the statistical data (Table 1).
Val (P2') and Ser (P3'-P4') are among the most preferred residues at these sites according to the statistical data (data not shown). Since the P2' to P4' sites show high cleavage entropies (Figure 4), the exact choice of amino acids at these positions is likely of minor importance.
To quantify the individual contributions of each of the four candidate mutations, as well as their potential dependencies on substrate binding, free-energy calculations were performed according to a thermodynamic cycle. This cycle was constructed such that the full path of mutating the unmutated protein into the quadruple mutant (four mutations) was calculated twice, with the sequential mutations calculated in exactly reverse order. In this way, each mutation is evaluated in two complementary backgrounds: the mutations that precede it in path 1 are exactly those that follow it in path 2, and vice versa (Figure 6). This procedure has another advantage, which is, […] Table 2). The mutation Val279Glu shows the strongest effect. The free-energy changes of 11.7 kJ/mol (path 1) and 9.7 kJ/mol (path 2) indicate a strongly unfavorable effect on stability. The free-energy change for the mutation Tyr284Phe is positive, but closer to zero and, for path 2, even within the error, so this muta[…]

| Calculations in the substrate P5 to P1 sites
The main goal of this work was to modify the active site such that the S1' site becomes more promiscuous toward different amino acids. However, to ensure that the protease is suitable for targeted N-terminal cleavage, the selectivity of the S1 to S5 subsites has to be kept as high as possible. Changes in the selectivities of these pockets were assessed by mutating each of the P5 to P1 sites of the bound substrate in the tetrahedral intermediate state into Ala, both in the double mutant and in the unmutated protease. Table 4 reveals that no significant differences in free energies were found between the two proteins. According to these calculations, the promiscuity of the S5 to S1 binding pockets was not affected by the mutations.

Note: The results of steps 1 and 4 of both paths are shown. ΔΔG_bind denotes the (corrected) binding free energies, while ΔΔG_raw denotes the uncorrected binding free energies for the mutation that involves net-charge changes. ΔΔG‡ was calculated as the difference between ΔΔG_TI (see Table 2) and ΔΔG_bind. All values are reported in kJ/mol. Error estimates indicate standard deviations over three independent simulations and over the two active sites in the dimeric structure.

| In-vitro protein-based cleavage assays
To evaluate the influence of the two mutations that were predicted to […] (Figure 7 and Table 5). The S1 to S5 sub[…]

Note: No significant differences were found, revealing that the substrate promiscuity of the S5 to S1 subsites was not affected by the mutations. All values are reported in kJ/mol. Error estimates indicate standard deviations over three independent simulations and over the two active sites in the dimeric structure.

| In-vitro peptide-based cleavage assays
Figure 7. Influence of P1' on cleavage relative to P1' Gly as determined from protein-based cleavage assays. Reported is the relative time to cleave 50% of the substrate. Both mutants show a significant decrease in cleavage time toward branched amino acids. The horizontal bar at 100% marks the unmutated Casp-2 as a reference. Cleavage of P1' Pro was too slow in protein-based cleavage experiments to report meaningful values. Underlying values are reported in Table 5.

[…] shown in Table 6, reveal an elevated catalytic efficiency of the mutant relative to the unmutated protease, as seen from increased kcat/KM values, regardless of the tested P1' amino acid. Due to fluorescence signal quenching at higher substrate concentrations, the confidence intervals for the KM values are rather wide. Moreover, the influence of the P1' amino acid on the catalytic efficiency was much more apparent in the turnover number kcat, for which an experimental dynamic range of more than 30 000 was covered (from 8 × 10⁻⁶ s⁻¹ to 0.3 s⁻¹). In comparison, the recorded KM values spanned a range of only roughly a factor of 7. Because of its larger influence on enzymatic activity and its higher experimental accuracy, the turnover number kcat was found to be a more meaningful predictor of Casp-2 activity (see Figure 9).
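The kinetic quantities compared above follow from standard relations: kcat = Vmax/[E]0 and the catalytic efficiency kcat/KM. The sketch below encodes these relations and checks the dynamic range quoted for kcat; the enzyme concentrations of the peptide assays are not given in this section, so the only numbers used are the two range endpoints from the text, and the function names are hypothetical.

```python
def turnover_number(vmax, enzyme_conc):
    """kcat = Vmax / [E]0; for example, M/s divided by M gives 1/s."""
    return vmax / enzyme_conc


def catalytic_efficiency(kcat, km):
    """Specificity constant kcat / KM; for example, 1/s divided by M gives 1/(M s)."""
    return kcat / km


def dynamic_range(values):
    """Ratio between the largest and the smallest value."""
    return max(values) / min(values)


# kcat range quoted in the text: 8e-6 1/s to 0.3 1/s
print(f"{dynamic_range([8e-6, 0.3]):.3g}")
```

The quoted endpoints correspond to a ratio of about 3.75 × 10⁴, consistent with the stated dynamic range of more than 30 000 for kcat, compared with a factor of roughly 7 for KM.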

| DISCUSSION
In this work, Caspase-2 was modified in silico to allow for more promiscuous binding at the S1' subsite. Our approach was to search for a suitable template protein that bears the features we were trying to introduce into Caspase-2 but is also close enough to our protease of interest in terms of sequence and structural similarity. We found a suitable protein in human Caspase-3. Comparison of statistical data 29 between the human caspases analyzed revealed that the Caspase-3 S1' subsite shows the highest promiscuity within this protein family. At the same time, the active-site structure is highly conserved between Caspase-2 and Caspase-3. By structural analysis, we identified all residues that differ between Caspase-2 and Caspase-3 and are in close proximity to the S1' binding site. These candidate mutations were expected to have a possible positive impact on S1' (and S2'-S4') subsite promiscuity while hardly affecting the specificity of the S1 to S5 subsites. Protein-based cleavage assays (Figure 7 and […] Of the four probed mutations, the mutations Val279Glu and Tyr284Phe show an unfavorable change in free energies, with Val279Glu being prominently unfavorable compared with the other three mutations.

Figure 8. Experimentally determined specificities from protein-based cleavage assays using a set of alternative recognition sequences. The time needed to cleave 50% of the substrate increases by two orders of magnitude toward DEVD-G and by four orders of magnitude toward the DETD-R and VDQQE-G sequences compared to the native recognition site (VDVAD-G-E2). There are no significant differences between the single mutant, the double mutant and the unmutated Casp-2. Note the logarithmic scale on the y-axis.

Table 5. Influence of P1' on cleavage relative to P1' Gly as determined from protein-based cleavage assays.
The total free-energy change of the quadruple mutant sums up to a value that is not significantly different from zero (Table 2). If one assumes that the four probed mutations are the main factor behind the decreased specificity of the Caspase-3 S1' binding pocket, one would not predict highly favorable binding of Ile to the Caspase-3 S1' pocket. Indeed, the statistical data show that Ile binds to the Caspase-3 S1' pocket, but only very poorly (Table 1). By selecting only two of the four mutations, we were able to increase the reactivity toward Ile in Caspase-2.
The normalized cleavage data was also used to calculate "experimental" cleavage entropies for the mutants. Here, the experimental data was treated in the same way as the statistical data. Although cleavage times cannot be translated rigorously into a probability measure (as needed for the calculation of cleavage entropies, see Equation (1)), the experimental cleavage profile of a protease is expected to match its (hypothetical) statistical cleavage profile. In this respect, experimental cleavage entropies (after translation of the data into probabilities) are a relevant measure. Experimental cleavage entropies were calculated for the unmutated Caspase-2 as well as for the single and the double mutant.
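The translation step mentioned above is not specified in detail here, so the following sketch uses one plausible mapping as an explicit assumption: the probability of each P1' amino acid is taken proportional to the inverse of its relative cleavage time (faster cleavage counts as more likely), and amino acids without a measurable cleavage time receive probability zero. The function name and the mapping are hypothetical and not necessarily what the authors did.

```python
import math


def experimental_cleavage_entropy(relative_times):
    """Cleavage entropy (log base 20) from relative cleavage times.

    ASSUMPTION (hypothetical mapping): the probability of amino acid a at P1'
    is taken proportional to 1 / t_rel(a), i.e. faster cleavage = more likely.
    Entries of None (no measurable cleavage) get probability 0.
    """
    rates = [1.0 / t for t in relative_times.values() if t is not None]
    total = sum(rates)
    entropy = 0.0
    for r in rates:
        p = r / total
        entropy -= p * math.log(p, 20)
    return entropy


# Illustrative input: every amino acid cleaved equally fast -> entropy near 1.
equal = {a: 100.0 for a in "ACDEFGHIKLMNPQRSTVWY"}
print(experimental_cleavage_entropy(equal))
```

Under this mapping, an amino acid such as P1' Pro that shows no measurable cleavage simply drops out of the distribution, which caps the attainable entropy slightly below 1.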
For the unmutated Casp-2, the calculated experimental cleavage entropy (0.50 ± 0.04) was remarkably close to the statistical cleavage entropy (0.56, see Table S1). The experimental cleavage entropies of the two proposed mutants (single mutant: 0.56 ± 0.02, double mutant: 0.58 ± 0.02) are far from reaching a promiscuity level comparable to the cleavage entropy of the Caspase-3 S1' pocket (0.84, see Table S1). However, while cleavage entropies measure overall binding promiscuity, both mutations were selected to make the S1' site of the mutants a better binder of Ile. It is thus not surprising that the experimental cleavage entropies reveal only a moderate increase in the overall binding promiscuity of the S1' pocket.
In the current work, a suitable template protein was available to guide modifications of the protein of interest. Where no such template exists, the search space for 10 amino acids involved in substrate binding comprises 10^19 possible mutations, and testing all of them is impossible with current methods and resources. Furthermore, computationally redesigned proteins often turn out to be less stable under experimental conditions. This may be because even minimal changes in the amino-acid sequence can lead to unexpected changes in loop conformations, unfolding, or aggregation. 77 However, many methods and tools have been described in recent years that enable reliable prediction of a large variety of potential mutations. Fast methods have been developed that do not rely on computationally demanding free-energy calculations. Instead, they predict the effect of mutations from parameters that are easier to calculate, such as amino-acid occlusion from solvent, pairwise potentials, and intermolecular energies. 78,79 Another successful approach combined computational algorithms with experimental screening of protein libraries to design a ubiquitin ligase that binds to an unnatural interface. 80 New methods that extend sampling while saving computational cost, such as the weighted-ensemble strategy, 81,82 were developed and used effectively in the redesign of a protein conformational switch. 83 Software suites like ROSETTA and ORBIT were successfully used for de novo protein design. [83][84][85][86][87][88][89] ROSETTA uses an extended energy function that includes reference energy terms for discriminating between protein mutants. It was generalized to work in many different contexts and is widely used to discriminate efficiently between mutants. 90
It was shown that free-energy differences for a large number of different substrates can be calculated from single simulations of unphysical reference states. These simulations are combined with a third-power fitting approach to capture the effects of molecular dipoles or charged states. 91,92 To screen for more mutants, the authors are currently applying these methods to perform in silico saturation mutagenesis of Caspase-2.
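Single-simulation approaches of this kind build on exponential averaging, also known as the Zwanzig equation. The free-energy difference from a reference state R to an end state B is estimated as ΔG = −k_BT ln⟨exp(−(H_B − H_R)/k_BT)⟩_R. In this equation, the energy differences are evaluated on snapshots taken from the reference-state simulation. The Python sketch below implements only this basic estimator. It assumes energies in kJ/mol and a default temperature of 300 K. It does not include the third-power fitting correction, and the function name is hypothetical.

```python
import math

GAS_CONSTANT_KJ_PER_MOL_K = 0.008314462618  # molar gas constant R in kJ/(mol K)


def zwanzig_free_energy(delta_energies: list[float], temperature: float = 300.0) -> float:
    """Exponential-averaging (Zwanzig) estimate of G_B - G_R in kJ/mol.

    `delta_energies` holds H_B - H_R for configurations sampled from the
    reference state R.
    """
    if not delta_energies:
        raise ValueError("need at least one energy difference")
    kT = GAS_CONSTANT_KJ_PER_MOL_K * temperature
    # Log-sum-exp keeps the ensemble average numerically stable
    # even for large energy differences.
    exponents = [-du / kT for du in delta_energies]
    shift = max(exponents)
    mean_exp = sum(math.exp(x - shift) for x in exponents) / len(exponents)
    log_mean = shift + math.log(mean_exp)
    return -kT * log_mean
```

The estimate is dominated by the configurations with the lowest energy differences. It therefore converges well only when the reference state samples configurations that are also relevant for the end state. This is the motivation for designing unphysical reference states that overlap with many end states at once.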
These advances and successes show that modern protein design methods are applicable. Enzymes designed with computational aid usually do not reach the efficiencies of natural enzymes, but they can often be improved further by directed evolution. 93 In summary, protein engineering has become a robust and reliable field, even when de novo methods have to be applied or proteins have to be re-engineered without templates.

CONCLUSION
Restoring the native N-terminus after protein purification is of great importance in the pharmaceutical industry. However, it remains a challenging task because proteins differ widely in their characteristics. Hydrolysis protocols cleave at a specific position in a protein sequence to retrieve a desired N-terminus. They usually have to be optimized for every protein individually, which is a major cost factor in the industrial production of recombinant proteins. The sequence on the N-terminal side of the site of hydrolysis can be chosen freely and optimized for the protease used for tag removal. This is not true for the sequence on the C-terminal side, because that sequence constitutes the N-terminus of the target protein released by cleavage. A protease for universal tag removal therefore requires highly specific binding pockets on the N-terminal side of the site of hydrolysis and rather promiscuous binding behavior on the C-terminal side.
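The small Python sketch below illustrates this asymmetry. Everything up to and including the recognition motif (the non-prime side, P5 to P1) belongs to the tag and can be designed freely. The residue directly after the motif (P1') is dictated by the target protein. The function name and the strict string-match model are assumptions for illustration only. Real protease specificity is graded and also depends on prime-side residues beyond P1'.

```python
def cleave_after_motif(fusion: str, motif: str) -> tuple[str, str]:
    """Split a fusion sequence directly after the first occurrence of `motif`.

    Hypothetical model: the motif (P5..P1 for a pentapeptide) stays with the
    tag, and the residue right after it (P1') becomes the new N-terminus of
    the released protein.
    """
    index = fusion.find(motif)
    if index < 0:
        raise ValueError("recognition motif not found")
    cut = index + len(motif)
    if cut >= len(fusion):
        raise ValueError("no residues follow the recognition motif")
    return fusion[:cut], fusion[cut:]
```

As an example, consider the motif `VDVAD`, a pentapeptide commonly used in Caspase-2 substrates and assumed here for illustration, together with the hypothetical fusion `HHHHHHVDVADMKT`. The call `cleave_after_motif("HHHHHHVDVADMKT", "VDVAD")` returns `("HHHHHHVDVAD", "MKT")`, so the P1' residue Met becomes the new N-terminus. A protease used for universal tag removal has to tolerate whatever residue appears at that position.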
In this work, human Caspase-2 was engineered to yield a more promiscuous S1' subsite. Human Caspase-3, which possesses a less specific S1' subsite, served as a template to predict possible mutations of the Caspase-2 active site. Binding free energies were calculated with a substrate model that features an Ile residue at the P1' position. The change in binding affinity of this substrate was assessed for four candidate amino-acid mutations. These candidates were selected by comparing the sequences of the Caspase-2 and Caspase-3 active sites and then analyzing the structure of the Caspase-2 active site. The structural analysis served to filter active-site amino acids by subpocket, in order to preserve the desired specificity of these subpockets.
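The two-stage selection described above can be sketched in Python: a sequence comparison with a template, followed by a subpocket filter. This is a hypothetical illustration, not the authors' code. The data class, field names and pocket labels are assumptions. The residue lists are assumed to be already aligned position by position and annotated with subpockets from a structural analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActiveSiteResidue:
    position: str   # alignment position label (hypothetical identifier)
    subpocket: str  # subsite assignment from structural analysis, e.g. "S1'"
    residue: str    # one-letter amino-acid code


def candidate_mutations(
    target: list[ActiveSiteResidue],
    template: list[ActiveSiteResidue],
    pocket: str = "S1'",
) -> list[tuple[str, str, str]]:
    """Return (position, target residue, template residue) for aligned
    active-site positions that differ between the proteins and lie in
    `pocket`. Positions in other subpockets are skipped, so their
    specificity is left untouched."""
    candidates = []
    for t_res, tpl_res in zip(target, template):
        if t_res.position != tpl_res.position:
            raise ValueError("residue lists must be aligned position by position")
        if t_res.subpocket == pocket and t_res.residue != tpl_res.residue:
            candidates.append((t_res.position, t_res.residue, tpl_res.residue))
    return candidates
```

Each returned candidate would then be passed to free-energy calculations, which rank the candidates by their predicted effect on binding of the P1' Ile substrate.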
Two mutants suggested by the free-energy calculations were tested in vitro. Changes in P1' substrate promiscuity were assessed in two ways. First, the influence of the P1' residue on cleavage was measured relative to P1' Gly. Second, Michaelis-Menten parameters were determined for a limited set of amino acids.
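One robust way to estimate Michaelis-Menten parameters from initial rates is the direct linear plot of Eisenthal and Cornish-Bowden. This method takes the medians of the parameter estimates obtained from all pairs of observations. The stdlib-only Python sketch below offers one possible approach. The fitting procedure used in this work may differ, and the function name is hypothetical. The sketch assumes initial rates v measured at substrate concentrations s in consistent units. It also omits the special treatment of intersections at negative K_M found in the full method.

```python
from itertools import combinations
from statistics import median


def direct_linear_plot(s: list[float], v: list[float]) -> tuple[float, float]:
    """Estimate (K_M, V_max) from substrate concentrations s and initial rates v.

    Each observation defines a line V_max = v + (v / s) * K_M in parameter
    space. Every pair of lines intersects at one (K_M, V_max) estimate, and
    the medians of all estimates are returned.
    """
    if len(s) != len(v) or len(s) < 2:
        raise ValueError("need at least two paired (s, v) observations")
    km_estimates, vmax_estimates = [], []
    for (s_i, v_i), (s_j, v_j) in combinations(zip(s, v), 2):
        slope_diff = v_i / s_i - v_j / s_j
        if slope_diff == 0:  # parallel lines do not intersect
            continue
        km = (v_j - v_i) / slope_diff
        km_estimates.append(km)
        vmax_estimates.append(v_i + (v_i / s_i) * km)
    if not km_estimates:
        raise ValueError("observations do not define any intersection")
    return median(km_estimates), median(vmax_estimates)
```

Medians make the estimates insensitive to single outlying rates. The catalytic efficiency k_cat/K_M then follows by dividing V_max by the enzyme concentration and by K_M.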
Both mutants showed a significant change in activity toward substrates with branched and apolar amino acids. The specificities of the S1 to S5 binding pockets did not change significantly. This was assessed both computationally and experimentally, using a set of substrates with different recognition sites. Neither mutation affected the stability of Casp-2, as tested experimentally using elevated incubation temperatures and chaotropic agents as supplements in the reaction media. We therefore believe that the created mutants are a first major addition to a toolbox that constitutes an important step toward a universal procedure for N-terminal fusion-tag cleavage in industrial processes.