Structural dynamics and mechanistic action guided engineering of lipolytic enzymes

Lipases have been established as important biocatalysts in several industrial applications, owing to their diverse substrate specificity. The availability of data on three‐dimensional crystal structures for various lipases offers an opportunity for modulating their structural and functional aspects to design and engineer better versions of lipases. With the aim of investigating the structural components governing the extremophilic behavior of lipases, structural analysis of microbial lipases was performed using advanced bioinformatics and molecular dynamics simulation approaches. In sequences and functionally distinct alkaliphilic and thermophilic lipases were investigated for their functional properties to understand the distinguishing features of their structures. The alkaliphilic lipase from Bacillus subtilis (LipA) showed conformational changes in the loop region Ala132–Met137, subsequently, the active site residue His156 shows two conformations, toward the active site nucleophilic residues Ser77 and away from the Ser77. Interestingly, the active site of LipA is more solvent‐exposed and can be correlated with the adoption of an open conformation which might extend and expose the active site region to solvents during the catalysis process. Furthermore, the MD simulation of thermophilic lipase from marine Streptomyces (MAS1) revealed the role of N‐ and C‐terminal regions with disulfide bridges and identified a metal ion binding site that facilitates the enzyme stability. The novel thermo‐alkaliphilic lipase can be designed to integrate the stability features of MAS1 into the alkaliphilic LipA. These structural‐level intrinsic characteristics can be used for lipase engineering to amend the lipase activity and stability as per the requirements of the industrial processes.


| INTRODUCTION
Lipases (triacylglycerol hydrolases, EC 3.1.1.3) is a subclass of serine-hydrolases that catalyzes the cleavage of ester bonds in fats and oils by the addition of water, and subsequent release of diacylglycerols, monoacylglycerols, free fatty acids, and glycerol. The natural substrates of true lipase are long-chain acylglycerols (carbon number ≥10), whereas the other lipolytic enzymes, grouped under the class esterases, act on short-chain fatty acylglycerols (carbon number <10). [1][2][3] The microbial lipases are 20-60 kDa proteins and possess a considerable diversity in their substrate selectivity, which expands their applicability domain. They are the most versatile among industrial enzymes and are used in a number of industrial applications, including the pharmaceutical, dairy, detergent, cosmetic, oleochemical, fat-processing, leather, textile, cosmetic, and paper industries. [4][5][6] Due to the constant efforts of several research groups, huge data on sequence and structural features are available for several lipases. The crystal structures of several lipases have been determined during the last few years and are available with Protein Data Bank (PDB, http://www.rcsb.org). Lipases usually show α/β hydrolase fold, a characteristic feature of this class of enzymes, with six parallel β-sheets flanked by α-helixes on both sides in their protein structure. 7 The active site of these lipases generally consists of Serine, Aspartic/Glutamic acid, and Histidine, forming a catalytic triad, and is highly conserved. The conserved pentapeptide (Gly-x-Ser-x-Gly) is another characteristic feature of the lipases. The mechanism of lipid hydrolysis by lipases and esterases is identical. The nucleophilic active site serine residue is always located in a short loop between a β-strand and α-helix, as a part of Gly-x-Ser-x-Gly consensus. The hydrophobic nature of the active site (the active site of esterases can accept short-chain fatty acid esters, which are partially hydrophobic) makes it suitable for accepting triglycerides and other hydrophobic substrates. Many lipases contain one or more short α-helix, forming a lid/flap. One important aspect of reaction catalysis by lipolytic enzymes is interfacial activation. 1,8 When the lipase comes close to a hydrophobic surface, the lid uncovers the active site revealing it to the substrate, and the lipase becomes active. Lipases are usually active at a water/hydrophobic interface and in organic solvents. These properties along with the nonrequirement of cofactors by lipases, make them valuable biocatalysts for several biotechnological applications. However, all naturally occurring lipase enzymes do not meet parameters which are crucial for their industrial applications, for example, improved activity, enhanced thermostability, and so forth. Adaptation of the pre-existing gene can play an important role in evolving the enzyme with better activity and stability. Though there might be multiple ways to modulate the catalytic function of an enzyme by imparting properties such as thermostability, alkaliphilic activity, and enantioselectivity, the approach that minimizes the effort is preferred. 9,10 In recent years, many studies have been carried out on microbial lipases using integrated advanced bioinformatics, molecular dynamics (MD) simulation, and experimental approaches. [11][12][13][14][15] Molecular modeling studies offer a suitable alternative to the time-consuming and laborious experimental identification of the factors governing the desirable properties of any lipase.
To identify lipases of alkaliphilic and thermophilic nature, a comparative sequence (biological [Protein/ DNA sequence] evolutionary record), structural and functional features of microbial lipases were studied in silico, and the information so obtained was evaluated, analyzed and systematized, using the bioinformatics tools to classify the microbial lipases sourced from the extremophilic environments. The large-scale MD simulation calculations were undertaken to identify the important structural features responsible for maintaining the enzyme stability and activities under extreme pH and temperature conditions. Furthermore, molecular-level structural topology and their mechanistic conformational behaviors were investigated using the MD approach. The finding of this study would enrich the current knowledge of lipases with respect to their structure and function relationship and industrial applications.

| Sequence retrieval for homologous lipases
Initially, two alkaliphilic Bacillus subtilis lipase/esterase sequences (UniProt IDs: I6V559 and Q79F14) were selected as input queries to retrieve the sequences of homologous lipases. The search was performed for each query sequence IDs: I6V559 and Q79F14 against the UniProtKB target database (https://www.uniprot.org/) using the basic local alignment search tool (BLAST) by applying the default parameter, that is, e value threshold was kept to 0.0001 and maximum 1000 hit. The unique sequences were assigned at 50% sequences identity cutoff value using the CDHIT program. 16 After that, the retrieved data was manually crosschecked, and false positive hits (including the sequences with lengths of ≤150 and ≥500 residues) were removed. Finally, 49 sequences were selected for further evaluation, analysis, and interpretation.

| Comparative analysis of retrieved sequences
All 49 representative sequences were clustered at 50% identity and selected for multiple sequence alignment (MSA) and molecular phylogenetic analysis. For MSA, the MUSCLE module was used in the MEGA-X program. 17 The evolutionary history was inferred using the minimum evolution (ME) method. 18 The bootstrap consensus tree inferred from 250 replicates was used to represent the evolutionary history of the taxa analyzed. Branches corresponding to partitions reproduced in less than 50% of bootstrap replicates are collapsed. The percentage of replicate trees in which the associated taxa were clustered together in the bootstrap test (250 replicates) were shown next to the branches. 19 The evolutionary distances were computed using the Dayhoff matrix-based method 20 and were in the units of the number of amino acid substitutions per site. The ME tree was searched using the close-neighbor-interchange algorithm at a search level of 1. The neighbor-joining algorithm was used to generate the initial tree. 21 The analysis involved 49 amino acid sequences. All ambiguous positions were removed for each sequence pair, and there was a total of 589 positions in the final data set. Evolutionary analysis was conducted in the MEGA X program. 17 Cluster-1 sequences (uniport IDs: I6V559 and Q79F14) and their homologous sequences, which showed 50% identity, were aligned using the ClustalW program and a phylogenetic tree was constructed using unweighted pair group method with arithmetic mean method.

| Prediction of evolutionarily conserved sequences
Evolutionarily conserved sequences were predicted using the ConSurf program. 22,23 The primary protein sequence of alkaliphilic lipase from Bacillus subtilis (cluster-1, Uniport ID: I6V559) was used as a query to find homologous sequences in the protein database UNIREF-90 (http://www.uniprot.org/help/uniref) using HMMER (hidden Markov models) homolog search algorithm, with sequence identity and an e value cutoff of 50% and 0.00001, respectively. The searched homologous hits were considered for the MSA study using the MAFFT program. During the calculation of the conservation score, all parameters were kept at default values. The conservation score was calculated based on structural architecture. Similar sequence diversity and conservation score analysis were performed for cluster-2 (Uniport ID: H0B8D4), with sequence identity cut-off kept to 50%.

| MD simulation and comparative analysis
The phylogenetic analysis of cluster-1 and cluster-2 indicated an evolutionary neighborhood for the two groups. Therefore, the representative three-dimensional (3D) crystal structure of alkaliphilic LipA from cluster-1 (Bacillus subtilis, PDB ID: 1I6W) and thermostable MAS1 from cluster-2 (marine Streptomyces, PDB ID: 5H6G) were retrieved from PDB database. To identify the structural components responsible for functional differences between the two lipases, the detailed analysis of molecular topology and conformational dynamics was carried out by employing MD simulations using the AMBER 14.0 software package with standard ff4SB force fields. 24,25 As reported in previous studies, the MD simulation standard protocol was used. 26,27 Using the tleap module, all required parameters were used considering ionizable residues set at their default protonation states at a neutral pH value. Each setup was neutralized by adding Na + /Cl − ions and solvated in a truncated octahedron box of the TIP3P water model 28 with a margin distance of 10 Å. The long-range Coulombic interactions were treated with the particle mesh Ewald (PME) method, using a cut-off of 10.0 Å. 29 The SHAKE algorithm 30 was employed to restrain all atoms covalently bonded to hydrogen atoms, allowing an integration time of 2 fs. Periodic boundary conditions were used to avoid edge effects. The box was minimized by 750 steps of the steepest descent method following 500 steps of the conjugate gradient method while restraining the protein using a force constant of 2 kcal/mol Å 2 . The system was gradually heated from 0 to 300 K over a period of 50 ps and maintained at 300 K with force constant of 2 kcal/mol Å 2 . Each system was equilibrated for 1.0 ns, and final production was run up to 200 ns. MD simulation production was extended up to 500 ns for MAS1 lipase. MD simulations calculations were performed using the GPU NVIDIA Tesla K80 (Kepler) installed on the Kebnekaise supercomputer at HPC2N (https://www.hpc2n.umu.se/). The generated trajectories were analyzed and visualized using UCSF Chimera and VMD (visual molecular dynamics) software. 31,32 The system stability analysis was performed using the ptraj module available in AmberTools.

| Molecular diversity of microbial extremophilic lipases
The BLASTP performed using two query sequences against the UniProtKB database resulted in identifying 910 unique alkaliphilic and thermophilic homologous microbial lipase sequences. These lipase sequences show sequence identity ranges from~5 to~99%. The clustering of 910 unique sequences at a sequence identity cut-off of 50% resulted in filtering 49 representative sequences. Clustering results indicated the diversified origin of identified lipase sequences. The representative 49 lipase sequences showed significant sequence similarities, and their phylogeny tree analysis revealed that they could be clustered into mainly three groups, as shown in Figure 1.
The first group, that is, cluster 1, comprised 21 sequences which possessed significant similarity with the previously studied alkaliphilic lipases. 33 The second group, that is, cluster 2, was formed by 16 sequences and shared sequence similarity to thermophilic lipases. The third group, that is, cluster 3, consisted of 12 sequences for which a functional pattern could not be identified. Cluster 3 members were also structurally and functionally diverse compared to the other two clusters; hence, this cluster was excluded from further investigation. Interestingly, most of the cluster 1 members have a minimal α/β hydrolase fold. In contrast, cluster 2 members showed two loop regions which play an important structural role in the formation of substrate binding lid domain. These F I G U R E 1 Molecular evolutionary relationship tree was hypothesized using the minimum evolution method in MEGA X.
representative sequences were analyzed for phylogenetic relatedness. A highly conserved critical active site catalytic triad (Serine, Aspartic/Glutamic acid, and Histidine) was observed during MSA analysis. Interestingly, a conserved pentapeptide motif, that is, Gly-X-Ser-X-Gly, was observed in the entire data set, where Ser is the nucleophilic center in the active site and is part of the catalytic triad. This motif is located in the hydrophobic sub-pocket of the lipase substrate-binding site. 33,34 The alanine residues (dipeptide or tripeptide) in the lid domain of Bacillus thermo-alkalophilic lipases, favored residues at the N-cap and C-cap position reported by the Chakravorty et al. 35 The Ala residue of the AXSXG motif plays a role in thermostability, as noted earlier for Bacillus thermoalkalophilic lipases. 35,36 Furthermore, analysis of evolutionarily conserved sequences revealed that the functional regions, including the substrate binding site and the α/β hydrolase fold, were highly conserved in LipA. In addition, the N terminal region, composed of 30 amino acids, was found to be highly variable amongst various lipases compared with rest of the structure, which was highly conserved during evolution. This variable region is reported to form a signal peptide. The key pentapeptide motif (AHSMG) from the active site was found to be highly conserved and buried in the enzyme structure (Supporting Information: Figure SI.1). The structural analysis of thermostable lipase from Streptomyces (MAS1), a LipA homolog (UniProt ID: H0B8D4), revealed that the key pentapeptide motif GHSQG is highly conserved, whereas the N terminal variable region (40 amino acids) is highly variable. Interestingly, in the active site pentapeptide motifs of LipA and MAS1, the residues at positions i (A/G) and i+3 (M/Q) are non-identical. In addition, the residues from 186 to 195 (here, residues numbering is according to UniProt ID: H0B8D4) were identified to be contributing to the differentiating characteristics of the two enzymes. These residues play a structural role in forming the lid domain for the active site (see Figure 2). Further details about the conservation score are provided in Supporting Information: Figure SI

| MD of LipA and MAS1, a comparative analysis
From the comprehensive evaluation of available lipase sequences and subsequent phylogenetic analysis, the two representative sequences have chosen for which crystal structures are available. Further, structural comparisons were investigated for the two representative structures (i) alkaliphilic lipase (LipA) from Bacillus subtilis (cluster-1, PDB ID: 1I6W) 37 and (ii) thermostable lipase (MAS1) from marine Streptomyces (cluster-2, PDB ID: 5H6G). 38 A published study regarding LipA revealed that its activity decreases strongly above pH 10.5 and below pH 6.5 and was reported to be remarkably stable at alkaline pH, showing maximum stability at pH 12 while the optimum temperature was 35°C and was stable for at least 30 min at 40°C. 39 On the contrary, MAS1 lipase is a thermostable lipase with an optimal reaction temperature of 70°C, and it retains >80% activity in the temperature range of 55-75°C. 38 The root mean square deviation (RMSD) value of the aligned Cα atoms of LipA and MAS1 was calculated to be 1.34 Å. All over structural topology revealed that its key structure, consisting of α/β hydrolase fold, shared a similar structural topology of thermo-alkaliphilic lipases. The structural alignment clearly indicated that the MAS1 lipase has an extra lid domain that is characteristic of this lipase 40 (Figure 2). Addition of this lid domain, MAS1 lipase also has extended terminal regions and disulfide bridges. The structural analysis and biochemical assays of MAS1 revealed that disulfide and other salt bridges play a vital role in the thermostability of this enzyme. 38 Those disulfide bridges are present on both N-and C-terminals of MAS1 lipase.

| Conformational changes in the active site region of LipA
To understand the structural conformation of LipA, MD simulation was performed up to 200 ns with the F I G U R E 2 Superimposed structures of alkaliphilic lipase (LipA) and thermostable lipase (MAS1). The cyan and purple color represent LipA and MAS1, respectively. explicit-solvent model. It observed a highly ordered α/β hydrolase fold and loop region in the active site region. Structural flexibility investigated considering RMSD value (≤2.5 Å) revealed it's a stable structure. The RMSD values of the loop region between Ala132 and Met137 showed significant variations compared with the protein backbone with RMSD difference of more than 2.5 Å. Furthermore, the RMSD values of the active site residue of the His153 side chain showed two different conformations during its catalytic action towards the substrate, as shown in Figure 3. During the MD simulation studies, it was observed that the active site histidine residue comes closer to the active site aspartic acid residue and establishes the hydrogen bonding interactions. This interaction increases the pKa of the histidine imidazole nitrogen, allowing the histidine to act as a powerful general base and activate the serine nucleophile in the catalytic reaction. 41 The loop region Ala132-Met137 near the active site harboring one of the active site residue Asp133 was observed to be conformationally flexible. Indeed, the conformation adopted by this region is significantly different from the two conformations, which results in an alternate geometry. The existence of this loop region near the vicinity of the active site might help in the binding of incoming substrate and may affect the activity of the alkaline lipase. The catalytic residue Ser77 was wellordered during the entire MD simulation, while the His156 acquired two different conformations (open and closed), as depicted in Figure 4. In open conformation, His156 goes far from Ser77, while in close conformation state, it comes close to Ser77. This phenomenon was observed during the movement of the loop region comprised of Ala132-Met137. It is reporting this loop flexibility in alkaline lipase for the first time. On the contrary, the thermostable lipase (MAS1) from marine Streptomyces and other microorganisms has a large lid domain that is present near the active site. The two different conformations of this active site lid domain were already explored. 40,42 In the alkaline LipA, such lid domain is not present, as depicted in Figure 2 (see superimposed structures). The flexibility of the loop region (Ala132-Met137) near the active site may increase and decrease the distances toward the nucleophilic serine residue of the active site residue establishing the opening and closing movements of this loop region. MD simulations at different time points revealed that complete opening and closing movement was not observed during the 100 ns simulation. However, a noticeable opening movement was observed when the simulation was extended to 200 ns ( Figure 4A,B). Conformational changes associated with the opening and closing of the loop region showed a distance of~10 Å. However, compared with its homologs from cluster 2 (a thermostable lipase [MAS1]), there is no lid domain in LipA, interestingly the active site of LipA is more solventexposed and can be correlated with the adoption of an open conformation which might extend and expose the active site region to solvents during the catalysis process.

| MD simulation of wild-type MAS1 lipase
The molecular dynamic simulations were also performed for thermostable lipase (MAS1) to investigate F I G U R E 3 The RMSD of protein backbone atoms, loop region Ala132-Met137 and active site residues during 200 ns molecular dynamics simulation. the mechanistic conformational behaviors and structural stability. Initially, performed a 200 ns run, but no significant changes were observed. I have extended the production of MD simulation up to 500 ns; during this time scale, conformational changes were observed mainly in the active site lid domain. The RMSD analysis indicated the two different conformations of the active site's lid domain, referred to as open and closed conformations ( Figure 5). This lid domain movement had been previously explored in many thermostable lipases. 43,44 The overall protein structure was stable during the entire MD simulation except for some flexibility observed in the loop region and the active side lid domain. The protein backbone RMSD values were observed below 2 Å during the MD simulation run. The N-and C-terminal regions showed less flexibility because both are bound by a disulfide bond. Remarkably, the active site resides His232 exhibited two different conformations like the histidine residue of LipA, shown in Figure 6. Interestingly, a metal ion center was found near the active site of MAS1. The metal ion (sodium ion) was coordinated by the residues Asp228, Asp239, and Ser230 and showed a stable bound form with those three amino acid residues during the entire MD simulation. This metal ion might be playing an important role in providing stability to this lipase in the presence of the disulfide bond between Cys225 and Cys260 at the C-terminus. The metal ion coordinating residues Asp228, Asp239, Ser23, and disulfide bonding residues are highly conserved during the evolution.

| MD simulation of mutated cysteine residues in MAS1 lipase
Mutations away from the active site have been shown to considerably alter the catalytic activity of enzymes F I G U R E 6 The RMSD of wild-type protein backbone atoms, lid domain, N-and C-terminal regions, and active site residues during 500 ns molecular dynamics simulation.
F I G U R E 7 The RMSD of mutated (Cys22/Ala22 and Cys260/Ala260) protein backbone atoms, lid domain, N-and C-terminal regions, and active site residues during 500 ns molecular dynamics simulations. experimentally reported. 45,46 In this study, I have replaced the cysteine residues at positions 59 and 260 with alanine and studied the resulting change in enzyme structure using MD simulations. As discussed in the wild-type MD simulation studies, both cysteines were involved in the disulfide bond formation and provided stability to the protein structure. The mutated system showed significant deviation from the beginning of the MD simulation run. RMSD values are observed higher not only for disulfide-linked regions but also for protein backbone. The extended production of MD simulation (up to 500 ns) showed more structural instability. The RMSD analysis and visual investigation clearly showed that both disulfide linking N-and C-terminal regions are more structurally disturbed (Figure 7). However, the C-terminus cysteine mutation caused drastic changes in the terminal area and distorted the metal ions center. It was observed that there is no disulfide bond formation between Cys225 and Cys260 in the mutated system, consequently no metal ion binding center. Disulfide bond formation between Cys225 and Cys260 led to the metal ion center and helped maintain the geometry of metal ions coordinating residues (Figure 8). The accumulating strength of the disulfide bond and metal ion bound provides extra stability for this protein. The Nterminus disulfide bond residues might provide relatively less stability of MAS1; however, this mutation does not significantly perturb the 3D structure of the protein. Still, significant structural changes were observed after the 300 ns MD simulation run.
Zhao et al. investigated that the disulfide bonds and some salt bridges contribute significantly to the thermostability of MAS1 lipase. 38 Furthermore, sitedirected mutagenesis studies were performed in the heterologous expression system, and activity assay results reveal that the disulfide linkages can assist in stabilizing the overall conformation of the enzyme. 38 These MD studies explored the dynamics of extremophilic lipases and identified structural level characteristic features that can be modulated to design and engineer existing lipases. The alkalophilic LipA enzyme can be engineered by incorporating identified thermophilic structural features of MAS1 lipase in different ways to make a thermoalkaliphilic lipolytic enzyme. The minimal α/β hydrolase fold of both enzymes is quite similar to each other. However, except for the lid domain, N-and C-termini of thermostable MAS1 lipase are also in extended form. As discussed earlier, the Nterminus region has one disulfide bond, while the Cterminus region has one disulfide bond and one metal ion center (Figure 8).
The new thermoalkaliphilic lipolytic enzyme can be designed in the following ways: (i) inserting the Cterminal extended region of MAS1 into LipA at the same place and substituting the residue Gly153/Cys153 for proper disulfide bond formation, (ii) inserting Nterminal extended region of MAS1 into LipA at the same place and substituting the residue Ser32/Cys32 for proper disulfide bond formation, and (iii) creating the metal ion center at the C terminus mimics to MAS1 by substituting the residue. In recent years, many studies have been reported on lipolytic enzymes using the MD approach to alter the enzyme properties. 13,47 Singh et al. explored the thermostability properties of Bacillus subtilis lipase using the MD simulation approach. 48 Recently, Dong et al. improved the thermostability of LipA. 49 More recently, another study about understanding the LipA resistance in ionic liquids solvent using the computational MD simulation has been reported. 50 This study provides insights into the conformational dynamics behaviors and explores the hidden structural features of thermophilic and alkaliphilic lipolytic enzymes.

| CONCLUSIONS AND FUTURE DIRECTIONS
Primary annotated protein sequences and 3D structural information of extremophilic (thermophilic and alkaliphilic) lipases were considered for an in-depth understanding of the structural-functional relationship and to find the critical hotspot regions for designing more potential thermoalkaliphilic lipolytic enzymes. This investigation suggests that the minimum structural fold of lipolytic enzymes is quite similar, even though they are diverse at the sequence level. To understand the conformational dynamics and molecular determinants of extremophilic activity, the alkaliphilic LipA lipase from Bacillus subtilis and thermostable MAS1 lipase from marine Streptomyces species were further investigated using MD simulation studies. The MD simulation studies revealed that the LipA loop region (Ala132-Met137) shows movement; consequently, active site residue His156 shows two different conformations. However, the loop lies in the active site's proximity; this conformational change may also affect substrate binding and the activity of the alkaline lipases. Furthermore, the MD simulation of MAS1 lipase explored the role of Nand C-terminal regions with disulfide linkages and identified a metal ion binding site that aids enzyme stability. The new chimeric thermoalkaliphilic lipolytic enzyme can be designed by integrating those stability features. Those structural insights and characteristics features are fundamental needs for designing novel lipolytic enzymes.

ACKNOWLEDGMENTS
The author acknowledges the resources of the High-Performance Computing Center North (HPC2N), the Swedish Infrastructure Committee (SNIC).

CONFLICT OF INTEREST STATEMENT
The author declares no conflict of interest.

DATA AVAILABILITY STATEMENT
The generated data sets are available on reasonable request. The data that support the findings of this study are available from the corresponding author upon reasonable request. ORCID Rajender Kumar http://orcid.org/0000-0002-3322-8621