Systematic optimization model and algorithm for binding sequence selection in computational enzyme design


  • Abbreviations: CA, cephalosporin acylase; DEE, dead-end elimination; GL-7-ACA, glutaryl-7-aminocephalosporanic acid; GMEC, global minimum energy conformation; LP, linear programming; MILP, mixed-integer linear programming; PDB, Protein Data Bank; PG, penicillin G; PGA, penicillin G acylase; PRODA, protein design algorithmic package; RMSD, root-mean-square deviation; TS, transition state.

Correspondence to: Yushan Zhu, Department of Chemical Engineering, Tsinghua University, Beijing 100084, People's Republic of China. E-mail:


A systematic optimization model for binding sequence selection in computational enzyme design was developed based on the transition state theory of enzyme catalysis and graph-theoretical modeling. The saddle point on the free energy surface of the reaction system was represented by catalytic geometrical constraints, and the binding energy between the active site and the transition state was minimized to lower the activation energy barrier. The resulting hyperscale combinatorial optimization problem was tackled with a novel heuristic global optimization algorithm, which was inspired by, and tested on, the protein core sequence selection problem. Sequence recapitulation tests on the native active sites of two enzyme-catalyzed hydrolytic reactions were used to evaluate the predictive power of the design methodology. The calculations show that most of the native binding sites can be identified successfully when the catalytic geometrical constraints and the structural motifs of the substrate are taken into account. Reliable prediction of active site sequences may have significant implications for the creation of novel enzymes capable of catalyzing targeted chemical reactions.
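The selection problem described above can be illustrated with a toy sketch: choose one residue per design position so that the total binding energy to the transition state is minimal while a catalytic constraint holds. Everything in it is an assumption made for illustration: the function name `select_sequence`, the residue labels, the energy values, and the constraint are hypothetical, and exhaustive enumeration stands in for the heuristic global optimization algorithm, which the abstract does not specify.

```python
from itertools import product


def select_sequence(candidates, self_energy, pair_energy, satisfies_constraints):
    """Exhaustively search a tiny design space (illustrative only).

    candidates: one list of residue labels per design position.
    self_energy[(pos, label)]: binding energy of that residue with the TS.
    pair_energy[((i, a), (j, b))] with i < j: residue-residue term (0 if absent).
    satisfies_constraints: callable taking an assignment tuple, returning bool.
    """
    best, best_e = None, float("inf")
    for assignment in product(*candidates):
        # Discard sequences that violate the catalytic geometry requirement.
        if not satisfies_constraints(assignment):
            continue
        # Sum the single-residue binding terms.
        energy = sum(self_energy[(i, a)] for i, a in enumerate(assignment))
        # Add the pairwise terms between design positions.
        for i in range(len(assignment)):
            for j in range(i + 1, len(assignment)):
                key = ((i, assignment[i]), (j, assignment[j]))
                energy += pair_energy.get(key, 0.0)
        if energy < best_e:
            best, best_e = assignment, energy
    return best, best_e


# Hypothetical toy data: two design positions, two candidate residues each.
candidates = [["SER", "ALA"], ["HIS", "PHE"]]
self_energy = {
    (0, "SER"): -2.0, (0, "ALA"): -3.0,
    (1, "HIS"): -1.5, (1, "PHE"): -2.5,
}
pair_energy = {((0, "SER"), (1, "HIS")): -2.0}


def catalytic_constraint(assignment):
    # Assumed constraint: position 0 must carry the catalytic serine.
    return assignment[0] == "SER"


best, best_e = select_sequence(
    candidates, self_energy, pair_energy, catalytic_constraint
)
```

Enumeration is feasible only for a handful of positions; the number of candidate sequences grows exponentially with the number of design positions, which is why a heuristic or MILP-based search is needed at realistic scale.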