
On the development of protein pKa calculation algorithms

Authors

  • Tommy Carstensen,

    1. School of Biomolecular and Biomedical Science, Centre for Synthesis and Chemical Biology, UCD Conway Institute, University College Dublin, Belfield, Dublin 4, Ireland
  • Damien Farrell,

    1. School of Biomolecular and Biomedical Science, Centre for Synthesis and Chemical Biology, UCD Conway Institute, University College Dublin, Belfield, Dublin 4, Ireland
  • Yong Huang,

    1. Department of Biochemistry and Molecular Biophysics, Washington University, St. Louis, Missouri 63110
  • Nathan A. Baker,

    1. Knowledge Discovery and Informatics Group, Pacific Northwest National Laboratory, Richland, Washington
  • Jens Erik Nielsen

    Corresponding author
    1. School of Biomolecular and Biomedical Science, Centre for Synthesis and Chemical Biology, UCD Conway Institute, University College Dublin, Belfield, Dublin 4, Ireland

  • The authors state no conflict of interest.

Abstract

Protein pKa calculation methods are developed partly to provide fast, non-experimental estimates of the ionization constants of protein side chains. However, the most significant reason for developing such methods is that a good pKa calculation method is presumed to provide an accurate physical model of protein electrostatics, which can be applied in drug design, protein design, and other structure-based energy calculations. We explore the validity of this presumption by simulating the development of a pKa calculation method using artificial experimental data derived from a human-defined physical reality. We examine the ability of an RMSD-guided development protocol to retrieve the correct (artificial) physical reality and find that a rugged optimization landscape and a huge parameter space prevent its identification. We also examine the importance of the training set in developing pKa calculation methods and the effect of experimental noise on our ability to identify the correct physical reality, and find that both have a significant and detrimental impact on the physical realism of the optimal model identified. Our findings are relevant to all structure-based methods for protein energy calculation and simulation, and have major implications for all types of current pKa calculation methods. Our analysis furthermore suggests that careful and extensive validation on many types of experimental data can go some way toward making current models more realistic. Proteins 2011; © 2011 Wiley-Liss, Inc.
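The idea of testing an RMSD-guided protocol against a known, artificial reality can be illustrated with a toy sketch. This is not the authors' protocol or model: the two-parameter linear model, the feature names (`desolv`, `coulomb`), the parameter values, the noise level, and the grid search are all assumptions chosen for illustration. The sketch generates artificial pKa shifts from a defined "true" parameter set, optionally adds Gaussian noise, and then selects the parameters that minimize the RMSD.

```python
import math
import random

# Hypothetical "human-defined physical reality": the true weights of a toy model.
TRUE_PARAMS = (1.0, 2.0)


def predict_shift(params, features):
    """Toy pKa shift: a weighted sum of two made-up energy terms."""
    w_desolv, w_coulomb = params
    desolv, coulomb = features
    return w_desolv * desolv + w_coulomb * coulomb


def rmsd(params, dataset):
    """Root-mean-square deviation between predicted and 'observed' shifts."""
    sq = [(predict_shift(params, f) - obs) ** 2 for f, obs in dataset]
    return math.sqrt(sum(sq) / len(sq))


def make_dataset(n, noise_sd, rng):
    """Artificial 'experimental' data generated from TRUE_PARAMS."""
    data = []
    for _ in range(n):
        desolv = rng.uniform(0.0, 2.0)
        # The two terms are made correlated, so their weights are hard to separate.
        coulomb = -0.5 * desolv + rng.uniform(-0.2, 0.2)
        features = (desolv, coulomb)
        noise = rng.gauss(0.0, noise_sd) if noise_sd > 0 else 0.0
        data.append((features, predict_shift(TRUE_PARAMS, features) + noise))
    return data


def grid_search(dataset, step=0.25, upper=4.0):
    """RMSD-guided selection: return the grid point with the lowest RMSD."""
    n_steps = int(round(upper / step))
    grid = [i * step for i in range(n_steps + 1)]
    candidates = ((a, b) for a in grid for b in grid)
    return min(candidates, key=lambda p: rmsd(p, dataset))


rng = random.Random(0)
clean = make_dataset(20, 0.0, rng)   # noise-free artificial data
noisy = make_dataset(20, 0.5, rng)   # artificial data with experimental noise

best_clean = grid_search(clean)
best_noisy = grid_search(noisy)

print("true parameters:      ", TRUE_PARAMS)
print("best fit (no noise):  ", best_clean, "RMSD =", rmsd(best_clean, clean))
print("best fit (with noise):", best_noisy, "RMSD =", rmsd(best_noisy, noisy))
print("RMSD of true parameters on noisy data:", rmsd(TRUE_PARAMS, noisy))
```

Without noise, the true parameters lie on the grid and give an RMSD of exactly zero, so the search recovers them. With noise, the lowest-RMSD parameters are by construction at least as good a fit as the true ones, so a low RMSD alone does not show that the underlying reality has been recovered. A realistic pKa method has far more parameters and a rugged landscape, which this two-parameter sketch does not attempt to reproduce.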
