Fast and accurate generation of ab initio quality atomic charges using nonparametric statistical regression




We introduce a class of partial atomic charge assignment method that provides ab initio quality description of the electrostatics of bioorganic molecules. The method uses a set of models that neither have a fixed functional form nor require a fixed set of parameters, and therefore are capable of capturing the complexities of the charge distribution in great detail. Random Forest regression is used to build separate charge models for elements H, C, N, O, F, S, and Cl, using training data consisting of partial charges along with a description of their surrounding chemical environments; training set charges are generated by fitting to the b3lyp/6-31G* electrostatic potential (ESP) and are subsequently refined to improve consistency and transferability of the charge assignments. Using a set of 210 neutral, small organic molecules, the absolute hydration free energy calculated using these charges in conjunction with Generalized Born solvation model shows a low mean unsigned error, close to 1 kcal/mol, from the experimental data. Using another large and independent test set of chemically diverse organic molecules, the method is shown to accurately reproduce charge-dependent observables—ESP and dipole moment—from ab initio calculations. The method presented here automatically provides an estimate of potential errors in the charge assignment, enabling systematic improvement of these models using additional data. This work has implications not only for the future development of charge models but also in developing methods to describe many other chemical properties that require accurate representation of the electronic structure of the system. © 2013 Wiley Periodicals, Inc.