Software News and Updates

Electronegativity equalization method: Parameterization and validation for organic molecules using the Merz-Kollman-Singh charge distribution scheme

Authors

  • Zuzana Jiroušková,

    1. National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  • Radka Svobodová Vařeková,

    1. National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  • Jakub Vaněk,

    1. National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  • Jaroslav Koča

    Corresponding author
    1. National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic

Abstract

The electronegativity equalization method (EEM) was developed by Mortier et al. as a semiempirical method based on density-functional theory. After parameterization, in which the EEM parameters Ai, Bi, and the adjusting factor κ are obtained, this approach can be used to calculate the average electronegativity and the charge distribution in a molecule. The aim of this work is to perform the EEM parameterization using the Merz-Kollman-Singh (MK) charge distribution scheme obtained from B3LYP/6-31G* and HF/6-31G* calculations. To achieve this goal, we selected a set of 380 organic molecules from the Cambridge Structural Database (CSD) and used a methodology that was recently applied successfully to the EEM parameterization for HF/STO-3G Mulliken charges on large sets of molecules. In the case of B3LYP/6-31G* MK charges, we have improved the EEM parameters for already parameterized elements, specifically C, H, N, O, and F. Moreover, EEM parameters for S, Br, Cl, and Zn, which had not yet been parameterized for this level of theory and basis set, were also developed. In the case of HF/6-31G* MK charges, we have developed EEM parameters for C, H, N, O, S, Br, Cl, F, and Zn, none of which had been parameterized for this level of theory and basis set so far. The obtained EEM parameters were verified by a previously developed validation procedure and used for charge calculation on a different set of 116 organic molecules from the CSD. The calculated EEM charges are in very good agreement with the quantum mechanically obtained ab initio charges. © 2008 Wiley Periodicals, Inc. J Comput Chem, 2009

Introduction

The electronegativity equalization method (EEM) was developed by Mortier and coworkers1–4 as a semiempirical method based on density-functional theory (DFT)5, 6, which can be used for the fast calculation of the charge distribution in a molecule. However, because the method is semiempirical, the EEM must be parameterized before it can be used for atomic charge calculations.

The parameterization of the EEM is a time-consuming process that consists of several steps. First, a sufficiently large set of appropriate molecules must be chosen, and the ab initio atomic charges are calculated for them. These charges are then used to derive the EEM parameters Ai, Bi, and the adjusting factor κ.

Essentially, the EEM is parameterized for a specified level of theory (e.g., HF, B3LYP), basis set (e.g., STO-3G, 3-21G, 6-31G*), and charge calculation scheme [e.g., Mulliken population analysis (MPA), Merz-Kollman-Singh (MK), charges from electrostatic potentials using a grid based method (CHELPG)]7. The EEM has been parameterized for various combinations of theory level, basis set, and charge calculation scheme, as can be seen, for example, in Yang and Shen8, 9, Menegon et al.10, and Bultinck et al.11, 12 The EEM is also used in combination with other chemical techniques (e.g., Smirnov and van de Graaf13, Heidler et al.14, and others). In this article, the EEM parameters were developed according to the methodology described in detail in ref.15, where it was validated on large sets of organic, organohalogen, and organometallic molecules containing up to 6000 molecules. The validated methodology was afterward used to improve existing and calculate additional EEM parameters for HF/STO-3G MPA charge calculations.

The first goal of this article is to show that the EEM parameterization methodology, which was successfully used for the calculation of HF/STO-3G MPA charges, can also be used for the calculation of HF/6-31G* and B3LYP/6-31G* MK charges. The second goal is to calculate the EEM parameters for HF/6-31G* and B3LYP/6-31G* MK charges. The quality of all obtained EEM parameters has been carefully validated using a reference set of molecules.

Theoretical Basis

The EEM is derived from DFT and is based on three principles. The first is Sanderson's electronegativity equalization principle16, 17, which states that the electronegativities of all atoms forming a molecule are equal:

$$\chi_1 = \chi_2 = \cdots = \chi_i = \chi_j = \cdots = \chi \qquad (1)$$

where χi and χj are the individual effective atomic electronegativities and χ is the molecular electronegativity.

The second one is the principle of charge balance in a molecule:

$$\sum_{i} q_i = Q \qquad (2)$$

where qi is the atomic charge distributed on atom i and Q is the total charge of the molecule.

The third principle is the definition of the effective (charge-dependent) atomic electronegativity1, 4, which is the core principle of the EEM:

$$\chi_i = A_i + B_i q_i + \kappa \sum_{j \neq i} \frac{q_j}{R_{ij}} \qquad (3)$$

χi is the effective atomic electronegativity of atom i in the molecule, qi and qj are the atomic charges distributed on atoms i and j, Rij is the separation distance between atoms i and j, Ai and Bi represent the atomic valence state electronegativity and the hardness of atom i, respectively. Parameter κ is an adjusting factor. Parameters Ai and Bi are defined by eqs. (4,5):

$$A_i = \chi_i^0 + \Delta\chi_i \qquad (4)$$
$$B_i = 2\left(\eta_i^0 + \Delta\eta_i\right) \qquad (5)$$

where $\chi_i^0$ is the electronegativity of an isolated neutral atom i, and $\eta_i^0$ is its hardness. Δχi and Δηi describe the corrections invoked by the change in size and shape of the atom in the molecule and the influence of the surrounding molecules.
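Taken together, eqs. (1) to (3) form a linear system in the unknown charges qi and the molecular electronegativity χ. The following minimal Python sketch illustrates this for a single water molecule; it is not the authors' EEM_SOLVER code. The function names, the water geometry, and the use of ångströms for Rij are assumptions made for illustration, while the A, B, and κ values are the B3LYP/6-31G* MK entries for O and H (bond order 1) from Table 2.

```python
import math


def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - known) / aug[r][r]
    return x


def eem_charges(coords, a, b, kappa, total_charge=0.0):
    """Return (charges, chi) satisfying eqs. (1)-(3) for one molecule."""
    n = len(coords)
    matrix = []
    for i in range(n):
        # Row i encodes B_i q_i + kappa * sum_{j != i} q_j / R_ij - chi = -A_i.
        row = [b[i] if j == i else kappa / math.dist(coords[i], coords[j])
               for j in range(n)]
        matrix.append(row + [-1.0])
    # The last row is the charge balance of eq. (2): sum_i q_i = Q.
    matrix.append([1.0] * n + [0.0])
    rhs = [-a_i for a_i in a] + [total_charge]
    solution = solve_linear(matrix, rhs)
    return solution[:n], solution[n]


# Water as O, H, H. The geometry (in angstroms) is an assumed example.
coords = [(0.0, 0.0, 0.0), (0.9572, 0.0, 0.0), (-0.2397, 0.9267, 0.0)]
a = [2.825, 2.385, 2.385]  # A for O, H, H (bond order 1), from Table 2
b = [0.844, 0.737, 0.737]  # B for O, H, H (bond order 1), from Table 2
charges, chi = eem_charges(coords, a, b, kappa=0.302)
print(charges, chi)
```

The unknown χ is simply appended as an extra column, so one linear solve yields both the charges and the equalized electronegativity.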

EEM Parameterization

The aim of the EEM parameterization is to determine the parameters Ai, Bi, and κ for all specified atom types. This requires the atomic charges qi of all atoms i, the molecular electronegativity χ, and the interatomic distances Rij. Because all atoms in a molecule share the same electronegativity [eq. (1)], χi in eq. (3) can be replaced by χ, and the equation can be rearranged into the form:

$$A_i + B_i q_i = \chi - \kappa \sum_{j \neq i} \frac{q_j}{R_{ij}} \qquad (6)$$

which is more suitable for EEM parameterization by least-squares minimization based on linear regression. Equation (6) can also be written in the following form:

$$y_\kappa = A_i + B_i x_\kappa \qquad (7)$$

where xκ = qi and $y_\kappa = \chi - \kappa \sum_{j \neq i} q_j / R_{ij}$. For a properly selected value of κ, the pairs (xκ, yκ) of all atoms of one atom type lie close to a straight line with intercept Ai and slope Bi.
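The regression behind eq. (7) can be sketched in a few lines of Python. This is an illustrative ordinary least-squares fit under stated assumptions, not the procedure of ref.15: the function names and the toy input values are hypothetical, and distances are assumed to be in ångströms.

```python
import math


def xy_pairs(charges, coords, chi, kappa):
    """Return one (x, y) pair per atom: x = q_i, y = chi - kappa * sum_{j != i} q_j / R_ij."""
    pairs = []
    for i, q_i in enumerate(charges):
        coulomb = sum(q_j / math.dist(coords[i], coords[j])
                      for j, q_j in enumerate(charges) if j != i)
        pairs.append((q_i, chi - kappa * coulomb))
    return pairs


def fit_line(pairs):
    """Ordinary least-squares fit of y = A + B * x; returns (A, B)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    s_xx = sum((x - mean_x) ** 2 for x, _ in pairs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = s_xy / s_xx
    return mean_y - slope * mean_x, slope


# Toy diatomic with made-up charges, only to show the shape of the data.
toy_pairs = xy_pairs([0.2, -0.2], [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                     chi=2.5, kappa=0.3)

# Points lying exactly on y = 2.4 + 0.7 x are recovered by the fit.
a_fit, b_fit = fit_line([(x, 2.4 + 0.7 * x) for x in (-0.3, 0.1, 0.25, 0.4)])
print(toy_pairs, a_fit, b_fit)
```

In an actual parameterization, the pairs of all atoms of one atom type across the whole training set would be pooled before fitting, and the fit would be repeated for each trial value of κ.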

A brief description of the EEM parameterization process is as follows (for detailed description see ref.15):

First, a suitable set of molecules has to be chosen. Afterward, the atomic charges are calculated using a quantum mechanical approach, and χ is taken as the harmonic mean of the electronegativities of the neutral atoms that constitute the molecule.

In the next step, the atoms of the entire set of molecules are divided into sets of atoms, so that each set contains only atoms of the same element and bond order.

Then, according to eq. (7), a pair of xκ and yκ values is calculated for each atom in each set of atoms and for all values of κ. Finally, the best value of κ is chosen depending on the $R^2_{\mathrm{avg}}$ value.
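The first two steps above can be sketched as follows. This is a hedged illustration: the Pauling electronegativities stand in for the neutral-atom values (the article does not state which scale was used), the helper names are hypothetical, and the charges in the example are invented placeholders rather than ab initio results.

```python
from collections import defaultdict
from statistics import harmonic_mean

# Pauling electronegativities, used here only as illustrative neutral-atom values.
NEUTRAL_CHI = {"H": 2.20, "C": 2.55, "N": 3.04, "O": 3.44}


def molecular_electronegativity(elements):
    """Harmonic mean of the neutral-atom electronegativities of a molecule."""
    return harmonic_mean([NEUTRAL_CHI[element] for element in elements])


def group_by_atom_type(molecules):
    """Collect atoms from all molecules into sets keyed by (element, bond order).

    molecules: list of molecules, each a list of (element, bond_order, charge).
    """
    groups = defaultdict(list)
    for mol_index, atoms in enumerate(molecules):
        for atom_index, (element, bond_order, charge) in enumerate(atoms):
            groups[(element, bond_order)].append((mol_index, atom_index, charge))
    return dict(groups)


# Water and formaldehyde with placeholder charges.
water = [("O", 1, -0.7), ("H", 1, 0.35), ("H", 1, 0.35)]
formaldehyde = [("C", 2, 0.4), ("O", 2, -0.4), ("H", 1, 0.0), ("H", 1, 0.0)]

chi_water = molecular_electronegativity([element for element, _, _ in water])
groups = group_by_atom_type([water, formaldehyde])
print(chi_water, sorted(groups))
```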

The $R^2_{\mathrm{avg}}$ value is the average of the Rmol values, which are calculated during the EEM parameterization for each molecule in the training set. The Rmol value is the R-squared value of the linear regression line fitted through the points [qi(ab initio), qi(EEM)], where qi(ab initio) and qi(EEM) represent the ab initio and EEM charges of atom i, respectively. The Rmol value is a number between 0 and 1. The closer the $R^2_{\mathrm{avg}}$ value is to 1, the better the results are.
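A minimal sketch of this quality measure, assuming that the R-squared of a simple linear regression is computed as the squared Pearson correlation coefficient. The function names and the toy charge lists are hypothetical and serve only to show the arithmetic.

```python
def r_squared(q_ab_initio, q_eem):
    """R-squared of the regression line through the points (q_ab_initio, q_eem)."""
    n = len(q_ab_initio)
    mean_x = sum(q_ab_initio) / n
    mean_y = sum(q_eem) / n
    s_xx = sum((x - mean_x) ** 2 for x in q_ab_initio)
    s_yy = sum((y - mean_y) ** 2 for y in q_eem)
    s_xy = sum((x - mean_x) * (y - mean_y)
               for x, y in zip(q_ab_initio, q_eem))
    return s_xy * s_xy / (s_xx * s_yy)


def average_r_squared(molecules):
    """Average the per-molecule R-squared (Rmol) values over a set of molecules."""
    values = [r_squared(ref, eem) for ref, eem in molecules]
    return sum(values) / len(values)


# Two toy "molecules": (ab initio charges, EEM charges).
toy_set = [
    ([-0.2, 0.0, 0.2], [-0.2, 0.0, 0.0]),   # Rmol = 0.75
    ([-0.4, 0.2, 0.2], [-0.8, 0.4, 0.4]),   # perfectly linear, Rmol = 1
]
r_avg = average_r_squared(toy_set)
print(r_avg)
```

Note that a molecule whose EEM charges are a scaled copy of the ab initio charges still scores 1, so this measure captures correlation rather than absolute error.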

Methods

Selected Sets of Molecules

For the parameterization of the EEM, molecules from the Cambridge Structural Database (CSD)18 were chosen. The CSD is maintained by the Cambridge Crystallographic Data Centre (CCDC) and is a repository of small-molecule crystal structures that have been determined experimentally using X-ray and/or neutron diffraction.

Three sets of molecules were taken from the CSD. The first, called the training set, consists of 380 molecules containing hydrogen, oxygen, carbon, nitrogen, and sulphur (the elements present in proteins) as well as bromine, chlorine, fluorine, and zinc. This set was used for the parameterization. The second set, the validation set, is independent of the training set and was used as a reference set for validating the calculated parameters. It contains 116 molecules with the same atom types (that is, the same elements and bond orders) as the training set.

To compare our EEM parameters based on the B3LYP/6-31G* MK calculations with the EEM parameters in the literature12, another set of molecules was needed. This comparative set contains only molecules whose elements and bond orders are covered by both EEM parameter sets. Specifically, the comparative set contains 111 molecules composed of hydrogen, oxygen, carbon, nitrogen, and fluorine atoms.

Details about the constitution of all sets are presented in Table 1. The geometries of all molecules were stored in SDF format19.

Table 1. Details of the Sets of Molecules Used During EEM Parameterization and Validation

Element  Bond order  Training set        Validation set      Comparative set
                     Molecules  Atoms    Molecules  Atoms    Molecules  Atoms
Br       1           56         77       19         24       -          -
C        1           342        2670     105        752      103        776
Cl       1           58         130      33         81       -          -
F        1           36         134      12         42       21         78
H        1           368        7401     112        2159     107        2192
N        1           209        403      74         131      72         122
O        1           266        770      76         196      84         245
S        1           54         120      16         34       -          -
Zn       1           41         52       15         17       -          -
C        2           361        4246     114        1350     107        1258
N        2           130        282      47         96       38         74
O        2           257        556      72         156      85         182
Total                380        16841    116        5038     111        4927

Quantum Chemistry Calculations

For the parameterization of the EEM, quantum chemistry calculations were performed to determine the partial charges on atoms. The atomic charges were calculated using the Merz-Kollman-Singh (MK) algorithm7, 20. Because MK charges are derived from the electrostatic potential, they are known to be much less basis-set dependent than Mulliken charges.

For each set of molecules (training, validation, and comparative), two sets of charges were calculated: MK charges at the HF/6-31G* and B3LYP/6-31G* level of theory.

All quantum chemistry calculations were performed using the program GAUSSIAN 03 (ref. 21).

The EEM Validation Procedure

The EEM charges obtained with the EEM parameters presented in this article were validated using the program EEM_SOLVER (ref. 22). This software was developed in our group and is freely available at http://ncbr.chemi.muni.cz/~n19n/eem_abeem.

Results and Discussion

For all molecules in the training set described in Table 1, the HF/6-31G* and B3LYP/6-31G* charges calculated using Merz-Kollman-Singh (MK) charge distribution scheme were used to parameterize the EEM. During the EEM parameterization, parameters Ai, Bi, and κ were obtained and further optimized using the procedure described in ref.15.

The accuracy of the obtained parameters is expressed by the $R^2_{\mathrm{avg}}$ value, which measures the correlation between EEM and ab initio charges. The $R^2_{\mathrm{avg}}$ value is a number between 0 and 1, and the closer this value is to 1, the better the results are.

Table 2 presents obtained EEM parameters Ai, Bi, and κ after the process of optimization for both B3LYP/6-31G* and HF/6-31G* MK charges.

Table 2. The EEM Parameters A, B, and κ for the Training Set with Statistical Significance Test and Standard Deviation s for All Regressions

                     B3LYP/6-31G* MK charges, κ = 0.302      HF/6-31G* MK charges, κ = 0.227
Element  Bond order  A              B              s         A              B              s
Br       1           2.659 ± 0.026  1.802 ± 0.228  0.051     2.615 ± 0.025  1.436 ± 0.234  0.060
C        1           2.482 ± 0.003  0.464 ± 0.011  0.076     2.481 ± 0.003  0.373 ± 0.009  0.075
Cl       1           2.519 ± 0.013  1.450 ± 0.220  0.087     2.517 ± 0.015  1.043 ± 0.215  0.098
F        1           3.577 ± 0.268  3.419 ± 0.972  0.172     3.991 ± 0.357  3.594 ± 0.945  0.169
H        1           2.385 ± 0.003  0.737 ± 0.017  0.056     2.357 ± 0.004  0.688 ± 0.016  0.055
N        1           2.595 ± 0.022  0.468 ± 0.039  0.092     2.585 ± 0.023  0.329 ± 0.031  0.093
O        1           2.825 ± 0.032  0.844 ± 0.058  0.095     2.870 ± 0.035  0.717 ± 0.051  0.094
S        1           2.452 ± 0.013  0.362 ± 0.047  0.088     2.450 ± 0.014  0.269 ± 0.035  0.090
Zn       1           2.298 ± 0.060  0.420 ± 0.074  0.063     2.185 ± 0.091  0.375 ± 0.074  0.064
C        2           2.464 ± 0.002  0.392 ± 0.007  0.072     2.475 ± 0.002  0.292 ± 0.005  0.071
N        2           2.556 ± 0.009  0.377 ± 0.018  0.068     2.556 ± 0.009  0.288 ± 0.014  0.067
O        2           2.789 ± 0.064  0.834 ± 0.130  0.094     2.757 ± 0.064  0.621 ± 0.108  0.096
$R^2_{\mathrm{avg}}$                0.941                                   0.940

  1. The statistical significance test is calculated for α = 0.05, which means that the numbers show the interval where the parameter should fit with a probability of 95%. All values are in Pauling units.

The quality of the EEM parameters Ai, Bi, and κ was confirmed by the validation process. For the validation set, EEM charges were calculated using the EEM parameters shown in Table 2. Afterward, these EEM charges were compared with the ab initio charges calculated for the validation set, with the $R^2_{\mathrm{avg}}$ value as the comparison criterion. The $R^2_{\mathrm{avg}}$ value was 0.925 for B3LYP/6-31G* MK charges and 0.926 for HF/6-31G* MK charges. These values show that the charges calculated using the EEM are in very good agreement with the quantum mechanically obtained ab initio charges.

As an additional check, we calculated the absolute differences between the EEM and ab initio charges on the training set. For this purpose, all atoms in the training set were separated into smaller subsets according to atom type. For each atom in each atom type subset, the absolute difference between the EEM and ab initio charges was recorded, and the differences were plotted as histograms.
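The binning behind such histograms can be sketched as below. The bin width of 0.05, the function name, and the sample charges are assumptions chosen for illustration; the article does not state the bin width used in its figures.

```python
from collections import Counter, defaultdict


def abs_difference_histograms(atoms, bin_width=0.05):
    """Count atoms per bin of |q_EEM - q_ab_initio|, separately for each atom type.

    atoms: iterable of (atom_type, q_ab_initio, q_eem).
    Bin k covers absolute differences in [k * bin_width, (k + 1) * bin_width).
    """
    histograms = defaultdict(Counter)
    for atom_type, q_ref, q_eem in atoms:
        bin_index = int(abs(q_eem - q_ref) / bin_width)
        histograms[atom_type][bin_index] += 1
    return {atom_type: dict(sorted(counts.items()))
            for atom_type, counts in histograms.items()}


# Invented sample data: (atom type, ab initio charge, EEM charge).
sample = [
    ("H1", 0.12, 0.13),
    ("H1", 0.20, 0.27),
    ("H1", 0.05, 0.07),
    ("Zn1", 0.80, 0.92),
]
histograms = abs_difference_histograms(sample)
print(histograms)
```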

From the histograms, it is seen that EEM charges obtained using EEM parameters derived from B3LYP/6-31G* and HF/6-31G* MK charges are similar. A more detailed view shows higher accuracy of the EEM charges based on the B3LYP/6-31G* MK calculations.

The best agreement between the EEM and ab initio charges is observed for hydrogens (see Fig. 1a). This is probably because hydrogen is the most abundant atom type in the training set (7401 of 16841 atoms, about 44%; see Table 1): the more atoms a set contains, the more accurate the results are.

Figure 1.

Histograms of the absolute difference between EEM and ab initio charges (both B3LYP/6-31G* and HF/6-31G* MK), (a) for atom type H with a bond order of 1 and (b) for atom type Zn with a bond order of 1. The difference is depicted on the x-axis, whereas the y-axis shows number of cases that fall into the specified x-axis interval.

The worst agreement between the EEM and ab initio charges is observed for nitrogen atoms with a bond order of 1 (see Supp. Info.). This is due to the fact that some of the nitrogens, which were labeled as nitrogens with a bond order of 1, are in fact conjugated.

The histogram in Figure 1b shows that the agreement between the EEM and ab initio charges for zinc atoms with a bond order of 1 is also very good, even though zinc is a metal that may exhibit significant charge transfer.

Histograms obtained for other atom types are shown in the Supporting Information.

Another interesting feature is the $R(\mathrm{EEM})^2_{\mathrm{avg}}$ value, which shows the correlation between the EEM charges obtained with the HF/6-31G* MK parameters and those obtained with the B3LYP/6-31G* MK parameters. The $R(\mathrm{EEM})^2_{\mathrm{avg}}$ value is a number between 0 and 1, and the closer this value is to 1, the better the results are. For the training set, the $R(\mathrm{EEM})^2_{\mathrm{avg}}$ value is equal to 0.991.

The accuracy of our EEM parameters based on the B3LYP/6-31G* MK calculations was compared with that of the EEM parameters published by Bultinck et al. in ref.12, with the $R^2_{\mathrm{avg}}$ value as the comparison criterion. Because the κ value is not specified in ref.12, it had to be determined using our methodology (for a detailed description see ref.15). After κ was found, we calculated two sets of EEM charges for the comparative set of molecules and determined their $R^2_{\mathrm{avg}}$ values. The first set was calculated using the EEM parameters from the literature and the second using our EEM parameters. The obtained $R^2_{\mathrm{avg}}$ values for the first and second set were 0.940 and 0.956, respectively. This shows a better correlation for our parameters than for those from the literature.

Conclusions

In this work, we have parameterized the EEM for the B3LYP/6-31G* and HF/6-31G* Merz-Kollman-Singh charge distribution scheme to obtain the EEM parameters for hydrogen, carbon, oxygen, nitrogen, sulphur, fluorine, chlorine, bromine, and zinc atoms.

As a training set, we used organic molecules with experimentally determined structures from the CSD database of crystallographic structures. The EEM parameters were calibrated according to a methodology that had been thoroughly validated on large sets of organic, organohalogen, and organometallic molecules. We have shown that the EEM parameterization methodology, which was successful for HF/STO-3G MPA charges, can also be successfully applied to HF/6-31G* and B3LYP/6-31G* MK charges.

To ensure the quality of the results, the obtained EEM parameters were carefully verified by a validation procedure on a different set of molecules. In addition, the absolute differences between the EEM and ab initio charges on the training set were analyzed and presented as a set of histograms. Our EEM parameters were also compared with the EEM parameters published in the literature, and it was shown that our EEM parameters provide more accurate results.

The results presented in this article show that the EEM charges obtained using our EEM parameters are in very good agreement with ab initio charges. For B3LYP/6-31G* MK charges, they are more accurate than the charges obtained with the parameters described in the literature. For HF/6-31G* MK charges, no EEM parameters had been published so far.

The EEM parameters published as a part of this article can be used directly for charge calculations with the program EEM_SOLVER (ref. 22), which is freely available on the web page http://ncbr.chemi.muni.cz/~n19n/eem_abeem.

Acknowledgements

The authors thank the Supercomputing Centre in Brno, Czech Republic, for providing access to computer facilities.
