C library for topological study of the electronic charge density

Authors


Abstract

The topological study of the electronic charge density is useful for obtaining information about the kinds of bonds (ionic or covalent) and the atomic charges in a molecule or crystal. This study requires the electronic density and its derivatives up to second order at every point in space. In this work, a grid-based method for these calculations is described. The library, implemented for three dimensions, is based on multidimensional Lagrange interpolation on a regular grid; by differentiating the resulting polynomial, formulas for the gradient vector, the Hessian matrix, and the Laplacian were obtained for every point in space. More complex functions, such as the Newton–Raphson method (to find the critical points, where the gradient is null) and the Cash–Karp Runge–Kutta method (used to trace the gradient paths), were also programmed. Because the unit cell of some crystals has angles different from 90°, the library includes linear transformations to correct the gradient and the Hessian when the grid is distorted (inclined). Functions were also developed to handle files containing grids (grd from the DMol® program, CUBE from the Gaussian® program, and CHGCAR from the VASP® program). Each of these files contains the data for a molecular or crystal electronic property (such as charge density, spin density, or electrostatic potential) on a three-dimensional (3D) grid. The library can be adapted to perform the topological study on any regular 3D grid by modifying the code of these functions. © 2012 Wiley Periodicals, Inc.

Introduction

The “quantum theory of atoms in molecules” (QTAIM) developed by Bader et al.[1–8] is very useful for obtaining chemical information from the charge density. QTAIM is a firm, rigorous, and quantum mechanically well-defined theory based on observables such as the electron density or energy density fields. Most modern theories of bonding are based, in one way or another, on the partition of charge (or electronic density) among the different nuclear centers under study, usually according to Mulliken (that is, the projected density of states in the analysis of solids). Consequently, many of the interpretative models of chemical behavior are based on concepts that are known to be poorly defined and that give answers extremely dependent on a whole hierarchy of approximations.[6] In contrast, QTAIM is a methodology independent of the orbital concept. It provides a quantitative link between the total electron density (regardless of how it was generated: calculated or experimental) and important physical properties of a molecule, bypassing the wave function in the analysis. In particular, it provides a rigorous definition of chemical bond and geometrical structure for all types of molecules and solids, and it has proven to be useful in the analysis of physical properties of insulators, pure metals, and alloys.[4–6] High-quality experimental densities of minerals,[9, 10] covalent,[11] metallic,[12] and molecular crystals[13, 14] have been analyzed in terms of QTAIM concepts. Furthermore, QTAIM calculations on simple metals,[15] alloys, and intermetallic phases[16, 17] have also been reported, as well as on molecular,[18, 19] covalent, and ionic crystals.[6, 11] Software packages that calculate and draw gradient paths, Laplacian, and density isolines have been published,[20–28] using similar computational methods; however, only a few of them are open source (Aimpac,[25] Multiwfn,[26] DGrid,[27] and Critic[28]).

The topological study consists of determining and characterizing critical points, bond paths, zero-flux surfaces, gradient maps, and atomic basins. The critical points are those where the gradient is null (they include local minima, local maxima, and saddle points) and are fundamental to the topological study. The local maxima usually correspond to the positions of the atomic nuclei, and the local minima are cage critical points. There are two kinds of saddle points: a minimum in one direction and a maximum in the perpendicular plane (bond critical points), and a maximum in one direction and a minimum in the perpendicular plane (ring critical points). Representations of a bond and of an interatomic surface are obtained by drawing gradient paths close to a bond critical point. To find these points (using the Newton–Raphson method, described later), it is necessary to obtain the gradient vector g(r) and the Hessian matrix H(r) at any point r in space.

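$$\mathbf{g}(\mathbf{r}) = \nabla\rho(\mathbf{r}) = \begin{pmatrix} \partial\rho/\partial x \\ \partial\rho/\partial y \\ \partial\rho/\partial z \end{pmatrix}, \qquad \mathbf{H}(\mathbf{r}) = \begin{pmatrix} \dfrac{\partial^2\rho}{\partial x^2} & \dfrac{\partial^2\rho}{\partial x\,\partial y} & \dfrac{\partial^2\rho}{\partial x\,\partial z} \\ \dfrac{\partial^2\rho}{\partial y\,\partial x} & \dfrac{\partial^2\rho}{\partial y^2} & \dfrac{\partial^2\rho}{\partial y\,\partial z} \\ \dfrac{\partial^2\rho}{\partial z\,\partial x} & \dfrac{\partial^2\rho}{\partial z\,\partial y} & \dfrac{\partial^2\rho}{\partial z^2} \end{pmatrix} \tag{1}$$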

To replace the gradient and Hessian calculation routines in software packages for finding critical points and their properties, development of the library began in the early 2000s using Fortran 77. Later, in order to develop our own programs comprehensively, the library was translated to the C language. The content of this C library was reported earlier,[29] not in detail but including some calculation examples. It has also been used as a tool in studies by our group.[30] In this work, we describe the C library exhaustively, providing the source code, the method of use, and some additional functions for reading grids in different formats.

The library contains functions to calculate the gradient vector and the Hessian matrix, and other functions useful for the topological study.

Approximation to the Function and Derivatives

Starting with the Lagrange polynomial approximation formula,[31, 32] eq. (2):

$$P(x) = \sum_{j=0}^{n-1} f_j \prod_{\substack{k=0 \\ k \neq j}}^{n-1} \frac{x - x_k}{x_j - x_k} \tag{2}$$

where n is the number of points ($x_j$, $f_j$) used in the interpolation and P(x) is a polynomial of degree n−1 that passes through the n points. If the $x_j$ values are equally spaced[29] (Fig. 1), then $x_j = x_0 + j\,h$, where h is the distance between adjacent points.

Figure 1.

The Lagrange polynomial passes through the points (black dots). When the polynomial is evaluated at another point (x), an approximate value (f) of the function is obtained. In a piecewise interpolation, it is convenient, when possible, to choose the array of points such that the x value lies inside the central interval.[29]

Defining $s = (x - x_\alpha)/h$, with a constant index α (it is convenient to choose α such that $x_\alpha$ and $x_{\alpha+1}$ are the central points of the array, that is, n is even and α = n/2 − 1), solving for x ($x = x_\alpha + s\,h$), and substituting into the Lagrange polynomial, eq. (2), gives:

$$P(x_\alpha + s\,h) = \sum_{k=0}^{n-1} w_{k,n}(s)\, f_k \tag{3}$$

where

$$w_{k,n}(s) = \prod_{\substack{j=0 \\ j \neq k}}^{n-1} \frac{s + \alpha - j}{k - j} \tag{4}$$

$w_{k,n}(s)$ is a polynomial of degree n−1 in s.

To approximate the derivatives of the function with respect to x, $w_{k,n}(s)$ is differentiated with respect to s, as shown:

$$\frac{dP}{dx} = \frac{1}{h}\frac{dP}{ds} = \frac{1}{h}\sum_{k=0}^{n-1} \frac{d\,w_{k,n}(s)}{ds}\, f_k \tag{5}$$

For the higher order derivatives,

$$\frac{d^{v}P}{dx^{v}} = \frac{1}{h^{v}}\sum_{k=0}^{n-1} w^{(v)}_{k,n}(s)\, f_k \tag{6}$$

where

$$w^{(v)}_{k,n}(s) = \frac{d^{v}\,w_{k,n}(s)}{ds^{v}} \tag{7}$$

In the interpolation and in the approximation of the derivatives (in the developed library), the polynomials $w^{(v)}_{k,n}(s)$ were calculated for four points (n = 4); see Table 1.

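Table 1. Polynomials $w^{(v)}_{k,n}(s)$ defined in eqs. (7) and (4), with n = 4 and α = 1, and $w^{(0)}_{k,n}(s) = w_{k,n}(s)$.

| | k = 0 | k = 1 | k = 2 | k = 3 |
|---|---|---|---|---|
| v = 0 | $(-s^3 + 3s^2 - 2s)/6$ | $(s^3 - 2s^2 - s + 2)/2$ | $(-s^3 + s^2 + 2s)/2$ | $(s^3 - s)/6$ |
| v = 1 | $(-3s^2 + 6s - 2)/6$ | $(3s^2 - 4s - 1)/2$ | $(-3s^2 + 2s + 2)/2$ | $(3s^2 - 1)/6$ |
| v = 2 | $-s + 1$ | $3s - 2$ | $-3s + 1$ | $s$ |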
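
The entries of Table 1 translate directly into code. The following is a minimal sketch of a one-dimensional evaluation of eqs. (3), (5), and (6) for n = 4 and α = 1; the function names `lagrange4_weights` and `lagrange4_eval` are illustrative assumptions and are not taken from the library source.

```c
#include <assert.h>

/* Weights w^(v)_{k,4}(s) from Table 1 (n = 4, alpha = 1).
   v = 0 gives the interpolation weights; v = 1 and v = 2 give their
   first and second derivatives with respect to s. */
void lagrange4_weights(double s, int v, double w[4])
{
    assert(v >= 0 && v <= 2);
    if (v == 0) {
        w[0] = (-s*s*s + 3.0*s*s - 2.0*s) / 6.0;
        w[1] = ( s*s*s - 2.0*s*s - s + 2.0) / 2.0;
        w[2] = (-s*s*s + s*s + 2.0*s) / 2.0;
        w[3] = ( s*s*s - s) / 6.0;
    } else if (v == 1) {
        w[0] = (-3.0*s*s + 6.0*s - 2.0) / 6.0;
        w[1] = ( 3.0*s*s - 4.0*s - 1.0) / 2.0;
        w[2] = (-3.0*s*s + 2.0*s + 2.0) / 2.0;
        w[3] = ( 3.0*s*s - 1.0) / 6.0;
    } else {
        w[0] = -s + 1.0;
        w[1] = 3.0*s - 2.0;
        w[2] = -3.0*s + 1.0;
        w[3] = s;
    }
}

/* v-th derivative (v = 0, 1, 2) of the cubic through
   (x0, f[0]), (x0 + h, f[1]), (x0 + 2h, f[2]), (x0 + 3h, f[3]),
   evaluated at x; see eqs. (3), (5), and (6). */
double lagrange4_eval(const double f[4], double x0, double h, double x, int v)
{
    double w[4], sum = 0.0, scale = 1.0;
    double s = (x - (x0 + h)) / h;  /* s = (x - x_alpha)/h with alpha = 1 */
    int k;

    lagrange4_weights(s, v, w);
    for (k = 0; k < 4; k++)
        sum += w[k] * f[k];
    for (k = 0; k < v; k++)
        scale /= h;                 /* factor 1/h^v of eq. (6) */
    return sum * scale;
}
```

Because four points define a cubic exactly, sampling f(x) = x³ at x = 0, 0.5, 1, 1.5 and calling `lagrange4_eval(f, 0.0, 0.5, 0.8, 1)` reproduces 3x² = 1.92 up to rounding.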

Expressing the Lagrange polynomial as a sum of products of the values $f_k$ and the weights $w_{k,n}(s)$, eq. (3), allows the derivatives to be obtained in a straightforward way, eq. (5). Moreover, by taking the tensor product[31] of the weights $w_{k,n}(s)$, an approximation for a multidimensional regular array can be constructed. For a three-dimensional (3D) grid, the value of $s_i$ is calculated for each dimension ($s_1 = (x - x_\alpha)/h_1$, $s_2 = (y - y_\alpha)/h_2$, $s_3 = (z - z_\alpha)/h_3$), and then the polynomials $w_{k_i,n_i}(s_i)$ are evaluated. These polynomials multiply the function values at each grid point ($f_{k_1 k_2 k_3}$):

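$$P(x, y, z) = \sum_{k_1=0}^{n_1-1}\sum_{k_2=0}^{n_2-1}\sum_{k_3=0}^{n_3-1} w_{k_1,n_1}(s_1)\, w_{k_2,n_2}(s_2)\, w_{k_3,n_3}(s_3)\, f_{k_1 k_2 k_3} \tag{7}$$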

where n1, n2, and n3 are the numbers of points used in the interpolation along each dimension.

By differentiating the resulting polynomial with respect to $s_i$, the approximation to any of the derivatives with respect to the variables x, y, and z is obtained:

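$$\frac{\partial^{\,v_1+v_2+v_3} P}{\partial x^{v_1}\,\partial y^{v_2}\,\partial z^{v_3}} = \frac{1}{h_1^{v_1} h_2^{v_2} h_3^{v_3}} \sum_{k_1=0}^{n_1-1}\sum_{k_2=0}^{n_2-1}\sum_{k_3=0}^{n_3-1} w^{(v_1)}_{k_1,n_1}(s_1)\, w^{(v_2)}_{k_2,n_2}(s_2)\, w^{(v_3)}_{k_3,n_3}(s_3)\, f_{k_1 k_2 k_3} \tag{8}$$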

Equation (8) is equivalent to eq. (7) when v1 = v2 = v3 = 0, with the understanding that $w^{(0)}_{k,n}(s) = w_{k,n}(s)$.

The developed library consists mainly of routines that, given the function values on a 3D regular grid, calculate the approximate value of the function, the gradient vector, and the Hessian matrix at any point in space inside the grid limits. These routines are implementations of eq. (8) with n1 = n2 = n3 = 4 (tricubic Lagrange interpolation). Unlike trilinear interpolation,[33] this allows second-order derivatives to be approximated (see Table 1).
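
As an illustration of eq. (8) with n1 = n2 = n3 = 4, the sketch below evaluates a value or a mixed partial derivative from a 4 × 4 × 4 block of grid values by the tensor product of the Table 1 weights. It is a hedged example, not the library code: the names `weights4` and `tricubic` and the argument layout are assumptions, and the caller is assumed to have already selected the block and computed $s_1$, $s_2$, $s_3$.

```c
#include <assert.h>

/* Weights w^(v)_{k,4}(s) of Table 1 (n = 4, alpha = 1), v = 0, 1, 2. */
void weights4(double s, int v, double w[4])
{
    assert(v >= 0 && v <= 2);
    if (v == 0) {
        w[0] = (-s*s*s + 3.0*s*s - 2.0*s) / 6.0;
        w[1] = ( s*s*s - 2.0*s*s - s + 2.0) / 2.0;
        w[2] = (-s*s*s + s*s + 2.0*s) / 2.0;
        w[3] = ( s*s*s - s) / 6.0;
    } else if (v == 1) {
        w[0] = (-3.0*s*s + 6.0*s - 2.0) / 6.0;
        w[1] = ( 3.0*s*s - 4.0*s - 1.0) / 2.0;
        w[2] = (-3.0*s*s + 2.0*s + 2.0) / 2.0;
        w[3] = ( 3.0*s*s - 1.0) / 6.0;
    } else {
        w[0] = -s + 1.0;
        w[1] = 3.0*s - 2.0;
        w[2] = -3.0*s + 1.0;
        w[3] = s;
    }
}

/* Eq. (8) for a 4x4x4 block f[k1][k2][k3].
   s[i]: reduced coordinate along dimension i (0 <= s <= 1 in the
         central interval); h[i]: grid spacing; v[i]: derivative order. */
double tricubic(double f[4][4][4], const double s[3],
                const double h[3], const int v[3])
{
    double w1[4], w2[4], w3[4], sum = 0.0, scale = 1.0;
    int k1, k2, k3, i, j;

    weights4(s[0], v[0], w1);
    weights4(s[1], v[1], w2);
    weights4(s[2], v[2], w3);

    for (k1 = 0; k1 < 4; k1++)
        for (k2 = 0; k2 < 4; k2++)
            for (k3 = 0; k3 < 4; k3++)
                sum += w1[k1] * w2[k2] * w3[k3] * f[k1][k2][k3];

    for (i = 0; i < 3; i++)          /* factor 1/(h1^v1 h2^v2 h3^v3) */
        for (j = 0; j < v[i]; j++)
            scale /= h[i];
    return sum * scale;
}
```

The gradient corresponds to the three calls with v = (1,0,0), (0,1,0), and (0,0,1), and the Hessian to the six distinct combinations with v1 + v2 + v3 = 2.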

Inclined grid

The term “regular” means that the spacing between grid points ($h_i$) is constant along any particular dimension; however, it can differ from one dimension to another. Moreover, the grid can be inclined (Fig. 2), which means that at least one of the angles between the grid axes is different from 90° (this is usual for the unit cell of many crystalline solids). In this case, we always take one axis parallel to the x axis and another parallel to the xy plane.

Figure 2.

a) Non-inclined grid: o is located at the origin, a on the x axis, b on the y axis, and c on the z axis. b) Inclined grid: o is located at the origin, a on the x axis, b in the xy plane, and c can be anywhere.

To calculate the gradient at a point r in an inclined grid, two linear transformations are necessary.[34] First, the point coordinates in the inclined coordinate system (r′) are calculated, $\mathbf{r}' = \mathbf{A}\mathbf{r}$, where A is the matrix that transforms the ordinary (non-inclined) coordinates into inclined coordinates. Then the gradient vector in the inclined system (g′) is determined by calculating the derivatives with eq. (8). Finally, the gradient is transformed back to the ordinary coordinates by multiplying by the transpose of A: $\mathbf{g} = \mathbf{A}^{T}\mathbf{g}'$.

To calculate the Hessian matrix (H), it is also necessary to transform the point r to the inclined coordinate system. Once H′ is calculated (using eq. (8) to get the second-order derivatives), it is transformed back to the ordinary coordinates: $\mathbf{H} = \mathbf{A}^{T}\mathbf{H}'\mathbf{A}$.

These equations are quite general and hold whenever a linear transformation exists from one coordinate system to another; however, the developed library considers only the case in which the inclined coordinate system satisfies the given conditions (Fig. 2). In this case, the transformation matrix is always upper triangular.
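
The two back-transformations can be sketched as follows. This is a minimal illustration that assumes the matrix A (with r′ = A r) is already available as a 3 × 3 array; the function names are hypothetical and do not come from the library.

```c
/* g = A^T g': gradient from the inclined to the ordinary coordinates. */
void gradient_to_ordinary(double A[3][3], const double gp[3], double g[3])
{
    int i, j;
    for (i = 0; i < 3; i++) {
        g[i] = 0.0;
        for (j = 0; j < 3; j++)
            g[i] += A[j][i] * gp[j];     /* (A^T)_{ij} = A_{ji} */
    }
}

/* H = A^T H' A: Hessian from the inclined to the ordinary coordinates. */
void hessian_to_ordinary(double A[3][3], double Hp[3][3], double H[3][3])
{
    int i, j, k, l;
    for (i = 0; i < 3; i++)
        for (j = 0; j < 3; j++) {
            H[i][j] = 0.0;
            for (k = 0; k < 3; k++)
                for (l = 0; l < 3; l++)
                    H[i][j] += A[k][i] * Hp[k][l] * A[l][j];
        }
}
```

Both formulas follow from the chain rule applied to ρ(r) = ρ′(A r). The loops do not exploit the upper triangular form of A, which keeps the sketch valid for any linear transformation.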

Approximation to the function logarithm

The electronic charge density has particular characteristics: for example, its value is never negative, and it behaves exponentially close to the atomic nuclei. The steep increase of the density causes the interpolating polynomial to oscillate, producing negative values in the neighborhood of the nuclei (see Fig. 3b). By replacing the grid values ($f_{k_1 k_2 k_3}$) with their logarithms in the previously described interpolation, eq. (7), and finally taking the antilogarithm of the interpolated value (P), this oscillation can be avoided.

Figure 3.

Electronic charge density plots in a plane that contains the sulfur atom of the dibenzothiophene molecule. The plane contains 9 × 9 grid points; the distance between points is 5 × 10⁻¹² m. a) Without interpolation. b) Using polynomial interpolation. c) Using logarithm interpolation.

For the derivatives, eqs. (9) and (10) are used.

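$$\frac{\partial f}{\partial u} = e^{p}\,\frac{\partial p}{\partial u} \tag{9}$$
$$\frac{\partial^{2} f}{\partial u\,\partial v} = e^{p}\left(\frac{\partial^{2} p}{\partial u\,\partial v} + \frac{\partial p}{\partial u}\,\frac{\partial p}{\partial v}\right) \tag{10}$$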

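where u and v are x, y, or z, and p is the interpolation of the logarithm of the function (so that the function is approximated by $e^{p}$). The values $\partial p/\partial u$, $\partial p/\partial v$, and $\partial^{2} p/\partial u\,\partial v$ are the first-order and second-order derivative approximations obtained by the method when the logarithms of the grid values are used.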

Methods Requiring the Gradient Vector and the Hessian Matrix

The following methods are used in the topological study and require the gradient vector and the Hessian matrix.

Newton–Raphson method

Because the gradient at a critical point is the null vector (0), the coordinates of the point are obtained by solving the equation ∇ρ(r_crit) = 0. One way to solve this equation is the Newton–Raphson method.[33] The vector function ∇ρ(r), evaluated at the point r = r_i + h, is expanded in a Taylor series:

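$$\nabla\rho(\mathbf{r}_i + \mathbf{h}) = \nabla\rho(\mathbf{r}_i) + \mathbf{H}_i\,\mathbf{h} + O(|\mathbf{h}|^{2}) \tag{9}$$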

where $\mathbf{H}_i$ is the Hessian matrix H(r) (the Jacobian of ∇ρ(r)) evaluated at $\mathbf{r}_i$.

By neglecting the higher order terms in the Taylor series, eq. (9), setting ∇ρ(r) = 0, and solving for h:

$$\mathbf{h} = -\mathbf{H}_i^{-1}\,\nabla\rho(\mathbf{r}_i) \tag{10}$$

we get the shift vector (h).

If the function ρ(r) is quadratic, the vector h starts at the point $\mathbf{r}_i$ and ends at the critical point. In general, the function is not quadratic, so a new point $\mathbf{r}_{i+1}$ is calculated using:

$$\mathbf{r}_{i+1} = \mathbf{r}_i + t\,\mathbf{h} \tag{11}$$

where t is a positive factor not greater than 1.

The calculation according to eqs. (10) and (11) is iterated until ∇ρ(r) is equal to the vector 0 or has a small norm. Then $\mathbf{r}_i$ is a critical point or a point very close to one.

In the developed algorithm, the norm of h has an upper bound. When |h| is greater than the bound, t is calculated so that |t h| equals the bound (otherwise t = 1). At each iteration, the bound is decreased following a geometric progression whose ratio must be lower than 1. This avoids large oscillations near the critical points when the gradient has a high norm in the neighborhood of the critical point. The maximum number of iterations and the path length are defined by the user.

Fifth-order Cash–Karp Runge–Kutta Method

The gradient paths needed to construct the molecular graph and the interatomic surfaces are solutions of the differential eq. (12).[1]

$$\frac{d\mathbf{r}(\tau)}{d\tau} = \nabla\rho(\mathbf{r}(\tau)) \tag{12}$$

The solution of eq. (12) is a parametric curve in $\mathbb{R}^{3}$, which is unique when an initial value of r is given. A numerical solution is obtained with the fifth-order Cash–Karp Runge–Kutta method.[33] The general form of this method is

$$\mathbf{r}_{n+1} = \mathbf{r}_n + \sum_{j=1}^{6} c_j\,\mathbf{k}_j \tag{13}$$

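where $\mathbf{r}_n = (x_n, y_n, z_n)$ and the $\mathbf{k}_j$ values, for a small step size h, are:

$$\mathbf{k}_1 = h\,\nabla\rho(\mathbf{r}_n), \qquad \mathbf{k}_j = h\,\nabla\rho\Big(\mathbf{r}_n + \sum_{i=1}^{j-1} b_{ji}\,\mathbf{k}_i\Big), \quad j = 2, \dots, 6$$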

The particular values of the constants ($c_j$, $b_{ji}$) can be found in “Numerical Recipes.”[33] When the step size h is chosen adequately small, the set of points $\mathbf{r}_n$ follows the gradient path closely enough.
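
A single fixed-size step of the method can be sketched as follows, using the Cash–Karp coefficients tabulated in Numerical Recipes. The names `gradient_fn`, `cash_karp_step`, and `toy_gradient` are assumptions made for this example; in practice the gradient callback would wrap the interpolated gradient.

```c
/* Callback returning the gradient g at the point r. */
typedef void (*gradient_fn)(const double r[3], double g[3]);

/* One fifth-order Cash-Karp step of size h along the gradient path,
   eq. (13). Row j of b holds the coefficients that multiply
   k_1 ... k_j when the stage k_{j+1} is formed. */
void cash_karp_step(gradient_fn grad, const double r[3], double h,
                    double r_next[3])
{
    static const double b[6][5] = {
        {0.0},
        {1.0 / 5.0},
        {3.0 / 40.0, 9.0 / 40.0},
        {3.0 / 10.0, -9.0 / 10.0, 6.0 / 5.0},
        {-11.0 / 54.0, 5.0 / 2.0, -70.0 / 27.0, 35.0 / 27.0},
        {1631.0 / 55296.0, 175.0 / 512.0, 575.0 / 13824.0,
         44275.0 / 110592.0, 253.0 / 4096.0}
    };
    static const double c[6] = {
        37.0 / 378.0, 0.0, 250.0 / 621.0, 125.0 / 594.0, 0.0, 512.0 / 1771.0
    };
    double k[6][3], y[3], g[3];
    int i, j, d;

    for (j = 0; j < 6; j++) {
        for (d = 0; d < 3; d++) {
            y[d] = r[d];
            for (i = 0; i < j; i++)
                y[d] += b[j][i] * k[i][d];
        }
        grad(y, g);
        for (d = 0; d < 3; d++)
            k[j][d] = h * g[d];
    }
    for (d = 0; d < 3; d++) {
        r_next[d] = r[d];
        for (j = 0; j < 6; j++)
            r_next[d] += c[j] * k[j][d];
    }
}

/* Toy gradient (assumption): rho = -(x^2 + y^2 + z^2)/2, so that
   grad rho = -r and every path ends at the maximum in the origin. */
void toy_gradient(const double r[3], double g[3])
{
    g[0] = -r[0];
    g[1] = -r[1];
    g[2] = -r[2];
}
```

For the toy field the exact path is r(τ) = r(0) e^(−τ), so ten steps of size 0.1 should scale the starting point by a factor very close to e⁻¹. The sketch uses a fixed step and omits the embedded fourth-order error estimate that the Cash–Karp scheme also offers.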

Derivative Discontinuities

The derivative discontinuities of piecewise Lagrange polynomials prevent some critical points located on the faces of the grid cubes from being found. In contrast, spline interpolation ensures continuity of the derivatives, but it requires an array to store the second derivatives and the solution of several systems of equations involving all grid data.[33] To save calculation time and memory, we prefer Lagrange interpolation over the spline method and solve the discontinuity problem as explained below.

Four grid points are always taken along each dimension (4 × 4 × 4 for a 3D grid), chosen so that, as far as possible, the point at which the derivative is calculated lies inside the central interval (Fig. 4). This causes a change of interpolation polynomial when the point leaves the central interval. In general, this discontinuity is not a serious disadvantage; nevertheless, the Newton–Raphson method can fail when the critical point lies exactly on the boundary planes between intervals [with a value of $s_i$ equal to zero, eqs. (7) and (8)], in which case the gradient norm may always remain greater than the selected convergence criterion.

Figure 4.

Change of 2D interpolation polynomial. At any point within the dark gray zone, the values $f_{k_1 k_2}$ used by the interpolation polynomial, eq. (8), correspond to the 16 black dots in the grid. When the gradient or the Hessian matrix is calculated at several points, starting from the open circle in a) and ending at the open circle in b), the interpolation polynomial changes abruptly when the point reaches the dark gray zone in b).

In the developed library, an option allows the previously used interpolation polynomial to be kept when the point moves out of the interval by no more than 10% of the interval length in each dimension. This makes it possible to find the critical points that lie on the boundary planes between intervals (in fact, it finds a point very close to them) without sacrificing the convergence criterion, which affects the position of the other critical points.
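
One way to picture this option is as a hysteresis rule for choosing the base index α along one dimension. The sketch below is an assumption about how such a rule could look, not the library's actual logic; the function name `select_interval` and its parameters are hypothetical.

```c
#include <assert.h>

/* Choose the base index alpha of the 4-point stencil
   (alpha-1 ... alpha+2) along one dimension.
   u          : coordinate in grid units, u = (x - x0)/h, with u >= 0
   prev_alpha : index used in the previous evaluation, or -1 if none
   npoints    : number of grid points along this dimension
   latency    : tolerated overshoot as a fraction of the interval
                length (0.1 for 10%; 0 disables the option)          */
int select_interval(double u, int prev_alpha, int npoints, double latency)
{
    int alpha;

    assert(npoints >= 4);
    if (prev_alpha >= 0) {
        double s = u - prev_alpha;
        /* keep the previous polynomial while s stays within the
           central interval widened by the latency margin */
        if (s >= -latency && s <= 1.0 + latency)
            return prev_alpha;
    }
    alpha = (int)u;                  /* truncation equals floor for u >= 0 */
    if (alpha < 1)
        alpha = 1;                   /* keep the stencil inside the grid */
    if (alpha > npoints - 3)
        alpha = npoints - 3;
    return alpha;
}
```

With a latency of 0.1, a point at u = 4.05 that was previously handled with α = 3 keeps that polynomial (s = 1.05), whereas with a latency of 0 the stencil switches to α = 4.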

Library Description

The grid is stored in a C structure named _GRD. Functions are available that read grid files and return a pointer to a _GRD structure with all the information of the grid (the memory needed for this structure is dynamically allocated inside the functions). The lagrange3D4grd.c file contains the read_grd function, which reads DMol3 grd files;[35] further description of this function and of the _GRD structure is given in the lagrange3D4grd.h file. The GaussianCube.c file contains functions to read grid data in CUBE format from Gaussian[36] and Gamess[37] output files, and the CHGCARfile.c file contains the function to read grid data from Vasp[38] output files.

The argument of the read_grd function is a character string (char * filename) that must contain the name of a *.grd file.

When the _grd_latency global variable has a value different from zero, it prevents the change of interpolation polynomial at the interval edges, as explained in the Derivative Discontinuities section.

The functions that calculate the interpolated value, the gradient, and the Hessian matrix at any point in space are described in the lagrange3D4grd.h file, together with the functions that find critical points with the Newton–Raphson method (described in the Newton–Raphson method section) and that calculate the points of a gradient path with the fifth-order Cash–Karp Runge–Kutta method (described in the Fifth-order Cash–Karp Runge–Kutta Method section).

The library code files can be obtained from the Website: http://alfa.facyt.uc.edu.ve/quimicomp/

Some Examples

The examples shown in Figure 5 are screenshots of the graphic windows of some programs that use the described library. The programs were built for Windows XP with the MinGW C compiler and the Dev-C++ development environment, using the OpenGL, Glut, and Glui libraries. MinGW is a collection of freely available programming tools, specific header files, and import libraries that allow one to produce native Windows® programs (Website: http://www.mingw.org/). Dev-C++ is a free development environment for the C/C++ programming language on Windows that includes a MinGW version (Website: http://www.bloodshed.net/devcpp.html). The Glut and Glui libraries for installation in Dev-C++ can be obtained from the Website: http://www.nigels.com/glt/devpak/.

Figure 5.

Library application examples: a) Image of the electronic charge density in the plane containing all atoms of a p-nitroaniline molecule. b) Image of the minus Laplacian of the electronic charge density in the plane containing one molybdenum atom and two sulfur atoms of the MoS2 crystal. c) Bond paths (black lines), and critical points (small gray spheres: bond critical points; blue: ring critical points) of the tetracycline molecule. d) Some bond paths and critical points of the MoS2 crystal.

As an interpolation example, Figure 5a shows the electronic charge density plot on the plane containing all atoms of p-nitroaniline. The calculation was performed with the software Gamess-US[37] at the HF/6-31G(2p,2d) level, and the generated grid spacing was 0.1 bohr. As an example of derivative calculation with the interpolation of the logarithm, Figure 5b shows the Laplacian of the electronic charge density of crystalline MoS2 on a plane containing one Mo and two S atoms, calculated by density functional theory (DFT) with the Perdew-Burke-Ernzerhof (PBE) exchange-correlation functional and the double numerical plus polarization (DNP) basis set, using the software DMol3.[35] Figure 5c is an example of the calculation of critical points and gradient paths for tetracycline (Newton–Raphson and Cash–Karp Runge–Kutta methods, respectively), calculated with the software Gamess-US[37] at the HF/6-31G(2p,2d) level; the grid spacing was 0.05 bohr. In Figure 5d, some bond paths and critical points of the MoS2 crystal (inclined unit cell) are shown.

For tetracycline, 65 critical points (other than nuclear positions) were determined with a program using this library and also with a method based on the analytical electron density (Multiwfn[26]). The maximum distance between the critical point positions calculated with this method and with the Multiwfn program was 0.0013 bohr, equivalent to 2.6% of the grid spacing, and 58 of the 65 points differ by less than 0.0005 bohr. These differences can be reduced by refining the grid.

Acknowledgements

The authors express thanks to Professor Oscar Valbuena for his help in preparing this manuscript.