Promising insights in parallel grid-based algorithms for quantum chemical topology

Some straightforward improvements designed to make grid-based quantum chemical topology more efficient and faster are presented. The strategy focuses on both the evaluation of the scalar function over three-dimensional discrete grids and the algorithms aimed to follow and integrate gradient trajectories over the basin volumes together. Beyond the density analysis, we show that the scheme is quite suitable for the electron localization function and its complex topology. With this speed-up of the parallelized process used to generate 3d-grids, this new scheme is several orders of magnitude faster than the original grid-based method implemented in our laboratory (TopMod09). The efficiency of our implementation (TopChem2) was also compared with well-known grid-based algorithms designed to assign the grid points to basins. The performances, speed versus accuracy have been discussed on the basis of results obtained from selected illustrative examples.


| INTRODUCTION
The quantum chemical topology (QCT) has gained popularity as a bonding analysis method in molecules or solids since the seminal papers published about the quantum theory of atoms-in-molecules (QTAIM) and the topological analysis of the electron localization function (ELF). [1][2][3][4][5] In a QCT analysis, a partitioning of the molecular space into volumes (so-called basins) is carried out according to the theory of dynamical gradient systems through an assignment of a scalar function of the molecular space. More precisely, the process splits the space into regions associated to maxima of the function by studying the gradient trajectories (or integral lines). Note that the term "topology" is used in its broad sense related to the existence of the stationary points of the scalar field and their connectivity insured by the gradient trajectories. The most important QCT algorithm is the one that performs this assignment to basins localized around the maxima of the considered scalar function. Beforehand, in a grid-based approach, the function values have to be computed at each grid point.
The basin volumes are separated by surfaces (so-called separatrices) according to an algorithm typically combining an analytical and a numerical methodology. In the QTAIM framework, the function considered is the one electron density and these basins are associated with each of the interacting atoms in the molecule. In contrast, the ELF topology exhibits non-atomic valence basins (bonding and nonbonding regions) in addition to core basins surrounding nuclei with atomic number Z > 2. The valence basins are characterized by the number of core basins with which they share a common boundary.
This number is called the synaptic order 6 and is usually defined as following: monosynaptic basins (labeled V(A)) usually correspond to lone pair regions, while disynaptic and polysynaptic basins (labeled V(A, B, C, …)) characterize the covalent bonds.
Since the pioneer works of Bader and its coworkers until today, large efforts have been made to propound a large variety of efficient algorithms designed to the QTAIM analysis whether for molecular or condensed-matter systems. [7][8][9][10][11][12][13][14][15][16][17][18][19][20][21] Some works using an appropriate vectorization and parallelization 21 have been proposed, even for a graphics procession unit implementation. 22 The X-ray crystallography community has also proposed some innovative methods. 23,24 Although it is not always feasible to apply grid-based algorithms to large systems where a very large computational cost can be required, the grid-based methods were employed from the early days. 9,13,15,17,[25][26][27] The key of such strategy is to prevent some numerical difficulties during the grid points assignment with an easy implementation. Thus, the grid-based methods have sparked a renewed interest for uniform as well as for non-uniform grids. 8,11,21,28 While much effort has been devoted to QTAIM developments, a rather small number of works were devoted to the ELF topological analysis which is more complex than QTAIM since typically nonatomic basins need to be characterized and possible degenerated attractors can be also found. 15,[27][28][29] In this article, we focus on an improved implementation of the off-grid-based algorithm which historically is the choice of both the QTAIM and ELF implementation in the TopMod09 package. 27 Our choice is to highlight the topological analysis of the ELF function because of its complex topology. We have especially focused the improvements of the methodology on the two following points: 1. Efficiency to make 3d-grid in order to decrease the computational effort without compromising the accuracy of the integrations processing.
2. Vectorization and parallelization of algorithms using OpenMP technology in order to follow the steepest ascent gradient paths of the scalar function.
The diagram flows of used grid-based algorithms and the new developments are presented in Sections 3 and 4. The overall organization of our programs is built in the same spirit as the TopMod09 package. 27 The QCT grid method is organized according to the three following steps: 1. Evaluation of the density function over a rectangular parallelepiped uniform grid of dimension N = N x Â N y Â N z where each grid point has a constant spacing with its neighbors; 2. Assignment of each grid point to a basin; 3. Integration of the electron charge density over the basins, and calculation of related integrated properties, the most commonly being the basin populations.
The last section contains some comparative tests speed versus accuracy of our implementation with well-known algorithms used for the QTAIM analysis, the Bader Charge near-grid algorithm of the Henkelman's group 13,25,26 and the Keith's promega algorithm implemented in the AIMALL package 12,30 applied to two selected molecular systems. Note that the near-grid algorithm was implemented in the popular Multiwfn. 31 The Yu and Trinkle weighted-grid algorithm 9 was also implemented in the of Henkelman's group for Bader charge analyses. These latter programs are typical examples of well-maintained codes which perform the QCT analysis from Gaussian (or slater) molecular wavefunction data. These data are gathered in a wfn file or a grid cube file format. The wfn format, initially introduced in the Bader's program AIMPAC, is currently supported by numerous quantum chemistry software, such as Gaussian, GAMESS-US, Q-Chem, or NWChem.

| COMPUTATIONAL DETAILS
Geometry optimizations have been performed at the hybrid density functional B3LYP level with the Gaussian09 software. 32 The standard all-electron basis set 6-31+G(d,p) was employed for all atoms. To illustrate the applicability of our implementation, we present a set of results using a standard uniform 3d-grid for both electron density and ELF. A grid step varying between 0.075 and 0.1 bohr was used. This range is known as a standard quality step and already gives fine grids, tailored to semi-numerical based partitions algorithms. The grid-based corrected algorithms have been implemented within an OpenMP technology parallelization scheme where a shared-memory parallel model is used. The grid-based parallelization scheme and its implementation used in this work is similar to that described by Rodriguez et al. 8 Briefly, it relies on the fact that the assignment for a set of grid points remains independent of the assignment of any other set. Thus, sets of grid points to be automatically assigned to a basin are distributed to several processors using the OpenMP technology. nal short loops which has enabled to decrease the CPU intensive parts. 27 Another improvement was to parallelize the loops within an OpenMP or MPI environments. However, the computational effort remains rather expensive and strongly depends to the total number of primitives M. Prior to a detailed examination of our method, we first recall how the electron density and ELF are typically related to M and are computed at a given grid point r.
Using the variational principle under the only constraints of the single or double occupation for each spatial orbital φ i (r) (Hartree-Fock or Kohn-Sham formalism), we recall that the electron density is computed at each grid point as: where φ i (r) are expanded using M atom-centered basis functions, χ μ (r) (GTO or STO), and P μν are elements of the total one-electron density matrix defined as follows: n i is the occupation of the φ i (r) orbital and c iμ and c iν are the real expansion coefficients.
ELF was expressed in the framework of the density functional theory by Savin et al. and rationalized in terms of the local excess kinetic energy due to the Pauli repulsion. ELF can be computed at each grid point as follows: Where, is the kinetic energy density and T w r ð Þ ¼ 1 8 jrρ r ð Þj 2 ρ r ð Þ is the von Weizsäcker kinetic energy density. Note that from a rigorous point of view, this expression of ELF is only valid for closed-shell systems described by a single Slater determinant.
The quantum chemistry programs often employ Cartesian Gaussian atomic orbitals in the algorithms for computational reasons. 33 The Gaussian orbitals are typically provided as a set of unnormalized Cartesian primitives, χ μ : with (x, y, z) is the relative Cartesian position of the electron to the atom-centered origin of the GTO tesian GTO χ N μ can be easily defined from its unnormalized form as: Now, we look for the minimal radius r μ of a sphere centered at the R μ location that verifies: Where ε needs to be as small as possible. Practically, a value of 0.001 has been used in this work. Note that this minimal radius r μ is similar to the β-sphere concept introduced in the QTAIM framework as a spherical volume centered on an attractor that is enclosed within its basin volume. 34 As shown by both Equations (1) and (2), the computational effort depends of the number of primitives locally involved at a given grid point for the computation of density functions ρ r ð Þor η r ð Þ: We notice that the contribution of a given primitive varies according to the distance between the grid point and the location of its center. For distances larger than r μ , the contribution of the primitive becomes negligible and can be smoothly neglected from the calculation of the one-density functions. The efficiency and the robustness of this simple method is briefly illustrated in Table 1. Table 1 shows that for most grid points a large number of primitives can be disregarded in order to compute the density functions. For example, for the benzene molecule we found that a maximum of only 50 primitives are needed at 80% of grid points. Some tests, speed versus accuracy, of our implementation are discussed below in the Section 4.

|
The numerical grid-based strategy for the basin analysis

| The improved off-grid algorithm of the Silvi's group
Within QCT, basins with any shapes need to be determined in the molecular space. In the grid-based methods, the integration of trajectories is carried-out without any explicit build of the zero-flux surfaces (separatrices). For the assignment of a given grid point, the algorithm looks along the steepest gradient paths leading to the corresponding attractor. Interestingly, the process can be easily vectorized and parallelized. 8 In this paper, we use a parallelized and revisited version of the historically grid-based Silvi's algorithm initially implemented in the TopMod09 package. 27 This algorithm will termed as off-grid algorithm throughout the paper.
The algorithm starts with an initial part designed to strongly reduce the length of the followed trajectories and also the number of Thereafter, we consider the set of all neighboring grid points (ii, jj, kk) from a given grid point (i, j, k) having a value of the density or ELF higher than the first one at (i, j, k), the ascent path will necessarily go through one or near one of its neighboring grid points. Therefore, if all these neighbors are already assigned to the same maximum, then we are assured that the point (i, j, k) can be directly assigned without considering any gradient ascent process, focusing the computational effort on grid points close to the basin boundaries. This improvement is very robust and rather fast because it avoids many gradient calculations and only depends upon the size of the grid. The overall scheme including the gradient ascent part can be summarized in the flow displayed on Algorithm 2, which is the one implemented.
3.2.2 | The on-grid and the corrected near-grid algorithms of the Henkelmam's group A relevant grid-based method was proposed by Henkelman and its coworkers. 13,25 In this approach, the ascent on-grid projected gradient T A B L E 1 Percentage of grid points where the number of contributing primitives is lower than the total number of primitives. A L G O R I T H M 1 Flow diagram of the off-grid gradient ascent part: ascent rho silvi (i 0 , j 0 , k 0 , attractor code). Example for quantum theory of atoms-in-molecules (QTAIM).
works by integrating gradient-ascent trajectories from every grid point. Thereafter, Tang et al. 26 have improved the original algorithm with a near-grid method which reduces the on-grid lattice bias by propagating a correction vector between the off-grid trajectory and the on-grid trajectory. The process finds all basins with a very short computational time which scales approximately linearly with the number of grid points (system size). It was implemented in several packages owing to its ease of implementation and relatively good performances. 15,31 However, this algorithm depends quite strongly on the grid spacing and it requires, in principle, a finely sampled grid. This could limit its application to large systems, except with our acceleration process described above that enabled such applications. Note that the Yu and Trinkle algorithm 9 was implemented in the code Bader charge analysis. 25 The method introduced a subgrid accuracy in the computation of QTAIM volumes. They implemented a weighting scheme to handle the fraction of an elementary grid volume to be connected with a neighboring grid point. The Henkelman and Tang algorithms will be respectively termed as on-grid and near-grid throughout the paper.

| Mixing the initial part of the Silvi's algorithm and the near-grid algorithm of the Henkelman's group
The near-grid algorithm of Tang 26 is very fast owing to the use of ongrid projected gradients. However, the process typically requires a fine grid spacing for a full relevant connection between near grid points. While noting that the initial part of the Silvi's algorithm is very efficient and fast, we can suggest to combine this latter process with the near-grid algorithm namely we consider that the wandering points can be firstly attributed without considering any gradient path, focusing the computational effort on border grid points close to the basin boundaries. This algorithm will be termed as the mixed-near-grid algorithm throughout our paper. Practically, this mixed-near-grid lead us to only replace in Scheme 2, the line "ascent rho silvi (i, j, k, attractor code)" by "ascent near-grid (i, j, k, attractor code). spacing of 0.075 bohr has also been selected as reference for a fine grid spacing. Through the population analysis, we have studied how the algorithms depend upon the grid spacing. level of theory. In this case, the radius r μ was compelled to be lower than 7.5 bohr. Color Code: Green diamonds: using grid acceleration process; red squares: without grid acceleration process. Blue triangle: TopMod09 program (single core). The used grid step is 0.1 bohr.

| The 3d-grid acceleration process
F I G U R E 2 Total time, in s, elapsed for the 3d-grid computing of the electron density (left) and ELF (right) applied to the benzene molecule computed at the B3LYP/6-31+G(d,p) level of theory. Color code: Green diamonds: using grid acceleration process; red squares: without grid acceleration process. Purple triangles: Multiwfn program. Single blue triangle: TopMod09 program (single core). The used grid step is 0.1 bohr.
F I G U R E 4 Electron localization function (ELF) profiles for the C 2 H 2 molecule computed along the z axis (bohr) at the B3LYP/6-31+G(d,p) level of theory. Color code: red, using grid acceleration process and black, without the accelerated process.
using a single core (red curve) whereas this latter calculation only requires 30 s using the acceleration process combined with 16 CPU cores. Our accelerated process can be also compared with other similar strategies designed to decrease the computational cost, for example, the method implemented in the Multiwfn software that uses an arbitrary cutoff in the exponents of primitives functions. 31 From Figure 2, we can see that the acceleration process is a least as effective as Multiwfn for both QTAIM and ELF and be even a little better for ELF when a small number of CPU cores is used. Although Top-Mod09 is only implemented for a single core calculation (the blue triangle on Figures 2 and 3), its implementation remains efficient since for a given grid point, the scalar function is either computed, if the electron density at nearest grid neighbors remains larger than an arbitrary threshold, or overlooked otherwise.
Although the calculation of the 3d-grid remains a rate-limiting step of a QCT analysis, the reduction of the computational cost obtained with our acceleration process makes this possible for applications to rather large systems where large grids need to be produced. Figure 4 shows the computed ELF profiles for C 2 H 2 along its molecular axis when all primitives are kept (black curve) and when some primitives are disregarded (red curve) following our acceleration process. The two profiles being fully overlapped, this clearly illustrates the sustainability of the accuracy of the acceleration scheme.

| The basin analysis
To illustrate the applicability of our analysis, we have evaluated the performance of the basin analysis using the grid-based algorithms discussed in this paper. Figure 5 reports the comparative parallel speeds of the assignment process as a function of the number of CPU-cores plotted for the three distinct algorithms namely the off-grid, the near-grid and the mixed-near-grid. For this example, the selected ELF grid is the one already used in the previous section for the CG complex where a constant grid spacing of 0.1 bohr was used. The goal of this study is to compare their relative speedup efficiencies using the grid acceleration process which has been used for make the 3d-grid. Figure 5 shows a clear split between the off-grid algorithm and the other two algorithms near-grid and mixed-near-grid. This is expected as the gradient ascent part is the rate-limiting step. Nevertheless, when the number of processors increases, the speed of the off-grid algorithm becomes closer to others. On one core, the mixed-near-grid algorithm is 82 times faster than the off-grid algorithm while it is only 13 times faster on 16 cores. This is an important finding which makes possible and efficient the off-grid basin analysis. Figure 6 displays the speedup quantity defined as the ratio of the computational time associated to a single processor T 1p (serial execution) and the time T Np associated to N p processors. 36 As observed in Figure 6, the speedup scales almost linearly with the number or CPU cores from 1 core to 30 cores.
Clearly, the reduction of the computational cost due to the twofold effectiveness of both the accelerated grid process and the Note: The is given in Figure 1.

| Numerical integrated properties
To illustrate the applicability of our implementation and its accuracy, we discuss in this section the basins populations after the basin analysis for each of algorithm considered in this paper. The accuracy of the integrated populations is of crucial importance because it is often the preliminary step to compute more elaborate QCT descriptor such as the delocalization index, 37 or more generally the variance and covariance matrix elements 38 which explicitly depend on the populations. Note that a low computational effort is needed in this work for the calculation of populations since the numerical integration of the electron density over the basin volume is plainly computed by T A B L E 3 Comparative selected ELF populations (electrons) obtained with several algorithms and programs for benzene and the guanine (G)cytosine (C) complex optimized at the B3LYP/6-31+G(d,p) level of theory.  summing over the grid points assigned to the volume. Tables 2 and 3 show the results obtained for QTAIM and ELF, respectively. The basin numbering is given with respect to the numbering given in Figure 1.
The QCT analysis of C 6 H 6 is expected to reflect the high symmetry of the molecule (D 6h group). For example, it means that the basin populations of all carbon atoms or of all ELF V(C, C) bonding basins should be the same. In contrast, the CG complex does not show any symmetry and seems to be a reasonable choice as test of a rather large system.

| Quantum theory of atoms-in-molecules
The atomic populations, the mean of populations together with the standard deviation (SD) and the root-mean-square deviation (RMSD) obtained with respect to AIMALL results have been reported in the SD is hardly unchanged and the RMSD are smaller than 0.01e for all algorithms when the fine grid spacing is used.

| Electron localization function
The core and valence populations were also computed in the benzene molecule and the CG complex. The mean of selected basin populations together with the SD and the RMSD obtained with respect to off-grid results taken as reference have been reported in Table 3. The near-grid and mixed-near-grid algorithms have been compared satisfactorily to the related results provided by the off-grid populations.
For example, all the tested algorithms depict a mean of populations almost unchanged. The mean of CG populations is 2.392e for the offgrid implementation whereas the mean is 2.387e for the near-grid implementation. These findings remain unchanged whatever the value of the used grid step. However, the smallest SD of V(C, C) populations Note: Basins numbering with respect to the numbering given in Figure 1. A coarse grid step of 0.175 bohr was used. Abbreviation: ELF, electron localization function. a Off-grid algorithm. b Near-grid algorithm.
c Mixed-near-grid. d Off-grid algorithm using a fine grid spacing of 0.075 bohr is taken as a reference.
of C 6 H 6 , around 0.002e, was found for the off-grid implementation. This is $10 times smaller than the deviations obtained for both neargrid and mixed-near-grid algorithms. It is clear that the off-grid algorithm nicely reproduces the high symmetry of benzene. Table 4 shows the ELF populations obtained with a rather large grid spacing of 0.175 bohr. For this latter case, the off-grid valence populations remain rather unchanged in comparison to the neargrid and mixed-near-grid algorithms where some large variations can be observed. This finding is highlighted for the core basins where gradients are sharply modified around the nuclear regions.
For example, a population of 2.03e was found for C(O 8 ) basin using the off-grid implementation while a untenable value of 2.27e is found using the near-grid algorithm, the reference value being of 2.12e obtained from the off-grid implementation using a fine grid step of 0.075 bohr. Once again, this is expected because the real space integrations using uniform grids are less dense near atomic nuclei and the on-grid algorithms are less accurate. Overall, although the ELF topology remains stable enough even for a rather large grid step, this confirms that an off-grid implementation remains more robust when the accuracy of the values of descriptors is required.

| CONCLUSION
In this paper, we have presented a robust strategy designed to make both the calculation of the scalar function over a three-dimensional discrete grid and its assignment to basins more efficient and faster. A comparative study of well-known grid-based algorithms designed to partition the 3d-grid into QCT basin volumes was also presented. The capabilities of grid-based numerical analysis to handle different elaborated one-electron density functions such the ELF topological analysis have also been illustrated. In light of these investigations, the efficiency and the accuracy of the off-grid implementation and its accuracy were demonstrated. Taken into account the speeding-up of the parallelized process used to generate grids, the grid-based QCT is expected to be well suited to molecular or solid systems where large grids need to be produced whatever the used one-density function, that is, for the QTAIM analysis and beyond. Future works will focus on the development of new abilities of grid-based algorithms to allow the handle of typical adaptive grids where gradients of the scalar function are sharply modified around the nuclear regions and a fine grid spacing is required. Note that the QCT analysis on such non-uniform grids remains scarce. 11,28 A new MPI parallel implementation of the off-grid based algorithm is also currently under developments in our group.

ACKNOWLEDGMENTS
We thank Pr Paul Fleurat-Lessard for his constant support, stimulating discussions and for his critical reading of the manuscript.