Energy‐based automatic determination of buffer region in the divide‐and‐conquer second‐order Møller–Plesset perturbation theory

Abstract In the linear‐scaling divide‐and‐conquer (DC) electronic structure method, each subsystem is calculated together with the neighboring buffer region, the size of which affects the energy error introduced by the fragmentation in the DC method. The DC self‐consistent field calculation utilizes a scheme to automatically determine the appropriate buffer region that is as compact as possible for reducing the computational time while maintaining acceptable accuracy (J. Comput. Chem. 2018, 39, 909). To extend the automatic determination scheme of the buffer region to the DC second‐order Møller–Plesset perturbation (MP2) calculation, a scheme for estimating the subsystem MP2 correlation energy contribution from each atom in the buffer region is proposed. The estimation is based on the atomic orbital Laplace MP2 formalism. Based on this, an automatic buffer determination scheme for the DC‐MP2 calculation is constructed and its performance for several types of systems is assessed.

of many-body expansion from the original two-body to three-body 19,20 and four-body 21 expansions. The pair natural orbital (PNO) electron correlation approach 22,23 adopts several truncation schemes for construction of correlated virtual orbitals (i.e., PNOs) for each occupied local molecular orbital (MO) pair, where the bond-based (so-called IEXT) or distance-based (so-called REXT) truncation is used to determine the local virtual orbital region to construct PNOs. Since molecular energy is the most important property in quantum chemical calculations, an energybased parameter is more desirable than a distance-based one. For example, the divide-expand-consolidate method utilizes the energy-based fragment optimization threshold to determine the atomic occupied and virtual orbital spaces in each fragment. 24,25 Yang and coworkers introduced a linear-scaling approach called the divide-and-conquer (DC) method. 26,27 The DC method has been applied to the HF or DFT self-consistent field (SCF), 26,28 density-functional tightbinding, [29][30][31][32] and post-HF (MP2 [33][34][35][36] or CC [37][38][39] ) energy calculations as well as the SCF 40 and MP2 41 energy gradient calculations. For treating static electron correlation in large-scale systems, the DC method has also been combined with the Hartree-Fock-Bogoliubov method 42 and the thermally-assisted occupation (finite temperature) scheme. 43 In the DC method, the size of the buffer region plays the role of the distance parameter to adjust the approximation error; a larger buffer size leads to a smaller approximation error. However, it is still difficult to estimate the error in energy based on the distance-based adjustment parameter.
Recently, we 44 proposed a scheme to estimate the energy error introduced in the DC-HF and DC-DFT calculations using a two-layer buffer region scheme introduced by Dixon and Merz. 45 This estimation scheme can successfully be applied to automatically determine the appropriate buffer region based on the estimated energy error. 44 This study attempts to export the idea of the previous automated DC-HF scheme to the DC-MP2 calculation. Kobayashi et al. 36 reported that the buffer region used for the MP2 correlation calculation can be contracted from that for the HF one to achieve the same energy accuracy as the DC-HF calculation because of the short-range nature of the MP2 dynamical electron correlation. We first develop a method to estimate the subsystem MP2 correlation energy contribution from each atom in the buffer region. Here, the idea of the atomic orbital (AO) Laplace MP2 method [46][47][48][49][50] is used as well as the Schwarz inequality. Based on this estimated energy contribution, we established an algorithm to automatically determine the appropriate buffer region in the DC-MP2 calculation.
This paper consists of four sections. Section 2 gives a brief summary of the linear-scaling DC electron correlation method with a fixed buffer region as well as the present procedure to estimate the energy contribution from each buffer atom and the automated DC-MP2 algorithm. Numerical assessments are described in Section 3. Finally, we provide concluding remarks in Section 4.

| The DC-MP2 electron correlation calculation
We first outline the DC-MP2 electron correlation calculation scheme.
The DC-MP2 method is applicable only with atom-centered basis functions. Each basis function, ϕ μ (r), called an AO, is denoted by a Greek letter index, μ, ν, …. In the DC method, the entire system is divided into several subsystems, each of which consists of the central and buffer regions. Each central region is mutually exclusive with the other central regions. The sets of AOs belonging to the central and buffer regions of subsystem α are referred to as S(α) and B(α), respectively.
In the DC-MP2 method, the MOs in the subsystem α, where F α [D SCF ] is the subsystem Fock matrix constructed with the density matrix D SCF , and S α is the subsystem overlap matrix with the element S α μν = ϕ μ jϕ ν À Á for μ, ν L(α). Note that the subsystem Roothaan equation ( 2 ) has to be solved not self-consistently but just once using predetermined D SCF . The density matrix, D SCF , can be constructed from the standard or approximate HF calculation, such as the DC-HF one. If D SCF is obtained from the DC-HF calculation, it is constructed with the local density matrices, {D α }, and the partition matrices, {p α }, as the following: Because the buffer region in each localization region overlaps with the other localization regions, ΔE α 2 ð Þ corr is obtained as the MP2 correlation energy corresponding to the central region of the localization region α by means of energy density analysis (EDA). 51 The subsystem correlation energy is then evaluated by with the two-electron integral notation

| Estimation of the DC-MP2 energy
Based on EDA, the MP2 correlation energy for subsystem α, ΔE α 2 ð Þ corr , can be further divided into contributions from the atoms in the locali- According to the local correlation philosophy for dynamical electron correlation, [52][53][54]  can be expressed as.
where X α (τ) and Y α (τ) are the energy-weighted density matrices expressed as Here, the Fermi level, ε F , may be already determined in the prior DC-HF calculation, or may be the midpoint energy between HOMO and LUMO in the prior HF calculation. For estimation purpose, we drastically approximate the integral in Equation ( 10 ) by the one-point Gauss-Laguerre quadrature, namely, Assuming that the rhs of Equation ( 13 ) gives the upper limit of ΔE α 2 ð Þ B , its absolute value can be bounded by adopting the Schwarz inequality Here, on the analogy to the scaled oppositespin MP2 method, 55 the term A α κδ A α εγ was omitted owing to its smaller contribution. Because the summation in parentheses in Equation ( 17 ) is constant for subsystem α, the following index can be considered as the magnitude of the contribution from atom B: Using the above e α B index, we propose the following automatic determination scheme for the buffer region in the DC-MP2 method: 1. Assignment of the initial DC-MP2 buffer region for each subsystem. This may be determined by prior DC-HF calculation.

| Computational details
We implemented the above-mentioned automatically controlled DC-MP2 method to the GAMESS package 56,57 and evaluated its accuracy and efficiency for the different types of systems. In the DC-HF calculations, the inverse temperature parameter, β, was set to 125 a.u. and the Fermi function cutoff factor (the FTOL option of $DANDC input group in GAMESS program) was set to 20. In addition, the parameters in the automated DC-HF method were set to e SCF thresh = 0.1 μE h and r ext = 3.0 Å, the definitions of which are given in our previous paper. 44 The 6-31G(d) basis set 58 was adopted throughout this paper. We introduced the major axis radii of the HF and MP2 localization regions for subsystem α, l SCF,α local and l corr,α local , respectively, to discuss the size of the localization regions determined by the automated DC method. l SCF,α local (or l corr,α local ) was defined as half of the maximum atom-pair distance in the HF (or MP2) localization region for subsystem α. The two-electron AO integrals, (μνjλσ), were treated in so-called "direct algorithm" manner, that is, the same integrals were calculated repeatedly for every subsystem.

| Estimation of the atomic MP2 energy contributions
We first applied the present automated DC-MP2 method to a cubic system containing 100 water molecules with weight density of 1.0 g cm −3 . Each water molecule was adopted as a central region in the DC calculation. To assess the performance of the automated DC-MP2 calculation, the entire system was selected as the initial localization region for every subsystem in the DC-MP2 calculation. Figure 1 shows the estimated MP2 energy contributions from buffer atom B (e α B ) with respect to its distance from the O atom in the central region. The blue plot represents the value for B being an H atom, and the red plot that for B being an O atom. The estimated energy contribution decays exponentially as the distance from the central region increases. The slight difference in the slope for H and O atoms in Figure 1 is probably due to the fact that the summation over AOs at the buffer atom in Equation ( 9 ) runs for the virtual orbital, that is, the charge-transfer excited configurations from O atoms in donor water to H atoms in acceptor water are more significant than those from acceptor to donor. This behavior was also confirmed for the water dimer system using the intermolecular interaction energy decomposition with the local PNO method. 59 Note that the estimated energies in Figure 1 for the interatomic distance of 2-3 Å are up to several hundred E h , which are significantly larger than the total MP2 energy of $19 E h . This is because that the estimated energy (e α B ) is derived as the upper limit of the atomic MP2 energy contribution.
From the following section, the energy threshold in the automated DC-MP2 method, e corr thresh , was set to 0.1 μE h unless otherwise noted. Next, the dependence of the computational time of e α B on the system size was examined, as shown in Figure 2 ), which maintains an almost linear-scaling behavior with respect to the entire system size, as was discussed in Section 2.2.

| Accuracy and computational time of the present method
The accuracy and the computational time of the automated DC-MP2 method were investigated for the cubic water system in Section 3.2. Table 1 shows the energy-threshold (e corr thresh ) dependence of the DC-MP2 correlation energy. Following Section 3.2, each water molecule F I G U R E 1 Estimated atomic MP2 energy contributions with respect to the interatomic distance. The blue plots represent the estimated MP2 energy of H atom and the red plots represent of O atom in the buffer region was adopted as a central region and the entire system was selected as the initial localization region. The average and standard deviation (SD) of major axis radii ( l corr local and σ l corr local Â Ã , respectively) are also given in Table 1. For e corr thresh = 100 μE h , the actual correlation energy error per atom is 18.37 μE h , which is sufficiently smaller than e corr thresh . It should be noted that the MP2 energy error decreases systematically as e corr thresh decreases, while the dependence is not proportional but rather logarithmic to e corr thresh . As with the e thresh parameter in automated DC-SCF method, 44 the smaller e corr thresh parameter leads to a larger localization region, which can be confirmed from the average of the major axis radii of all localization regions, l corr local . Interestingly, the SD of the major axis radii, σ l corr local Â Ã , also tends to increase systematically as e corr thresh decreases, except for e corr thresh = 0:1μE h . This fact suggests that the present scheme can effectively aid the selection of the appropriate buffer region for each subsystem in the DC-MP2 calculation.
Next, we examined the combination of the present automated DC-MP2 method with the automated DC-HF calculation. Table 2 shows the dependence of the automated DC-MP2 energy on the initial DC-HF inner and outer buffer sizes, r in b and r out b , the definitions of which are given in our previous paper. 44 The averages ( l HF  Table 1. A smaller initial DC-HF buffer size leads to a larger l HF local D E , as was also confirmed in the previous study. 44 When combined with the automated DC-HF method, l corr local becomes smaller than its value when the initial localization region is set to be the entire system. Similarly, σ l corr local Â Ã is $0.14 Å smaller than σ l HF local h i . Next, we applied the proposed method to a covalently bound system, namely, the chignolin protein with 10 amino F I G U R E 2 System-size dependence of the CPU time of the evaluation of e α B for the model system containing N water water molecules. The initial buffer size for the DC Hartree-Fock (DC-HF) calculation was fixed to r in b = 4:5Å and r out b = 5:5Å T A B L E 1 e corr thresh dependences of the DC-MP2 correlation energy and the major axis radius for 100 water cluster system T A B L E 2 Initial DC-HF buffer-size dependence of the automated DC-MP2 correlation energy and the major axis radius for 100 water cluster system acids. The geometry of chignolin was obtained from the protein data bank (PDBID: 1UAO). Hydrogen atoms were added using the Discovery Studio 2017 R2 software. 60 In the DC calculation, the entire system was divided between the carbonyl C and α-C atoms, and each of the divided systems was treated as a central region. Table 3 shows the e corr thresh dependence of the DC-MP2 energy for chignolin. The entire system was selected as the initial localization region for every subsystem in the DC-MP2 calculation.
For e corr thresh = 100 μE h , the actual correlation energy error per atom is 2.82 μE h , which is sufficiently smaller than e corr thresh . As was also confirmed in the case of the water system, the MP2 energy error decreases systematically as e corr thresh decreases. Again, the dependence of the error on e corr thresh is rather logarithmic. The smaller e corr thresh leads to the larger l corr local , while it leads to the smaller σ l corr local Â Ã , contrary to the case of water system. Comparing Table 3 with Table 1, l corr local of chignolin is about 1.0 Å larger than that of the water system for the same e corr thresh parameter, reflecting the delocalized electronic nature in the covalently bound system. T A B L E 4 Initial DC-HF buffer-size dependence of the automated DC-MP2 correlation energy and the major axis radius for chignolin T A B L E 5 Initial DC-HF buffer-size dependence of the automated DC-MP2 energy and the major axis radius for the β-strand glycine oligomer (GLY) 20 Next, we combined this with the automated DC-HF calculation. Table 4 shows the dependence of the DC-MP2 energy on the initial DC-HF buffer size. The automated DC-HF energy error for chignolin is smaller than that for the water system and almost independent of the initial DC-HF buffer region, while the radius of the DC-HF localization region ($7.5 Å) is about 1 Å greater than for the water system ($6.5 Å). Subsequently, the DC-MP2 energy error is also almost constant ($0.7 μE h atom −1 ). For this small protein system, in contrast to the result in Table 2 for the water system, the SD of the sizes of the localization regions for the MP2 calculation is larger than that for the HF calculation. This is because the entire size of the chignolin system is so small that the localization region for every subsystem is close to the entire system. The present method was also tested on the β-strand glycine oligomer (GLY) 20 , and the result of the calculation are given in Table 5. In Table 5, the DC-MP2 calculations with different e corr thresh were performed to confirm that the present automated DC-MP2 energy error depends primarily on e corr thresh and hardly on the initial buffer radii. For this stretched system, the SD of the localization region sizes for the MP2 calculation is smaller than that for the HF calculation, while the energy error is similar to the result in Table 4. As well as the case of water system, the smaller e corr thresh leads to the larger l corr local and σ l corr local Â Ã .
Finally, the present method was applied to the conjugated polyacetylene chain C 2n H 2n + 2 , shown in Figure 3. All atoms were placed in a plane and the C C, C C, and C H bond lengths were fixed at 1.462, 1.357, and 1.096 Å, respectively. Each C 2 H 2 (or C 2 H 3 for edges) unit divided at the C C single bond was treated as a central region. Table 6 shows the system-size dependence of the standard and DC-MP2 energies. For the automated DC calculations, the initial sizes of the inner and outer buffer regions in the automated DC-HF calculation were set to r in b = 5:0Å and r out b = 6:5Å, respectively. To avoid division of the localization region at C C double bond, each C 2 H 2 (or C 2 H 3 ) unit was treated as one piece, that is, a unit was extracted from the DC-MP2 localization region only when all the estimated MP2 correlation energies, e α B È É , for the atoms in the unit were smaller than the threshold, e corr thresh (analogous to the BUFTYP = RADSUB option of $DANDC input group in GAMESS program). The DC-MP2 energy error per atom is almost constant for n ≥ 30. It was demonstrated that the correlation energy error can be controlled with the present method, even for conjugated systems. For this conjugated system, the dependence of the computational time on the system size was also examined, as shown in The scaling analysis was also conducted for three-dimensional water cluster systems. Figure 5 shows the dependence of the wall-clock computational time for the DC-MP2 calculation on the number of water molecules, N water . The times were measured F I G U R E 3 Structure of polyacetylene chain system, C 2n H 2n+2 T A B L E 6 The system-size dependence of the MP2 electron correlation energy in the standard MP2 and automated DC-MP2 calculations for polyacetylene chain system, C 2n H 2n+2 ), which indicates that the present method also achieves near-linear scaling computational time even for three-dimensional systems.

| CONCLUDING REMARKS
In this study, we have proposed an automatic determination scheme for the buffer region in the DC-MP2 calculation. This method is based on a subsystem MP2 correlation energy contribution from each atom in the buffer region, which is estimated with the help of the AO-Laplace MP2 method and the Schwarz inequality. Because the appropriate size of the buffer region in the DC-MP2 calculation can be smaller than that in the DC-HF calculation, as suggested in a previous paper, 36 the present scheme reduces the buffer region from the prior DC-HF calculation. We applied the present method to a 100 water cluster system and the chignolin system, and confirmed that the estimated DC-MP2 energy error can be systematically reduced as the energy threshold, e corr thresh , decreases. We also confirmed that the linearscaling behavior of the DC-MP2 method is preserved even with the present automation scheme.
Since the MP2 amplitude is known to provide a good guess for the CC method in many cases, the proposed automation scheme is straightforwardly applicable to the DC-CC method. [37][38][39] Improvements in the accuracy of the correlation energy contributions from buffer atoms are also desirable, especially for delocalized systems.
The use of the inequality test proposed by Thompson et al. 62 instead of the Schwarz inequality would be one way to provide this improvement.  F I G U R E 5 System-size dependence of the Wall-clock time of the automated DC-MP2 calculations for the model system containing N water water molecules. The initial sizes of the inner and outer buffer regions in the automated DC Hartree-Fock (DC-HF) calculation were set to r in b = 4:5Å and r out b = 5:5Å, respectively. The energy threshold in the automated DC-MP2 method, e corr thresh , was set to 10 μE h