The effect of midbond functions on interaction energies computed using MP2 and CCSD(T)

In this article we use MP2 and CCSD(T) calculations for the A24 and S66 data sets to explore how midbond functions can be used to generate cost effective counterpoise corrected supramolecular interaction energies of noncovalent complexes. We use the A24 data set to show that the primary role of midbond functions is not to approach the complete basis set limit, but rather to ensure a balanced description of the molecules and the interaction region (an effect unrelated to the basis set superposition error). The need for balance is a consequence of using atom centered basis sets. In the complete basis set limit the error disappears, but reaching that limit is not feasible beyond small systems. For S66 we investigate the need for increasing the number of midbond centers. Results show that adding a second midbond center increases the accuracy, but the effect is secondary to changing the atom centered basis set. Further, by comparing calculations using the 3s3p2d1f1g midbond set with calculations using aug-cc-pVDZ and aug-cc-pVTZ as midbond sets, we see that for a midbond set to be effective it is not enough that it contains diffuse functions; high angular momentum functions must also be included. By comparing two approaches for placing midbond centers we show that results are not particularly sensitive to placement as long as the placement is reasonable.

(also known as "gold standard" of electronic structure theory) 17 provides quantitative accuracy. 15,[18][19][20] This method has two apparent challenges, namely the $N^7$ scaling of the CCSD(T) model (with N being a measure of molecular size, such as the number of correlated electrons) and the very slow convergence of the electron correlation energy to the CBS limit. The scaling issue can be mitigated by local approximations, [21][22][23] while with respect to the latter point, extrapolation techniques [24][25][26][27][28][29] towards the CBS limit have proved effective and are therefore widely used. [30][31][32] Alternatively, the convergence towards the CBS limit can be improved by including terms in the wave function that depend explicitly on the interelectronic distance. 33,34 These so-called explicitly correlated methods are characterized by a much faster convergence to the CBS limit. 35,36 A common approach for computing interaction energies is the supramolecular approach, in which the interaction energy of two weakly bound systems is defined as the difference between the energy of the dimer and the energies of the monomers. In addition to scaling and incomplete basis set challenges, the supramolecular approach (in combination with a finite basis set) introduces errors such as the basis set superposition error (BSSE). The BSSE is an artifact of the supramolecular approach. It arises because, in the calculation on the complex, each monomer can also use the basis functions of its partner and is therefore described better than the separate monomer. 37 The BSSE results in an artificially strong computed interaction energy of a complex, 38 and for noncovalent interactions the BSSE may be of the same order of magnitude as the interaction itself, so a correction of the error is necessary. The counterpoise (CP) correction method of Boys and Bernardi 39 has been used in numerous applications and has proven very efficient. At the same time, the CP approach sometimes tends to overcorrect for BSSE. 
[40][41][42][43] Therefore, Sherrill and co-workers 44 recommended applying the so-called "half-counterpoise" correction (the average of raw and CP-corrected quantities) for ab initio calculations on noncovalent complexes using basis sets of aug-cc-pVQZ quality and below. This is also valid at explicitly correlated levels of theory, MP2-F12 and CCSD(T*)-F12, in combination with intermediate basis sets (e.g., cc-pVTZ-F12), while with small basis sets like cc-pVDZ-F12 uncorrected results are closer to the CBS limit due to beneficial error compensation between BSSE (overbinding) and intrinsic basis set incompleteness (underbinding). 45 Other alternatives to the classical CP correction method are also known, for example the so-called virtual CP approach, 37,46 the atom by atom scheme (CP aa ) 47 and the Same Number Of Optimized Parameters scheme, 48,49 to name a few.
There is a second effect on the quality of the computed interaction energies which is much less discussed than the BSSE. In electronic structure theory, we commonly choose atom centered basis sets. Since intermolecular distances are large for noncovalent interactions (relative to distances in covalent bonds), the atom centered basis functions have limited flexibility in the interaction region. Hence, the nucleus centered basis sets favor the description of the molecule region over the interaction region, resulting in an imbalanced description of the noncovalent system. This effect is also present in BSSE corrected calculations, since the monomer and dimer energies are still computed using only the atom centered basis set.
This effect is seldom discussed in its own right, and the associated error is attributed to the fact that CBS is not reached. However, the CBS errors will be much larger for the interaction region than for the molecules due to basis sets being centered on atoms. The errors associated with not reaching CBS are commonly addressed by using large atom centered basis sets with diffuse functions, for example, aug-cc-pVXZ basis sets with cardinal number X = 5 or higher. 19 However, this rapidly ends in unfeasible calculations even for small molecular systems and moreover, this is an inefficient way to progress since the approach requires one to go close to CBS limit both in molecule regions and the interaction region, using atom centered basis functions. It is also worth noting that explicitly correlated models alone cannot remove the error introduced by the imbalanced description of molecules and interaction region.
As an alternative, Tao and Pan 50 introduced the use of midbond functions, a set of functions centered in the interaction region to supplement atom centered basis functions. The 3s3p2d midbond set, with exponents 0.9, 0.3, 0.1 for s- and p-functions and exponents 0.6 and 0.2 for d-functions, was one of the first midbond sets used. This set was subsequently adopted by several research groups in the 1990s for calculations of interaction energies and potential energy surfaces of rare gas dimers and rare gas-molecule complexes. [51][52][53][54][55][56][57][58][59] Extension of this midbond set by one f-type function and further by one g-type function resulted in the 3s3p2d1f and 3s3p2d1f1g sets, respectively (the added f- and g-functions both have the exponent 0.3). These sets have been applied in calculations of potential energy surfaces [60][61][62][63][64][65][66] and also dissociation energies. 67 Some examples of large sets of midbond functions, for example, 6s6p6d3f3g3h, are also known. 68 A prominent example of the computational effectiveness of midbond functions for computing noncovalent interaction energies is the recent extensive work by the Patkowski group, 69 demonstrating that at the CCSD(T) and CCSD(T)-F12 levels of theory the aug-cc-pVDZ basis set supplemented with midbond functions can provide results of high accuracy. 17,69 Midbond functions are generally placed on ghost atoms located in the intermolecular region, and displacement along the van der Waals bond has been shown to have only a negligible impact on the accuracy of interaction energies of rare gas complexes. 51,52 The optimal position of midbond functions in small molecular dimers was also investigated in a more recent study by Shaw and Hill, 18 where they optimized the position of the midbond functions. However, since moving the midbond center closer to the larger/heaviest monomer is energetically favorable, such an optimization procedure only maps the effect of placement along the bond. 
The study by Shaw and Hill 18 has also challenged the prevalent assumption that the interaction energy is insensitive to the exponents of the midbond functions. 70 It was demonstrated that significant improvement in both canonical and explicitly correlated calculations of interaction energies can be achieved by optimization of midbond exponents. 18 To this date, systematic studies of effects of midbond functions in calculations of interaction energies for medium sized molecular systems (30 atoms) using midbond functions have been performed on the S22 data set. 11,69 Despite the usefulness and popularity of the S22 data set, [71][72][73][74] there are some issues associated with it. For example, the S22 data set mainly targets interactions of nucleic acid bases while other interaction categories are under-represented (e.g., single hydrogen bonds) or even missing (aliphatic-aliphatic dispersion interactions). 75,76 The S66 data set was designed to cover a wider range of interaction motifs. 75 For the S66 data set, Ma and Werner have included midbond functions in local correlation calculations of interaction energies. 77 However, they present no systematic studies on how midbond centers specifically affect the results for the S66 data set.
In this paper we use the A24 and S66 data sets to illustrate how midbond functions affect the computed interaction energies at MP2 and CCSD(T) levels of theory. We show that using midbond functions of the type 3s3p2d1f1g alleviates the requirements on the chosen nucleus centered basis set, and that including a midbond center is highly efficient for small molecular systems, as displayed using the A24 set of molecules. Further, the A24 data set is used to illustrate that midbond functions provide a balanced description of molecules and interaction regions. For the S66 data set we explore the inclusion of two, rather than just one, midbond centers and we show that the accuracy increases upon adding the second midbond center, which, however, is associated with significantly increased computational costs. We also compare computations of interaction energies using the 3s3p2d1f1g midbond set with the use of correlation consistent aug-cc-pVDZ and aug-cc-pVTZ as midbond sets and we see that the midbond set needs to contain functions of high angular momentum (at least f-functions) in order to be effective. Further, by examining two approaches for placing midbond centers we show that results are not particularly sensitive to exact placement as long as it is reasonable.
The paper is organized as follows. Section 2 contains an exposition of the methodology. In Section 3 we use the A24 data set to illustrate the primary objective for using midbond functions. In Section 4 we present results for the S66 data set. In Section 5 we provide a summary and concluding remarks.

| Data sets
For the presented study we use the A24 78 and S66 75 data sets.
Whereas the A24 data set consists of small molecular systems, the S66 data set contains larger and more biologically relevant complexes.
Both data sets include noncovalent interactions of various kinds, covering hydrogen bonds (referred to as hydrogen group), dispersion (dispersion group) and a mixture of interaction types (others group). The complex geometries are used without further optimization and they can be found in the BEGDB online database. 79

| Interaction energy
All interaction energies are calculated using the supramolecular approach, in which the interaction energy E int of two weakly bound atoms or molecules is defined as the difference of the energies of the dimer AB, E(AB), and the energies of the monomers A, E(A), and B, E(B).
CP correction 39 is used to correct for the BSSE, that is, the monomer energies are calculated in the basis of the dimer. The interaction energy is therefore given by
$$E_{\mathrm{int}}^{\mathrm{CP}} = E(AB) - E^{AB}(A) - E^{AB}(B). \tag{1}$$
The superscript "AB" on monomers indicates that the basis for the dimer is used in the monomer calculations. All results presented in this paper are computed using the CP correction, and for simplicity, the "CP" superscript is made implicit.
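As a minimal sketch of this bookkeeping, the Python snippet below forms the CP-corrected interaction energy from three energies that are all computed in the dimer basis and converts the result from hartree to kcal/mol. The function names are illustrative and are not taken from any program used in this work, and the energies in the usage part are made-up placeholders rather than computed results.

```python
HARTREE_TO_KCAL_PER_MOL = 627.509474  # 1 E_h expressed in kcal/mol


def cp_interaction_energy(e_dimer, e_a_in_dimer_basis, e_b_in_dimer_basis):
    """CP-corrected supramolecular interaction energy.

    All three energies must be computed in the full dimer basis
    (atom centered functions of A and B plus any midbond functions).
    The result has the same unit as the inputs.
    """
    return e_dimer - e_a_in_dimer_basis - e_b_in_dimer_basis


def hartree_to_kcal_per_mol(energy_hartree):
    """Convert an energy from hartree to kcal/mol."""
    return energy_hartree * HARTREE_TO_KCAL_PER_MOL


if __name__ == "__main__":
    # Placeholder total energies in hartree (hypothetical, for illustration only).
    e_ab = -152.760000
    e_a = -76.376000
    e_b = -76.376500
    e_int = cp_interaction_energy(e_ab, e_a, e_b)
    print(f"E_int = {hartree_to_kcal_per_mol(e_int):.3f} kcal/mol")
```

Keeping the subtraction and the unit conversion in separate functions makes it easy to apply the same routine to energies tabulated in either unit.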

| Midbond sets
In this work, we will investigate the performance of the midbond set 3s3p2d1f1g. The standard procedure of adding midbond functions is to place them on the midpoint between the centers of mass of the interacting monomers. 50,67,68 Patkowski and co-workers stress that this method often leads to a midpoint being located closer to one of the monomers 69 and thus favoring its description. In order to avoid this, they refer to an algorithm 80 for locating a midbond center, where its location is an $r^{-6}$ weighted average of intermolecular atom-atom midpoints,
$$\mathbf{r}_{\mathrm{mb}} = \frac{\sum_{a \in A} \sum_{b \in B} w_{ab}\, \frac{1}{2}\left(\mathbf{r}_a + \mathbf{r}_b\right)}{\sum_{a \in A} \sum_{b \in B} w_{ab}}, \qquad w_{ab} = \left|\mathbf{r}_a - \mathbf{r}_b\right|^{-6}, \tag{2}$$
where the summation runs over all atoms a in subsystem A and all atoms b in subsystem B. 69 Except where stated explicitly otherwise, we will use Equation (2) to place one midbond center (referred to as systematic). In Section 4.3 we explore the effect of placement of midbond functions, and we include results where simple chemical intuition is used to place midbond centers (referred to as intuitive, see Figure S1). A detailed description of this approach can be found in Section S3.5 in the Supporting Information. In sections where we use two midbond centers for the S66 data set, we only use the 53 out of 66 complexes which have wide enough interaction regions. The two centers are placed manually. For system names of the 53 complexes with two midbond centers see Table S25. Geometries of the complexes (including midbond centers) of the A24 and S66 data sets are available at https://doi.org/10.18710/2FWECY.
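The systematic placement, an r^-6 weighted average of intermolecular atom-atom midpoints, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used to generate the published geometries: the function name `midbond_center` and the coordinate layout (lists of Cartesian triples in any consistent length unit) are hypothetical choices made for the example.

```python
import math


def midbond_center(coords_a, coords_b, power=6):
    """Return the r^-power weighted average of all A-B atom-atom midpoints.

    coords_a, coords_b: sequences of (x, y, z) atomic positions for
    monomers A and B, in the same length unit.
    """
    if not coords_a or not coords_b:
        raise ValueError("both monomers need at least one atom")
    weight_sum = 0.0
    weighted_midpoint = [0.0, 0.0, 0.0]
    for r_a in coords_a:
        for r_b in coords_b:
            distance = math.dist(r_a, r_b)
            weight = distance ** (-power)  # close atom pairs dominate
            weight_sum += weight
            for k in range(3):
                weighted_midpoint[k] += weight * 0.5 * (r_a[k] + r_b[k])
    return tuple(component / weight_sum for component in weighted_midpoint)


if __name__ == "__main__":
    # Hypothetical toy geometry: one atom in A, two atoms in B on the z axis.
    monomer_a = [(0.0, 0.0, 0.0)]
    monomer_b = [(0.0, 0.0, 2.0), (0.0, 0.0, 4.0)]
    print(midbond_center(monomer_a, monomer_b))
```

Because the weights decay as r^-6, the center is pulled towards the closest intermolecular contacts rather than towards the heavier monomer, which is the behavior the center-of-mass midpoint lacks.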

| Statistical measures
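The errors of the interaction energies are calculated in absolute terms,
$$\Delta(n) = E_{\mathrm{int}}^{\mathrm{calc}}(n) - E_{\mathrm{int}}^{\mathrm{ref}}(n), \tag{3}$$
and in relative (%) terms,
$$\Delta^{\mathrm{rel}}(n) = \frac{E_{\mathrm{int}}^{\mathrm{calc}}(n) - E_{\mathrm{int}}^{\mathrm{ref}}(n)}{E_{\mathrm{int}}^{\mathrm{ref}}(n)} \times 100\%, \tag{4}$$
where $E_{\mathrm{int}}^{\mathrm{ref}}(n)$ is the reference value for the interaction energy of complex n, and $E_{\mathrm{int}}^{\mathrm{calc}}(n)$ is the computed value of the interaction energy for complex n. Since the magnitude of interaction energies varies significantly with interaction type, using errors in relative terms (%) makes it easier to compare the quality of results for different interaction types.
We will present errors in terms of the mean absolute error, $\Delta_{\mathrm{abs}}$, and the relative mean absolute error, $\Delta_{\mathrm{abs}}^{\mathrm{rel}}$,
$$\Delta_{\mathrm{abs}} = \frac{1}{N} \sum_{n=1}^{N} \left|\Delta(n)\right|, \tag{5}$$
$$\Delta_{\mathrm{abs}}^{\mathrm{rel}} = \frac{1}{N} \sum_{n=1}^{N} \left|\Delta^{\mathrm{rel}}(n)\right|, \tag{6}$$
where N is the total number of noncovalent complexes.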
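The two statistical measures, the mean absolute error and the relative mean absolute error in percent, can be sketched as follows. The function names and the example numbers are illustrative assumptions; the energies are placeholders and not values from the A24 or S66 sets.

```python
def _check(calc, ref):
    """Validate that both lists are non-empty and of equal length."""
    if len(calc) != len(ref) or len(ref) == 0:
        raise ValueError("calc and ref must be non-empty and of equal length")


def mean_absolute_error(calc, ref):
    """Mean of |E_calc(n) - E_ref(n)| over all complexes (input units)."""
    _check(calc, ref)
    return sum(abs(c - r) for c, r in zip(calc, ref)) / len(ref)


def relative_mean_absolute_error(calc, ref):
    """Mean of |(E_calc(n) - E_ref(n)) / E_ref(n)| over all complexes, in %."""
    _check(calc, ref)
    return 100.0 * sum(abs((c - r) / r) for c, r in zip(calc, ref)) / len(ref)


if __name__ == "__main__":
    # Placeholder interaction energies in kcal/mol (hypothetical).
    reference = [-1.0, -4.0]
    computed = [-1.1, -4.4]
    print(mean_absolute_error(computed, reference))
    print(relative_mean_absolute_error(computed, reference))
```

In this toy example both complexes carry the same relative error although their absolute errors differ by a factor of four, which is why the two measures are reported side by side.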

| Computational details
The interaction energies for the A24 (see Section 3) and S66 (see Section 4) data sets are computed at MP2 and CCSD(T) levels of theory using correlation consistent basis sets (aug)-cc-pVXZ 81 with X = D, T, Q for A24 and X = D, T for S66, within the frozen-core approximation and employing CP correction for BSSE. All calculations presented in this work are carried out using the LSDALTON program. 82 The Hartree-Fock convergence threshold was tightened from default to $10^{-7}$ and the integral threshold was tightened from default $10^{-8}$ to $10^{-9}$ for all calculations. The CCSD(T) reference interaction energies for the A24 data set were obtained from aug-cc-pVQZ(+33211) calculations (see Table S1), where a 3s3p2d1f1g midbond center was placed according to Equation (2). For CCSD(T) calculations in the S66 set, we used the revised benchmark CCSD(T)/CBS interaction energies presented in the publication of Hobza et al. 76 as reference values. To obtain MP2 reference values for both data sets we compute interaction energies at MP2/aug-cc-pVQZ level of theory adding one 3s3p2d1f1g midbond center according to Equation (2) (see Tables S1 and S16).

| THE OBJECTIVE OF MIDBOND FUNCTIONS
In this section we use the A24 set to illustrate why the inclusion of a midbond center significantly improves the interaction energy when using small basis sets, and why this improvement is unrelated to reaching the CBS limit. We do so by examining computed monomer and dimer energy contributions compared to reference monomer and dimer energies (all within the CP correction). We present CCSD(T) results for the A24 data set using the basis set cc-pVXZ, X = D, T, Q with and without the midbond sets 3s3p2d1f1g (denoted +33211) and carbon aug-cc-pVXZ, X = D, T, Q (denoted +aXZ). In Table 1 we present the energy differences for monomers (M A and M B ) and dimers (D) for each basis set combination (for monomer and dimer energies see Tables S13-S15) with respect to reference values computed using aug-cc-pVQZ(+33211). Δ abs and Δ rel abs for the interaction energies are presented in Table 2 (the interaction energies themselves can be found in Tables S7-S9). Note that errors in energy contributions in Table 1 are given in mE h , while errors in interaction energies in Table 2 are given in kcal/mol.
We first consider the energy differences for monomers and dimers separately, presented in Table 1. We see that for cc-pVXZ, both without and with either type of midbond set (3s3p2d1f1g and aug-cc-pVXZ), Δ abs in monomer and dimer energies decreases, as expected, as the basis set is increased from cc-pVDZ through cc-pVTZ to cc-pVQZ. Further, we note that the errors in M A , M B , and D are similar for results with and without midbond functions, with somewhat larger differences for cc-pVDZ results. However, the errors with and without midbond functions are still of the same order of magnitude even for cc-pVDZ results. In contrast, if we look at Δ rel abs and Δ abs results in Table 2, we see a significant difference between errors for the interaction energies computed with and without midbond sets. For example, Δ rel abs for cc-pVDZ(+33211), cc-pVDZ(+aDZ), and cc-pVDZ are 11.8%, 23.9%, and 68.7%, respectively. Increasing the basis set to cc-pVTZ reduces Δ rel abs to 6.2%, 5.9%, and 31.5% for cc-pVTZ(+33211), cc-pVTZ(+aTZ), and cc-pVTZ, respectively. Hence, although the errors for M A , M B , and D energies for cc-pVTZ without a midbond set are much smaller than the errors for cc-pVDZ with midbond sets, the error in the interaction energy itself for the cc-pVTZ calculations without a midbond set (Δ rel abs = 31.5%) is larger than the interaction energy error for, for example, cc-pVDZ(+33211) (Δ rel abs = 11.8%). Thus, more accurate results for M A , M B , and D do not necessarily correlate with more accurate interaction energies. The primary role of a midbond set is therefore not to provide more accurate monomer and dimer energies by more rapidly reaching the CBS limit, but rather to ensure that a balanced description of monomers and dimer is obtained. That is, a midbond set corrects imbalances in the description of molecules and the interaction region that occur due to the use of atom centered basis sets.
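The reason why small errors in the individual energies do not guarantee a small error in the interaction energy is simple arithmetic: for a single complex, the error in the CP-corrected interaction energy equals the signed error in the dimer energy minus the signed errors in the two monomer energies. The Python sketch below illustrates this with invented numbers; the function name and all values are hypothetical and are not taken from Table 1 or Table 2. Note that the identity holds per complex for signed errors, so it cannot be applied directly to the mean absolute errors listed in the tables.

```python
MILLIHARTREE_TO_KCAL_PER_MOL = 0.627509474  # 1 mE_h expressed in kcal/mol


def interaction_energy_error(err_dimer, err_mono_a, err_mono_b):
    """Signed error in E_int from signed errors in E(AB), E(A), and E(B).

    Each error is (computed - reference) for one complex, with all
    energies evaluated in the dimer basis. Units follow the inputs.
    """
    return err_dimer - err_mono_a - err_mono_b


if __name__ == "__main__":
    # Hypothetical signed errors in mE_h for one complex.
    large_but_balanced = (40.0, 20.0, 20.0)   # dimer, monomer A, monomer B
    small_but_imbalanced = (4.0, 1.5, 1.5)
    for label, errors in (("balanced", large_but_balanced),
                          ("imbalanced", small_but_imbalanced)):
        err_int = interaction_energy_error(*errors)
        print(label, err_int * MILLIHARTREE_TO_KCAL_PER_MOL, "kcal/mol")
```

In the first case the large errors cancel exactly, whereas in the second case the individually smaller errors leave a residual of 1 mE h in the interaction energy, which is the kind of imbalance the midbond functions are meant to remove.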

| RESULTS FOR S66 DATA SET
In this section we present results for the S66 data set. In Section 4.1 we discuss the effect on interaction energies of using one and two 3s3p2d1f1g midbond centers. In Section 4.2 we compare the results of using two 3s3p2d1f1g centers with using two carbon-type aug-cc-pVDZ midbond centers. We further comment on relative timings for various combinations of basis sets and midbond functions. In Section 4.3 we explore the effect of the placement of midbond centers.

| One versus two 3s3p2d1f1g midbond centers
The S66 data set contains complexes with wide interaction regions, and it therefore is reasonable to investigate whether an increased number of midbond centers will result in improved interaction energies. Further, we investigate whether the increased number of midbond centers can compensate for the deficiencies of using a small atom centered basis set (e.g., cc-pVDZ), as was the case for the A24 data set (see Section 3).
Note that two midbond centers are only used for 53 out of 66 complexes, where the interaction region is reasonably large. Accordingly, we compare results obtained using two midbond centers (see Tables S17 and S21) with results obtained using one midbond center (see Tables S19 and S23) only for the 53 complexes. Results for Δ rel abs and Δ abs are shown in Table 3 and visualized in Figure 1. Note that the errors for each method are measured against reference values for that same method (see Table 3).
Hence, the errors reflect the basis set error of the method, rather than a combined method and basis set error. Smaller MP2 than CCSD(T) errors therefore do not indicate that MP2 interaction energies are better than corresponding CCSD(T) results.
We first discuss the cc-pVDZ results. The cc-pVDZ(+33211) and cc-pVDZ(+2*33211) results for MP2 show that errors are reduced when including two midbond centers rather than one. The effect is particularly pronounced for the dispersion group, where Δ rel abs is reduced from 23.8% to 14.3% upon increasing from one to two midbond centers. The same is seen for CCSD(T), where Δ rel abs is reduced from 29.1% to 16.3% for the dispersion group, but to a lesser extent for the other two groups. However, the errors for MP2 and CCSD(T) results using cc-pVDZ(+33211) and cc-pVDZ(+2*33211) are quite large. Errors for small complexes when using cc-pVDZ in combination with midbond functions are significantly smaller, because the midbond sets are able to improve the description of both the molecules and the interaction region. For larger complexes, the midbond functions are not able to compensate for the poor description provided by the cc-pVDZ basis for the molecules.
TABLE 1 Mean absolute errors (Δ abs , mE h ) in monomer (M A and M B ) and dimer (D) energies of the A24 set of complexes computed using CCSD(T) and basis set combinations cc-pVXZ(+33211), cc-pVXZ(+aXZ), and cc-pVXZ for X = D, T, Q
Note: Errors are relative to CCSD(T)/aug-cc-pVQZ(+33211) reference values.

TABLE 2 Relative mean absolute errors (Δ rel abs , in %) and mean absolute errors (Δ abs , in kcal/mol) for interaction energies of the A24 set of complexes calculated at CCSD(T) level of theory using cc-pVXZ(+33211), cc-pVXZ(+aXZ), and cc-pVXZ for X = D, T, and Q

The quality of results obtained when using one midbond center improves upon increasing the atom centered basis set from cc-pVDZ to cc-pVTZ. The most substantial improvement is observed for the dispersion group. That is, for MP2 Δ rel abs is reduced from 23.8% (cc-pVDZ(+33211)) to 8.6% (cc-pVTZ(+33211)), and for CCSD(T) Δ rel abs is reduced from 29.1% to 12.7%. The inclusion of the second midbond center in the interaction region leads to a further improvement of the results, for example, for MP2 Δ rel abs is reduced from 8.6% (cc-pVTZ(+33211)) to 5.5% (cc-pVTZ(+2*33211)) for the dispersion group. CCSD(T)/cc-pVTZ(+2*33211) results follow the same trend. However, the positive effect of adding the second midbond center is less significant for cc-pVTZ than for cc-pVDZ.
We now discuss results for the aug-cc-pVDZ basis set in combination with one or two midbond centers. For the MP2 method using aug-cc-pVDZ(+33211) we obtain Δ rel abs of 4.3%, 6.3%, and 5.1% for hydrogen, dispersion, and others interaction groups, whereas aug-cc-pVDZ(+2*33211) gives Δ rel abs of 2.5%, 4.0%, and 3.9%, respectively. The CCSD(T) results display similar errors and reductions in errors upon inclusion of two midbond centers. CCSD(T) results using aug-cc-pVDZ(+33211) give Δ rel abs of 6.0%, 6.8%, and 6.5% for hydrogen, dispersion, and others interaction groups, whereas using aug-cc-pVDZ (+2*33211) gives Δ rel abs of 4.1%, 3.9%, and 4.9%, respectively. The dispersion group is more affected than the other two interaction types, but compared to results for cc-pVDZ and cc-pVTZ, the improvement brought by the second midbond center is reduced in case of aug-cc-pVDZ results. The aug-cc-pVDZ basis set treats individual molecules (and interaction region) better due to the presence of diffuse functions in the basis set itself. Hence, aug-cc-pVDZ basis set supplemented with one midbond function is in general a good choice for the description of weak interactions in S66 complexes, especially to obtain cost efficient description of dispersion. The addition of the second midbond center only has a secondary effect on the accuracy.
The results for the S66 data set demonstrate the importance of sufficiently describing both the molecule regions and the interaction region, and that the requirement on the molecules is greater for the S66 data set than for the A24 data set. It is likely that for small complexes, the diffuse midbond functions also improve molecule description enough to obtain a balanced result, whereas this is not the case for larger complexes. As expected, merely saturating the interaction region with an increased number of midbond functions is insufficient for achieving accurate interaction energies for larger complexes if small atom centered basis sets, for example, cc-pVDZ, are used. In contrast, the combination of an augmented atom-centered basis set and one midbond center is seen to provide sufficient accuracy for describing interaction energies, also for large complexes. The accuracy can be further improved by either adding the second midbond center or increasing the atom-centered basis set (or both). However, this will raise the computational cost.

TABLE 3 Relative mean absolute errors (Δ rel abs , in %) and mean absolute errors (Δ abs , in kcal/mol) of interaction energies for 53 S66 complexes (see Section 2.3) calculated at different levels of theory using cc-pVDZ, cc-pVTZ, and aug-cc-pVDZ in combination with one (33211) or two (2*33211) midbond centers

FIGURE 1 Relative mean absolute errors for 53 S66 complexes (see Section 2.3) obtained at MP2 and CCSD(T) levels of theory using (aug)-cc-pVXZ(+Y*33211) (X = D, T; Y = 1, 2) basis sets

FIGURE 2 Timings for MP2/aug-cc-pVDZ(+2*aDZ), MP2/aug-cc-pVDZ(+2*33211) and MP2/cc-pVTZ(+2*aTZ) measured relative to MP2/aug-cc-pVDZ(+33211) timings. Only 53 complexes of the S66 data set are considered as specified in Section 2.3. The assignment of the system numbers is according to the original publication of Hobza and coworkers 75 and can be found in Table S25

TABLE 4 Relative mean absolute errors (Δ rel abs , %) and mean absolute errors (Δ abs , kcal/mol) of interaction energies for 53 complexes of the S66 data set (as specified in Section 2.3) divided into interaction type calculated at MP2 and CCSD(T) levels of theory supplemented with midbond functions 2*3s3p2d1f1g (2*33211) or 2*aug-cc-pVXZ (2*aXZ)

FIGURE 3 Relative mean absolute errors for the S66 data set obtained at MP2 and CCSD(T) levels of theory using cc-pVDZ, cc-pVTZ, and aug-cc-pVDZ basis sets supplemented with two 3s3p2d1f1g (2*33211) or aug-cc-pVXZ (2*aXZ) midbond centers. 2*aXZ means 2*aDZ for cc-pVDZ and aug-cc-pVDZ calculations and 2*aTZ for cc-pVTZ calculations

| A comment on timings
Having examined the performance of different basis sets in combination with one and two 3s3p2d1f1g midbond centers, we now turn to a brief examination of their relative computational efficiency. For this purpose, we only consider the MP2 timings. As discussed above, aug-cc-pVDZ(+2*33211) provides somewhat better results than aug-cc-pVDZ(+33211) but the differences in errors are not dramatic. The largest difference for Δ rel abs is found for the dispersion group where including two midbond centers rather than one reduces the error from 6.3% to 4.0%. The largest Δ abs difference is found for the hydrogen group where including two midbond centers rather than one reduces the error from 0.489 to 0.258 kcal/mol.

4.2 | Comparing the use of 3s3p2d1f1g and aug-cc-pVXZ as midbond functions
In Section 4.1 we saw that using two midbond centers for S66 reduces the error in the interaction energies, in particular for the dispersion group. Since the midbond set 3s3p2d1f1g contains diffuse functions of high angular momentum, calculations using several centers with this set increase computational cost (see Section 4.1.1). In this section we explore results for using the correlation consistent aug-cc-pVXZ basis sets for carbon as midbond set (contracted composition [3s,2p,1d] + diffuse (1s,1p,1d) for aDZ and [4s,3p,2d,1f] + diffuse (1s,1p,1d,1f) for aTZ), to see if standard basis sets can replace the specialized 3s3p2d1f1g. We compare results generated using two centers of 3s3p2d1f1g (denoted +2*33211, see Tables S17 and S21) and two centers of carbon aug-cc-pVXZ (denoted +2*aXZ, see Tables S18 and S22), where X of the midbond basis follows X of the main basis set, as was also the case in Section 3. We show MP2 and CCSD(T) results using cc-pVDZ, cc-pVTZ and aug-cc-pVDZ atom centered basis sets in combination with 2*33211 and 2*aXZ as midbond functions. Results for Δ rel abs and Δ abs are presented in Table 4 and visualized in Figure 3. 
Although we mainly discuss errors in terms of Δ rel abs , we also list Δ abs results since equal Δ rel abs values may correspond to different Δ abs values (see Equations 5 and 6).
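To make the size of the competing midbond sets concrete, the following sketch counts the basis functions each set adds per midbond center from its shell composition. It assumes spherical harmonic functions (2l + 1 components per shell) by default, and the dictionary and function names are illustrative. The aDZ and aTZ compositions are obtained by adding the diffuse shells to the contracted shells quoted above. The function count is only a rough proxy for cost, since it ignores the number of primitives and how diffuse the functions are.

```python
ANGULAR_MOMENTUM = {"s": 0, "p": 1, "d": 2, "f": 3, "g": 4, "h": 5}

# Shell counts per midbond center.
MIDBOND_33211 = {"s": 3, "p": 3, "d": 2, "f": 1, "g": 1}  # 3s3p2d1f1g
MIDBOND_ADZ = {"s": 4, "p": 3, "d": 2}                    # [3s2p1d] + (1s1p1d)
MIDBOND_ATZ = {"s": 5, "p": 4, "d": 3, "f": 2}            # [4s3p2d1f] + (1s1p1d1f)


def count_functions(shells, spherical=True):
    """Number of basis functions for a set given as {shell label: count}."""
    total = 0
    for label, n_shells in shells.items():
        l = ANGULAR_MOMENTUM[label]
        # 2l+1 spherical components, or (l+1)(l+2)/2 Cartesian components.
        per_shell = 2 * l + 1 if spherical else (l + 1) * (l + 2) // 2
        total += n_shells * per_shell
    return total


if __name__ == "__main__":
    for name, shells in (("33211", MIDBOND_33211),
                         ("aDZ", MIDBOND_ADZ),
                         ("aTZ", MIDBOND_ATZ)):
        print(name, count_functions(shells))
```

Under these assumptions the 3s3p2d1f1g set adds 38 functions per center, which lies between the 23 of aDZ and the 46 of aTZ, while being the only one of the three that reaches g-type angular momentum.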
We first discuss the performance of the cc-pVDZ and aug-cc-pVDZ basis sets in combination with two 3s3p2d1f1g and two aug-cc-pVDZ functions as midbond functions. The MP2/cc-pVDZ results experience a large decrease in accuracy by replacing 2*3s3p2d1f1g functions with 2*aug-cc-pVDZ functions. This effect is especially significant for dispersion and others groups, where Δ rel abs is increased from 14.3% to 22.7% for dispersion and from 9.2% to 18.9% for others group. The same trend is observed for CCSD(T) with a Δ rel abs increasing from 16.3% to 26.4% for dispersion group and from 10.7% to 21.3% for others group. Thus, the cc-pVDZ(+2*aDZ) basis set seems to be an inadequate choice for description of noncovalent interactions. For the aug-cc-pVDZ results we see the same trend as for cc-pVDZ results, that is, that the accuracy for both MP2 and CCSD(T) results is reduced upon replacing the 3s3p2d1f1g midbond functions with the aug-cc-pVDZ functions. In general, the Δ rel abs errors

TABLE 5 Relative mean absolute errors (Δ rel abs , %) and mean absolute errors (Δ abs , kcal/mol) of interaction energies for S66 complexes divided into interaction type calculated at MP2 and CCSD(T) levels of theory using cc-pVXZ(+33211), X = D, T, and aug-cc-pVDZ(+33211) for systematic and intuitive placement of midbond centers

While it has been proposed that basis sets where the angular momentum of the midbond set is higher than for the atom centered basis set are imbalanced, 84 Patkowski and collaborators 69 argue that this imbalance is not necessarily bad. Our results support the statement of Patkowski and collaborators, because we see that functions of high angular momentum in the midbond set are important to obtain accurate interaction energies.
Therefore, aug-cc-pVDZ(+33211) and aug-cc-pVDZ(+2*33211) stand out as good choices for computing interaction energies of noncovalent complexes.

| Systematic versus intuitive placement
In this section we explore how much the quality of computed interaction energies depends on whether the systematic approach (see Section 2.3) or the intuitive approach is used to place the midbond center (see Tables S19, S20, S23, and S24). Δ rel abs and Δ abs are given in Table 5, and for illustrative purposes Δ rel abs are visualized in Figure 4. For cc-pVDZ(+33211) the smallest errors are found for the hydrogen group where Δ rel abs errors range between 11.1% and 14.7%. For the dispersion group the Δ rel abs errors range between 23.8% and 30.3%. Hence, the cc-pVDZ(+33211) results are generally unreliable, no matter which approach is used for placing the midbond centers.
For cc-pVTZ(+33211) the results are significantly better than the cc-pVDZ results, but the approach for how to place the midbond centers only influences the results to a small extent. For example, MP2/cc-pVTZ(+33211) using the systematic approach gives a Δ rel abs error of 4.6% for the hydrogen group whereas the intuitive placement gives an error of 5.0%. For CCSD(T) the Δ rel abs errors are 6.5% and 6.9% for systematic and intuitive approach, respectively, for the hydrogen group. For the dispersion group the MP2/cc-pVTZ(+33211) Δ rel abs errors are 8.6% and 8.9% for systematic and intuitive placement, respectively, and the CCSD(T) numbers are 12.7% and 13.2%. For aug-cc-pVDZ(+33211) we see the same trends as for the other basis sets, namely that there is small dependence on how the placement of midbond centers is determined. In general, although the systematic placement seems to consistently give smaller errors than the intuitive approach, the difference is very small compared to the difference introduced by the choice of basis set. Overall, the quality of the results attained with both methods does not differ greatly, and both approaches offer distinct advantages. The systematic approach can be beneficial for large data sets of molecules, where the process of locating midbond functions can be automated, whereas the intuitive approach is simple for single systems.

FIGURE 4 Relative mean absolute errors (Δ rel abs , %) for the S66 data set obtained at MP2 and CCSD(T) levels of theory using cc-pVXZ(+33211), X = D, T, and aug-cc-pVDZ(+33211) with systematic and intuitive placement of midbond centers. Interaction energy errors are divided into interaction types

| CONCLUSIONS
In this article we use MP2 and CCSD(T) calculations for the A24 and S66 data sets to explore how midbond functions can be used efficiently to generate cost effective CP corrected supramolecular interaction energies of noncovalent complexes. We have used the A24 data set to show that the primary role of midbond centers is not to reach the CBS limit for the dimer more rapidly, but rather to provide a balanced description of the interaction region and the molecules. The need for balance is a consequence of choosing to use atom centered basis sets to describe the electronic structure. The error associated with the imbalanced description is seldom discussed in its own right and is often neglected compared to, for example, the BSSE. If the CBS limit is reached, both the error due to an imbalanced description and the BSSE disappear, but for larger complexes it is not feasible to go to the basis set limit.
Further, we have used the S66 data set to explore how midbond functions affect interaction energies of larger molecular systems and whether requirements for large systems are different to those for small systems. We have studied whether it is beneficial to use more than one midbond center in the interaction region of the S66 complexes. However, we show that one midbond center combined with a basis set such as aug-cc-pVDZ yields cost effective results for the S66 data set. Increasing up to two midbond centers improves the results.
Results obtained using cc-pVDZ are significantly worse for the S66 data set than for the A24 data set. For small systems such as those in the A24 data set, the midbond functions are able to also improve the description of molecules, and hence reducing the requirement on size of the atom centered basis set. This yields reasonable results even for the cc-pVDZ basis set, as seen both in this paper and in literature. We show that for larger systems, illustrated by using the S66 data set, the role of the midbond functions is primarily to provide the balanced description, since complexes are too large to get their description improved by the midbond functions. Hence, the requirement on choice of atom centered basis set is somewhat stricter for larger complexes. However, including midbond functions in the calculations to improve the flexibility in the interaction region allows for good results to be produced using basis sets, such as aug-cc-pVDZ, which traditionally are deemed insufficient for computing interaction energies.
By comparing calculations using the 3s3p2d1f1g midbond set with the use of aug-cc-pVDZ and aug-cc-pVTZ basis sets as midbond sets, we see that for a midbond set to be effective it is not enough that it contains diffuse functions; high angular momentum functions (at least f-functions) must also be included.
Results for two ways of placing midbond centers show that interaction energies are not sensitive to exact placement as long as it is reasonable. One approach is based on a weighted average of intermolecular atom-atom midpoints (systematic) and the other one is based on the use of chemical intuition (intuitive), which each have their advantages. The systematic approach can be beneficial for large data sets of molecules, whereas the intuitive approach is simple when looking at a particular molecular system.