The versatility of the Cholesky decomposition in electronic structure theory

The resolution‐of‐the‐identity (RI) or density fitting (DF) approximation for the electron repulsion integrals (ERIs) has become a standard component of accelerated and reduced‐scaling implementations of first‐principles Gaussian‐type orbital electronic‐structure methods. The Cholesky decomposition (CD) of the ERIs has also become increasingly deployed across quantum chemistry packages in the last decade, even though its early applications were mostly limited to high‐accuracy methods such as coupled‐cluster theory and multiconfigurational approaches. Starting with a summary of the basic theory underpinning both the CD and RI/DF approximations, thus underlining the extremely close relation of the CD and RI/DF techniques, we provide a brief and largely chronological review of the evolution of the CD approach from its birth in 1977 to its current state. In addition to being a purely numerical procedure for handling ERIs, thus providing robust and computationally efficient approximations to the exact ERIs that have been found increasingly useful on modern computer platforms, CD also offers highly accurate approaches for generating auxiliary basis sets for the RI/DF approximation on the fly due to the deep mathematical connection between the two approaches. In this review, we aim to provide a concise reference of the main techniques employed in various CD approaches in electronic structure theory, to exemplify the connection between the CD and RI/DF approaches, and to clarify the state of the art to guide new implementations of CD approaches across electronic structure programs.

The development of ab initio methods over the last 40-50 years has focused on harnessing the elusive locality of quantum effects in molecular systems, as the proper use of this locality enables computational studies of much larger molecular systems with the available computer resources.Almost inevitably, one then faces the question of the compact representation of the electron repulsion integrals (ERIs), which can be achieved by rank-reducing factorizations, for instance.
In a seminal paper published in 1977, Beebe and Linderberg 1 suggested that the symmetric positive semi-definite matrix of the ERIs could be efficiently represented through the use of an incomplete Cholesky decomposition (CD), 2,3 whose accuracy is controlled with a single adjustable parameter.Studying electronic structure calculations in a basis of contracted Gaussian-type orbitals, they demonstrated that the ERIs can be transformed into the molecular orbital (MO) basis faster when the CD approach is employed than when the conventional ERI approach is employed.As many wave function theories such as Møller-Plesset perturbation theory are conventionally written in terms of MO integrals, and as the MO transformation step may dominate the computational cost, the advantages of faster integral transforms can be considerable in many contexts.
However, Beebe and Linderberg also predicted that the explicit use of the ERIs in ab initio methods would be "unnecessary," as algorithms could be redesigned to take advantage of the CD factorization of the ERIs; indeed, this is a key aspect of modern implementations of electronic structure algorithms employing the CD.Furthermore, Beebe and Linderberg foresaw that the CD procedure could be advantageous also for other types of positive definite matrices in MO theory, such as the one-and two-particle reduced density matrices; many applications of this type can be found in the literature, as well.
This focus article is a chronological account of the use of CD in ab initio theory for the efficient representation of ERIs as well as for other purposes.In addition to the CD, we will also discuss resolution-of-the-identity (RI) or densityfitting (DF) approaches.The two terms-RI and DF-are often used interchangeably in the literature (as will also be done in this work), although the latter is more often used in the context of computing classical Coulomb interactions between two densities.
As we will discuss in this work, there is an extremely close relation between the RI/DF and the CD approaches; in fact, as will be shown in Section 2.1, the CD of the ERIs can be viewed as RI/DF in an auxiliary basis of atomic orbital (AO) products chosen by the pivoted Cholesky decomposition procedure. 4,5Accompanied with this connection, our review will focus around the auxiliary basis sets used in the RI/DF approach, and we will especially review the automatic generation of auxiliary basis sets for RI/DF approach using the CD.Despite the kinship between the CD and RI/DF approaches, the considerable literature on the RI/DF technique will not be reviewed in detail in this work due to the present focus on CD techniques.
The layout of this work is as follows.Next, in Section 2, we will discuss the CD approach.The section begins with a presentation of the "vanilla" CD algorithm found in the mathematical literature.However, electronic structure applications typically employ CD for so-called Gram matrices, such as the overlap or Coulomb overlap matrix.As discussed in Section 2.1, the CD of such matrices can be viewed as a Gram-Schmidt orthogonalization of the associated basis functions, and this connection can be used to fashion more efficient CD algorithms.The review then continues with literature applications of the CD on ERI decompositions and other methods in Section 2.2.The RI/DF approach is briefly presented in Section 3, after which we review various approaches to build auxiliary basis sets for RI/DF in Section 3.1.The review ends with a brief conclusion in Section 4. This review can be supplemented with the more technically oriented and verbose report on the subject in Reference [6].A brief technical description of analytical gradients in association with CD is presented elsewhere. 7

| CHOLESKY DECOMPOSITIONS
The Cholesky decomposition 2 (CD) is given by where the Cholesky vectors L form a lower (or upper) triangular matrix.The CD can be constructed for any symmetric positive semi-definite matrix M; see Reference [8] and references therein for the mathematical background.
The Cholesky factorization was originally introduced in electronic structure theory by Beebe and Linderberg1 to decompose the electron repulsion integrals (ERIs).Assuming real AOs or MOs χ μ È É , the ERIs in chemists' also known as Mulliken notation are given by where μ, ν, κ, λ are the AO/MO indices.Importantly, the ERI tensor is positive semi-definite, [9][10][11][12] and can be viewed as a measure of the distance of the orbital products χ μ r ð Þχ ν r ð Þ and χ κ r 0 ð Þχ λ r 0 ð Þ in the Coulomb metric g r,r 0 ð Þ¼ r À r 0 j j À1 .Because of the positive semi-definiteness, we can apply Equation (1) to decompose the ERIs as However, as the real ERIs defined by Equation ( 2) satisfy symmetries with the interchange of the indices μ $ ν, κ $ λ, and μν $ κλ, one also has to require that the elements of the Cholesky vectors L n satisfy L n μν ¼ L n νμ , because χ μ χ ν and χ ν χ μ are the same orbital product.This symmetry is commonly employed to describe the pair μν with a single compound index, which we will denote by i. Choosing j iÞ ¼ j μ i ν i Þ with ν i ≤ μ i (or equivalently with μ i ≤ ν i ), the symmetry of L n μν ¼ L n νμ is thus explicitly built-in in the mapping μν $ i, and the length of the Cholesky vectors L n is decreased by roughly one half.Employing the compound indices, the ERI tensor μνjκλ ð Þ≔ V ð Þ μν,κλ can now be written as a matrix V ð Þ i,j , which is also positive semi-definite.We can then write the Cholesky decomposition in matrix (V ¼ LL T ) or component form (V i,j ¼ P N n¼1 L i,n L j,n ), where N is the size of the orbital product basis.In practice, the Cholesky vectors Þ T are computed one by one by an iterative procedure.As was already mentioned, a full mathematical analysis can be found in Reference [8]; in the following, we will only outline the main parts of the procedure.
The key to understanding the pivoted Cholesky algorithm is that it is controlled by the errors of the diagonal elements of the matrix to be decomposed (here V ).Because the whole procedure boils around the diagonal remainder, it is stored in memory for the whole procedure as a vector d.
At the beginning, the error vector is simply given by the diagonal of V , , that is, by Importantly, it is easy to see that the residual matrix for any n; this also means that d i ≥ 0 in the absence of errors due to finite precision.As the sum of the diagonal errors P i d i measures the error of the current Cholesky approximation, an optimally convergent algorithm that achieves maximal error reduction per iteration is obtained by zeroing out the largest error in d at every iteration. 8This is achieved by sorting d in descending order, and choosing the entry corresponding to the largest (so far unprocessed) entry as the pivot index, that is, as the next Cholesky vector to generate.
What does the algorithm for the generation of the Cholesky vectors look like?In the following, we try to maximize the similarity of the notation with Harbrecht et al. 8 for clarity.However, as we wish to generate our Cholesky vectors as columns, we define our Cholesky matrix as the transpose of those of Harbrecht et al., L ¼ ℓ T .
Following Harbrecht et al., let π denote the row/column indices of the pivoting, so that π i is the index of the ith pivot function, which denotes the orbital product μ i ν i .The computation of the Cholesky vector for iteration m 1, N ½ starts out by setting the diagonal element 1 ; with the pivoting this reads 8 When the Cholesky vector defined by Equations ( 5) and ( 6) has been computed, the diagonal error is updated by Equation (4).One can especially observe from Equations ( 4) and ( 5) that the resulting d π m ¼ 0, that is, the largest error is zeroed out at every iteration, as was already mentioned above.This is the reason for the optimal convergence of the CD algorithm.
After each iteration, the remaining error in the decomposition at iteration m can be computed from The procedure of Harbrecht et al. 8 is stopped when ϵ reaches a predefined tolerance threshold τ, ϵ < τ.In contrast, quantum chemical implementations of the Cholesky decomposition commonly employ a different termination criterion that is defined by the maximum element of the error vector Clearly, both stopping criteria of Equations ( 7) and ( 8) lead to the exact decomposition at the limit τ !0. However, the former will lead to larger numbers of Cholesky vectors for a fixed value of τ than the latter criterion.
Having laid out the algorithmic implementation of the procedure, we can already make some general observations about the method.Perhaps the defining property of the Cholesky decomposition behind its popularity in electronic structure theory is exactly that the iterative procedure can be formulated in a "direct" 13 manner, such that only the diagonal elements d of the supermatrix V and the individual Cholesky vectors L n ð Þ need to be stored.The storage cost is therefore essentially determined by that of the Cholesky vectors, themselves.However, because the Cholesky decomposition efficiently picks out any linear dependencies in the contents of the decomposed matrix-which are formidable in the case of the orbital product basis used in the ERI tensor-the number of resulting Cholesky vectors is often much smaller than the dimensions of the original matrix, and in practice the number of Cholesky vectors N CD scales linearly with the number of orbital basis functions N OBF as N CD ¼ γN OBF with typical values of γ 3, 10 ½ , depending entirely on the employed decomposition threshold τ.
We conclude this section by noting that Beebe and Linderberg speculated that a proper implementation of the CD approach would make the computation of the integrals scale cubically with system size and the integral transformation scale quartically with respect to the size of the orbital basis set.This has since been verified by a number of efficient implementations, which brings us naturally over to the next subsection, which discusses the way in which the CD of the ERIs is computed in modern implementations.

| Connection to orthogonalization methods
An important connection between the Cholesky decomposition discussed above in the general case and matrix orthogonalization methods was originally discussed by Aquilante et al. 4,5 Namely, when applied to a Gram matrix, that is, an overlap matrix in the usual metric g r, r 0 ð Þ¼δ r À r 0 ð Þ or a Coulomb overlap matrix in the case of the Coulomb metric g r, r 0 ð Þ¼ r À r 0 j j À1 employed in the ERIs, the CD can in fact be seen to be equivalent to a Gram-Schmidt orthogonalization procedure.This connection is therefore not a general property of the CD, but a consequence of the context of its application to matrices that describe overlaps of functions in some function space.Due to its practical importance, we will devote this subsection to elucidating this connection originally pointed out in References [4,5] by rederiving it from the general equations shown above for the CD.
The argument of Aquilante et al. 4,5 is as follows.Let us assume that we have performed the Cholesky decomposition of the ERIs, which has yielded us a series of pivot functions j π j Þ ¼ j μ j ν j Þ.Let us now examine what happens if we apply the Gram-Schmidt orthogonalization technique to this set of pivot functions.The Gram-Schmidt procedure determines orthonormalized functions j Q j Þ that are obtained by first projecting out previously processed vectors from the original functions and then by normalizing the remainder It is now straightforward to make the connection that 4,5 First, it is obvious from Equations ( 5) and ( 6) that the first function generated by the Gram-Schmidt procedure satisfies μνjQ , as the first orthonormalized function is simply the first pivot function in normalized form. 4Having established that Equation (11) holds for the first function, we can prove the connection in general by the use of induction.
Computing μνj e Q m from Equations ( 9) and (10) we have On the one hand, we can rewrite Equation (13) using the claim of Equation (11) as On the other hand, Equation (6) reads in the same notation as The right hand side of Equation ( 14) is clearly equal to the parenthesis Equation (15).But Equation (15) still contains the "diagonal" element, and to complete the proof we need to show that this factor is the normalization integral in Equation (10).This is also easy to show: according to Equation (9) the normalization integral in Equation ( 12) is where we have again used Equation (11) to arrive from Equation (16) to Equation (17), and then applied Equation ( 4) to identify the diagonal error in V À LL T at iteration m, which is the "diagonal" element of the Cholesky vector according to Equation (5) in Equation ( 18), q.e.d.We have thus reproduced the proof of References [4,5] that when applied to an overlap (a.k.a.Gram) matrix, the pivoted Cholesky decomposition is analogous to performing Gram-Schmidt orthogonalization on the sequence of pivot functions chosen by the CD.Why did we go through this trouble?This connection is of immense practical use.Not only does it give an intuitive understanding of what the Cholesky decomposition does when applied to the ERIs, but also it can be used to greatly simplify the computation of the Cholesky decomposition-as well as that of its derivatives.One can especially see that the Cholesky decomposition of the ERI is the same as RI/DF in the basis of the orbital products represented by the pivot indices; this will be discussed in detail in Section 3.
Realizing that the Cholesky decomposition can be performed in two steps-(i) identification of the pivot functions and (ii) formation of the Cholesky vectors-allows the use of a different type of algorithm than the single-step approach presented in Section 2. As we will discuss in the literature review in Section 2.2, modern CD approaches rely on such two-step procedures, 6 which afford the following advantages.
When Equation ( 8) is employed as the stopping criterion, the first step can be optimized by throwing out any orbital products i with diagonal errors d i < τ at every iteration.Because the orbital product basis contains many linear dependencies, the number of significant elements in the d vector decreases rapidly, meaning that the consequent pivot indices become faster and faster to determine.Once the pivot indices have been determined, the Cholesky vectors can be efficiently computed from orthogonalized three-center integrals as in the RI/DF technique, which is discussed in Section 3.This also obviates the need to keep the Cholesky vectors in memory, as the vectors can be computed from the pivot indices on-the-fly following established RI/DF methodologies.

| Literature review
What follows will primarily be a review on developments on the use of CD technology in electronic structure theory.Key contributions specific to RI/DF are also mentioned when relevant to the presentation.
A first necessary step for chemical applications of the CD procedure was to find a way to compute nuclear gradients.Initially, O'Neal and Simons 14 proposed augmenting the ERI matrix with the derivative integrals before carrying out the CD procedure.However, this limits the accuracy of the computed gradients to the Cholesky threshold, requiring the use of extremely small decomposition thresholds τ, which is clearly unsatisfactory.We will shortly discuss later developments that allow exact gradients for CDs determined with rather generous thresholds.
Further development of the CD approach was slow during the 1980s and 1990s. 14,15In the meantime, the development of the RI/DF approximation 16 to ERIs offered an attractive and simple way around the somewhat awkward and not too efficient recursive nature of the original version of the CD procedure discussed in Section 2: the RI/DF approach discussed in Section 3 determines a reduced rank representation of the orbital product basis set by the introduction of an auxiliary basis set.The expansion coefficients of the ERIs in the auxiliary basis can be determined variationally, but the strict control over the accuracy of the resulting approach is lost in RI/DF, as it employs a fixed auxiliary basis set instead of choosing the auxiliary functions adaptively as in CD (Section 2.1).However, the loss of guaranteed precision is not an issue for many standard chemical applications, as discussed by Weigend et al., 17 for instance.
In 2003, Koch et al. 18 developed the first modern CD implementation for Hartree-Fock (HF) and second-order Møller-Plesset (MP2) theories in the Dalton software package, 19 following the original procedure of Section 2. In the following year, they reported implementations for computing dynamic polarizabilites at the approximate coupledcluster singles-and-doubles (CC2) level of theory. 20Both implementations demonstrated significant reductions in processor time compared with the use of conventional ERI techniques.
These efforts inspired the MolCAS 22 developers to implement and investigate the CD approach, as well.First, Aquilante et al. 21developed the local-exchange (LK) method for fast evaluation of the exchange contribution to the Fock matrix (see Figure 1), the direct Coulomb contribution being inexpensive and straightforward.Implementations of CD-based scaled-opposite-spin MP2 theory, 23 complete active space self-consistent field (CASSCF) 24 theory, and multiconfigurational second-order perturbation theory (CASPT2) 25 followed in a rapid sequence.
The derivative issue was also addressed and solved in a different way from the original approach of O'Neal and Simons 14 : Aquilante et al. 4 argued that the proper derivatives are not generated by CD applied to the derivative integrals but rather by differentiating the Cholesky approximation itself, Equation (3).However, the recursive nature of the Cholesky basis is a significant complication in such an implementation.Yet, as discussed in Section 2.1, Aquilante et al. showed that Equation (3) could equally well be expressed as where π m f g again denote orbital products μ m ν m chosen by the CD as pivot indices, and the Coulomb matrix V is defined to have the matrix elements The key discovery here is that the derivatives of Equation ( 19) are straightforward to compute with minor modifications of the algorithms developed for RI/DF techniques discussed in Section 3, obviating the need to take the derivatives of the iterative CD procedure discussed in Section 2.
Similarly motivated by the efforts by Koch et al. and the MolCAS developers, Røeggen and Johansen 26 suggested an efficient CD algorithm for use with family type basis sets that employ the same Gaussian exponents for various angular momenta.
As suggested by Beebe and Linderberg, 1 the CD procedure also turns out to be useful for other quantities than the ERIs.For instance, Aquilante et al. 27 used CD to generate localized MOs in a non-iterative fashion from the one-particle density matrix in the AO basis, while Boman et al. 28 applied the CD approach in a brilliant work to the matrices occurring in the energy expressions for a given method.For example, the direct Coulomb contribution to the energy in HF, CASSCF, and Kohn-Sham density-functional theory (KS-DFT) can be written as where the CD procedure can now be applied to the matrix M that describes products of density matrix elements D μν with the corresponding AOs χ μ χ ν which together describe the electron density participating in the Coulomb interaction; this yields the Coulomb decomposition.Boman et al. also reported three versions adapted for the HF exchange energy, which differ in the ways the contributions are screened.Overall, they found that the length of the CD can be reduced Comparison of the scaling of the CPU time for the exchange Fock matrix build for linear alkanes using standard Cholesky, the local exchange (LK) Cholesky, and the conventional integral-direct methods using a CD threshold of τ ¼ 10 À4 E h and the cc-pVTZ basis set.(Reprinted from Reference [21] with permission from AIP Publishing.) by an order of magnitude or more compared with the standard CD applied to the ERIs, and that the associated reductions in computational time may reach two orders of magnitude. 28nother distinct application of the CD procedure is by Lehtola, 29 who followed the techniques discussed later in Section 3 to cure overcompleteness in molecular electronic structure calculations with AO basis sets.Even when the AO basis is orthonormal on a single atom, the overlap matrix may develop pathological singularities from interatomic blocks that make electronic structure calculations numerically ill-conditioned.However, in analogy to the methods discussed in Section 3, the CD approach can be used to identify subsets of orbital basis functions that are able to span the whole set of atomic-orbital basis functions 29 ; per the discussion in Section 2.1, the method of Reference [29] can be seen as an optimal Gram-Schmidt orthogonalization of the AO basis.This automated procedure to remove pathological linear dependencies enables the routine use of extremely diffuse basis functions to describe weakly bound electrons, 29 extended basis sets for benchmark quality calculations, 30 as well as studying highly repulsive parts of potential energy surfaces where nuclei may even be on top of each other. 31he developments of the CD method for ERIs over the last 10-15 years have included implementations of a wide range of ab initio methods, including Green's function methods, 32 an adaptive metric for fragment molecular orbital MP2, 33 equation-of-motion MP2, 34 and second-order symmetry-adapted perturbation theory. 357][38][39][40] Carter et al. have used the procedure in linear-scaling and localized multireference configuration interaction (MRCI) theory, 41,42 while Lehtola et al. have used it for studies on the Perdew-Zunger self-interaction correction 43,44 and the perfect pairing hierarchy, 45 which is a family of truncated coupled-cluster (CC) methods.Other types of CC theories have seen use of Cholesky decomposed ERIs by others.An extensive list includes reports on implementations with parallel CC singles-and-doubles with perturbative triples corrections (CCSD(T)), 46 for the evaluation of equation-of-motion CC (EOM-CC), 47 in connection with efficient CC singles (CCS) and CC singles-and-doubles (CCSD) implementations on graphics processing units (GPUs), 48 a combination of reverse Cuthill-McKee transformation in combination with CD for compact two-electron integral representation in combination with MP2 and linear coupled-cluster with doubles l-DCC, 49 for the evaluation of analytical gradients for the CCSD and EOM-CCSD methods using the explicit derivatives of the iteratively generated Cholesky vectors, 50 for the CCS, CCSD, and CCSDT models combining CD representation of the ERIs with tensor-reduction schemes for the triples CC amplitudes, 51 and most recently for CCSD gradients 52 using the approach of Aquilante et al. 4 to compute the derivative integrals in the parent basis set.
To continue, Bozkaya et al. have implemented over the last few years in a collective effort CD and RI/DF for various orbital-optimized schemes (the OMP2, OMP2.5, and OCCSD methods), [53][54][55] quasi-degenerate perturbation theory, 56 as well as energies and analytic gradients for OCCSD, and EOM-CCSD. 57,58These CD approaches are available 35 in the Psi4 program package. 59Similarly, Hohenstein et al. have presented impressive implementations of CD ERI methods in connection with reduced-rank CC theory on GPUs, 60 combined with tensor hypercontraction of the doubles amplitudes. 61n this list of implementations of CD ERIs in ab initio methods one also finds a variational orbital-optimized twoparticle reduced-density-matrix method, 62 a self-consistent-field valence bond (VBSCF) method, 63 and an implementation in density matrix renormalization group (DMRG) second-order N-electron valence state perturbation theory (NEVPT2). 64Piccardo and Soncini 65 have also formulated a modified version of the procedure that can be applied to both the symmetric ERI matrix and the antisymmetric matrix of the two-electron spin-orbit integrals.
The recently developed e T software package 66 for CC methods has been designed to exclusively employ the CD representation of the ERIs.Folkestad et al. 67 presented a highly efficient implementation of the two-step CD procedure of Aquilante et al. 6 that only stores the pivot indices from which the integrals can be computed as needed.Folkestad et al. 67 have used this procedure in e T in CCSD calculations on systems with up to 80,000 AO basis functions, see Figure 2.
In parallel with the work of Folkestad et al., Lew-Yee et al. 68 reported two algorithms on the combination of asymmetric DF together with CD ERIs in a second-order propagator implementation.Most recently, Zhang et al. 69 reported a careful analysis of the computational costs of the two-step procedure, and determined the optimal way to implement CD with respect to memory footprint and number of floating-point operations.
The CD procedure is now starting to be incorporated to its full extent also in other program packages than DALTON, 70 OPENMOLCAS, 71 e T , 66 and PSI4. 59Helmich-Paris et al. 72 described the implementation of relativistic Choleskydecomposed density matrix (CDD) second-order Møller-Plesset perturbation theory (MP2) energies in the Dirac program. 73Nottoli et al. 74 have recently reported several implementations using CD ERIs in the CFOUR software package, 75 while Blaschke and Stopkowicz 70 presented an extension of the CD technique for complex ERIs arising from the use of London orbitals. 76Employing these techniques, Nottoli et al. have reported a quadratically convergent SCF (QCSCF) implementation, 74 a second-order multi-configuration SCF (MCSCF) implementation 77 that yields a significant reduction in the computational cost and memory requirements and improves parallel performance, and studied nuclear magnetic resonance shielding constants and molecules in strong magnetic fields. 78,79Finally, extremely recently, the application of CD and lower-upper (LU) techniques to a spin-free infinite-order two-component relativistic Hamiltonian in a modified version of the GAMESS program has been discussed by Takashima and Nakai. 80

| RI/DF APPROACH
Could the ERIs be computed more efficiently using an intermediary basis set?][83][84][85][86][87][88] The central idea in the RI/DF approach is that the orbital product basis set can be expanded in terms of a smaller auxiliary basis set as The expansion of Equation ( 21) was originally investigated by Whitten 11 as a way to get an efficient representation of the electron densities when computing Coulomb potential energies.The expansion coefficients C K μν are obtained by maximizing the fit between the input density in the orbital product basis and the auxiliary basis expansion.Vahtras et al. 89 studied such fits determined within the Coulomb and overlap metrics, and found the former to yield the most accurate results.This is easily understood, since fitting in the Coulomb metric minimizes the error in the interaction energy, Hence, we find the following relation at the minimum which yields where K and L are indices used as a shorthand for the auxiliary basis functions θ K and θ L , and μνjK ð Þand V KL ¼ KjL ð Þ are the so-called 3-and 2-center ERIs, respectively, which are evaluated over the orbital and auxiliary basis functions.For a recent comparison on the accuracy of various fitting metrics, we refer to the work of Duchemin et al. 90 The approximation in Equation ( 21) results in the following approximation for the ERIs: When the coefficients are determined by the Coulomb fitting procedure of Equation ( 24), Equation ( 25) yields This expression is now seen to be identical to Equation (19): in Equation ( 19), the pivot functions chosen by the CD take on the role of the chosen auxiliary basis in Equation ( 26), which enables the reuse of RI/DF techniques within the two-step CD scheme as discussed in Section 2.1.
The connection to the CD technique also allows us to see that the maximum absolute error in the approximate ERIs of the RI/DF procedure is bounded from above by max μν,κλ j ϵ μν,κλ j, whose value is now determined by the chosen auxiliary basis set.This error can be made small by a judicious choice of the auxiliary basis set: an extended auxiliary basis set will afford small errors.
It has also been recently pointed out that a one-center correction to the RI/DF procedure in which all one-center two-electron integrals are computed exactly can afford reduced RI/DF errors. 91

| Auxiliary basis sets
We will now review efforts to build accurate auxiliary basis sets along two prongs.The first is to optimize the auxiliary basis sets for a specific purpose, such as for the cost-efficient calculation of the direct Coulomb contribution to the Fock matrix, which is of immense practical importance in standard chemical applications.The second is to try to build generaluse basis sets by some automated means, starting from the given orbital basis set.In particular, the CD technique will be seen to guarantee small errors by construction, yielding a universal approximation of the ERIs with controlled accuracy.
Once the considerable benefits of the RI/DF approach were discovered, researchers took on the quest of finding optimal auxiliary basis sets.Vahtras et al. 89 noted that although an exact θ K in Equation ( 21) can be found with the Gaussian product theorem, this would not result in any reduction in the required computational effort.In any case, a successful approximation should reproduce the HF energies with suitably small errors, which requires reproducing the correct behavior of isolated atoms, for instance.Hence, Vahtras et al. 89 stated that "the expansion basis set should in principle include all one-center products of the original basis set."The race was now on to find such auxiliary basis sets.
One way to derive such precomputed auxiliary basis sets was proposed by Ten-no and Iwata.Following Vahtras et al., the linear combination of atomic electron densities (LCAD) approach 92,93 chooses all single-center orbital products as auxiliary functions.A slight complication of this choice are the angular redundancies (ARs) due to duplicate orbital products of the same total angular momentum, and the Coulombic redundancies (CRs) due to the overcompleteness of the set of orbital products, which are removed in the procedure. 92,93Ten-no and Iwata 93 demonstrated significant speedups for multiconfigurational calculations with the LCAD scheme with negligible errors ($ 0:1 kcal mol À1 ).
The group around Ahlrichs and the TURBOMOLE program 94 did solid work along a different route and produced a preoptimized auxiliary basis set for the standard orbital basis sets in Turbomole at the time, which cover the chemically relevant part (H-Rn) of the periodic table.This RI-J auxiliary basis set was designed to reproduce the direct Coulomb contribution J to the Fock matrix, 95,96 significantly speeding up density functional calculations with semilocal functionals.The auxiliary basis was optimized by minimizing the error in the Coulomb energy of isolated atoms and hydrides with a Monte Carlo procedure.The resulting error was found to be around 0:1 mE h per atom in the early generations of auxiliary basis sets.Equally importantly, the auxiliary basis set was found to be about three times the size of the orbital basis set, N aux ≈ 3N OBF .
The work was extended to the evaluation of the MP2 correlation energy by numerical optimization of exponents and contraction coefficients using an error metric defined by the direct MP2 energy correction, 97 yielding RI-MP2 auxiliary basis sets.The number of primitives and the contraction pattern of the auxiliary basis were determined based on ad hoc rules.An error of 0:06 mE h per atom was achieved with a slightly larger auxiliary basis set than the RI-J basis set, N aux ≈ 4N OBF .
It is worth mentioning that the optimization procedures for the RI-J and RI-MP2 auxiliary basis sets employed error metrics associated with total energies, which means that the errors in the values of the individual ERIs was not considered.Reproducing all ERIs generally requires many more auxiliary functions than reproducing just the Coulomb or the MP2 correlation energy.Still, it should be noted that these RI implementations for DFT and MP2 made the Turbomole package one of the few packages at the time that could routinely simulate molecules of a decent size with reasonably sized AO basis sets.
6][107] However, it is worth noting that no precomputed auxiliary basis sets had been reported for, for example, multiconfigurational electronic structure models at this time (2008).This was possibly a manifestation of the somewhat tedious and ad hoc way in which the preoptimized auxiliary basis sets are constructed: if you had just derived a new orbital basis set or designed a new electronic structure model, you were at best on a waiting list to get the associated auxiliary basis set(s), or you had to optimize them yourself.Furthermore, optimized auxiliary basis sets are almost invariably tailored for reproducing differences in ground state total energies, and may not be accurate for modeling all kinds of molecular properties such as magnetic or core spectroscopies, highlighting the need for alternate avenues.
The first attempt to automate the construction of RI-J auxiliary basis sets was reported by Yang et al., 108 who developed an ad hoc scheme with 11 steps: the auto-ABS procedure that is controlled by three adjustable parameters.Auto-ABS produces an uncontracted auxiliary basis and employs each exponent over an angular momentum range specific to that exponent.This algorithm was implemented in the Gaussian program and its accuracy in the reproduction of the direct Coulomb energy was compared with that of the so-called "universal" auxiliary functions of Weigend. 109Yang et al. 108 reported that Auto-ABS reproduces an accuracy similar to that of the auxiliary basis set of Weigend, even though the auto-ABS auxiliary basis can be as much as four times larger than that of Weigend.
When Aquilante and Pedersen worked on implementing the CD method for CASSCF and CASPT2 energy calculations, and Lindh on implementing RI/DF technology for the HF and density functional methods-all three working within the MolCAS package-they started to see similarities between their formulations that were beyond pure coincidence.This should not have been a surprise to anyone, as both techniques are realizations of the inner projections famously explored by Löwdin, 110,111 as was also elaborated by Vahtras et al. in the educational paper on integral approximations in HF calculations. 89gain, as discussed in Sections 2.1 and 3, the only difference between CD and RI/DF ERIs is the choice of the auxiliary basis set.The RI/DF approximation of the ERI can be written in the same form as the CD representation by symmetric splitting of the inverse overlap matrix yielding an expression similar to Equation (3), even when the two-center Coulomb overlap matrix KjL ð Þ contains linear dependencies that are removed.Thus, the difference between RI/DF and CD is just which set of functions is used as the auxiliary basis: the pretabulated auxiliary basis functions in RI/DF versus the orbital product functions chosen automatically by the CD procedure.Modern implementations of either method work by forming the two-center overlap KjL ð Þ and orthogonalizing it by, for example, Löwdin's canonical orthogonalization method, 112,113 while pathological linear dependencies in the auxiliary basis set can be handled again with CD approaches. 29,114onsidering the remarkable success and accuracy of the RI/DF procedure employing a one-center auxiliary basis set, one can ask if the one-center constraint also works within the context of the CD decomposition of the ERIs.The answer to this question is a resounding yes: not only is the resulting one-center CD (1C-CD) 115 procedure almost as accurate as the full four-center CD, but also it is about four times faster than full CD.Although a major improvement, the 1C-CD approach still has some potential flaws.It still requires a costly iterative procedure employing the procedure of Section 2 or Section 2.1.Also, even though the auxiliary basis now only contains one-center product functions, the auxiliary basis can change with the molecular geometry, which may result in discontinuities in potential energy surfaces.
Both of these problems can be easily fixed by applying the CD procedure on individual atomic blocks of the ERI supermatrix, thus defining the atomic CD (aCD) procedure. 115Again, a significant reduction in timings is observed compared with full-CD or 1C-CD without a significant loss of accuracy.Of course, neither 1C-CD or aCD provide global ERI error control any more, as only the errors of the one-center ERIs singled out by Vahtras et al. 89 are controllable.
It is prudent to compare aCD with the LCAD approach of Ten-no and Iwata. 92,93The 1C-CD, aCD, and LCAD approaches all work in the original orbital product basis set, but with different selection of the threshold for elimination of (near-)linear dependence.The former two methods typically use a threshold which is four orders of magnitude more loose.
Another important fact to stress is that in contrast to the standard RI/DF auxiliary basis sets that are typically optimized to reproduce total energies for a given method, the full-CD, 1C-CD, LCAD, and aCD families of auxiliary basis sets approximate the integrals themselves, meaning that no accidental cancellation of errors is relied upon for the accuracy of the resulting approach, see Figure 3.
However, the three approaches still share an exhaustive list of primitive orbital products, while standard RI/DF basis sets are no more dense than ordinary orbital basis sets.There must be more redundancy that can be removed!Aquilante et al. 116 demonstrated that if an additional CD procedure is applied to an atomic ERI list, now expressed in terms of the products of uncontracted, primitive Gaussians instead of products of AOs of contracted Gaussians, significant reductions can indeed be achieved in the primitive Gaussian approximation of the product basis.The resulting atomic compact CD (acCD) auxiliary basis set contains far fewer functions than the original aCD auxiliary basis set, as seen in Figure 4.
Although the acCD auxiliary basis sets are larger than the standard TURBOMOLE auxiliary basis sets, for instance, the acCD basis sets come with several advantages.The acCD basis set can be derived on the fly from any orbital basis set, eliminating the need for pretabulation by a painstaking optimization procedure for each combination of orbital basis, method, and targeted property.The acCD basis is a general-purpose auxiliary basis in that it can be used for any ab initio wave function or density functional model with similar accuracy, and it can also be expected to work well for various molecular properties.Moreover, the accuracy of the approximation is controllable with a single threshold, τ.In short, the CD approach, whether atomic, one-center, or full, can be viewed as a systematic approximation of the twoelectron interaction terms of the second-quantized Hamiltonian.
A calibration study of full-CD, 1C-CD, aCD, and acCD thresholds with respect to HF, non-hybrid, and hybrid density functional theory, and MP2 total energies demonstrated satisfactorily small errors of 0:01 kcal mol À1 per electron in conjunction with a CD threshold of τ ¼ 10 À4 E h , 5 compared with the use of the exact ERIs.The same threshold reproduced errors for CASSCF and CASPT2 valence and Rydberg excitation energies of 0.001 eV, 117 again compared with the use of exact ERIs.MP2, CCSD, and CCSD(T) interaction energies of weakly bound dimers are found to exhibit errors in the range 0.003 to 0:03 kcal mol À1 compared with the use of conventional ERIs. 118Furthermore, the same benchmark studies show that the error arising from the RI/DF procedure with the aCD and acCD auxiliary basis sets is virtually insignificant for high-quality orbital basis sets.For multiconfigurational methods, such as the CASSCF and CASPT2 models, the use of the acCD auxiliary basis set effectively eliminated the relative timing bottleneck due to the treatment of the ERIs.In a 2009 comparative study of 1C-CD, aCD, and acCD approaches versus the Turbomole RI/DF auxiliary basis sets, Weigend et al. 17 noted that the latter auxiliary basis sets are slimmer but still reproduce energies within acceptable error bars.They stated that the acCD auxiliary basis sets could possibly play a role in calculations with larger basis sets.
Over the last 15 years a number of new auxiliary basis sets have been developed and published.Hill et al. have developed RI-MP2 auxiliary basis sets for correlation-consistent basis sets for elements beginning from scandium, [119][120][121] while additional auxiliary basis sets for other types of orbital basis sets (weighted core-valence and ECP basis sets, 122 the 6-31G** and 6-311G** basis sets of Pople et al., 123 lanthanide basis sets, 124 Turbomole basis sets with diffuse functions, 125 etc.) have also been developed.Auxiliary basis sets have also been developed for F12 calculations.Shaw and Hill 126 reported improved complementary auxiliary basis sets for reaching the HF limit (OptRI+) in F12 calculations, while Kritikou and Hill 127 reported auxiliary basis sets optimized for reproducing correlation energies computed with F12 second-order Møller-Plesset perturbation theory.
As is evident from this tale, the selection and accuracy of auxiliary basis sets has inherited the complexity and confusion of orbital basis sets.Corrections to auxiliary basis sets for the correlation consistent cc-pV(n+d)Z orbital basis sets have been published as recently as 2023 by Nash et al. 128 As the literature lacks systematic version control, it can be hard to keep track of the development of orbital basis sets and the corresponding auxiliary basis sets.An example is the recent study of Kermani and Truhlar, 129 which highlights the ill-known fact that a given orbital basis set such as cc-pVDZ may not be the same in all programs.
Over the last few years researchers have been trying to address this issue by coming up with a simple recipe for generating an accurate auxiliary basis set for the given orbital basis set.Building on the tradition of employing product basis sets in the preceding computational physics literature, Ren et al. 130 proposed a procedure that forms auxiliary basis sets by forming all possible products and choosing the auxiliary basis functions by Gram-Schmidt orthogonalization.Note that if optimal pivoting is used, according to Section 2.1 this procedure is equivalent to the use of a pivoted Cholesky decomposition.Importantly, the procedure of Ren et al. is independent of the form of the atomic basis set; their implementation used numerical atomic orbitals, which were used to express Gaussian-type orbital basis sets.
Stoychev et al. 131 reported the AutoAux procedure, which is a set of 9 ad hoc rules controlled by 19 adjustable parameters to generate auxiliary basis sets for Gaussian basis sets.The AutoAux basis sets are usually twice as large as standard precomputed auxiliary basis sets but Stoychev et al. claim these to be general-purpose basis sets.Laikov 132 similarly proposed a method based on ad hoc rules of thumb for optimizing an auxiliary basis set for a given Gaussian orbital basis, which likewise relies on a number of adjustable parameters that control the composition of the resulting auxiliary basis.Semidalas and Martin 133 reported an automatic ad hoc procedure to generate complementary auxiliary basis functions for explicitly correlated F12 wave calculations with Gaussian basis sets following an eight-point scheme, which yields an auxiliary basis set with similar quality to the OptRI+ basis set.
On another path, following the footsteps of Aquilante et al., 116 Lehtola 114 proposed an atomic CD procedure that employs a pivoted Cholesky decomposition to pick out auxiliary atom-centered basis functions of the standard form composed of a radial function times a spherical harmonic.At variance to the aCD and acCD procedures that employ combinations of Cartesian functions and spherical harmonics in the auxiliary basis that are not supported by most quantum chemistry programs or basis set formats, the procedure of Lehtola produces RI/DF basis sets of the standard form that can be employed without changes in any program that implements RI/DF calculations while simultaneously maintaining the central property of atomic CD of being controlled by a single accuracy threshold.Lehtola's procedure, based on a pivoted Cholesky decomposition, is similar to that of Ren et al. 130 ; see discussion above.Moreover, both schemes are similarly applicable to any type of atomic orbital basis set.However, Ren et al. employ the full product orbital basis, while Lehtola preselects the trial products by a pivoted Cholesky decomposition of the ERIs following Aquilante et al.Lehtola compared the new procedure to the AutoAux method of Stoychev et al. 131 and found AutoAux to result in RI/DF errors up to 5 meV in MP2 total energies while the new Cholesky method afforded errors an order of magnitude smaller.However, the generated Cholesky basis sets were found to be significantly larger than those generated by AutoAux.
In extremely recent work Lehtola 134 discusses a method of contracting the autogenerated Cholesky basis set by the use of a singular value decomposition (SVD) of the three-center integrals, following a previous suggestion of K allay. 135ombining the contraction with a pruning of the high-angular momentum functions, Lehtola 134 finds that the resulting procedure allows the size of the auxiliary basis to be reduced significantly: HF and MP2 total and atomization energies can be reproduced with some 50% fewer auxiliary basis functions with the new scheme.Lehtola finds that the accuracy of the full auxiliary basis can be captured with N aux ≈ 5:5N OBF , while an accuracy similar to AutoAux is achievable with N aux ≈ 3:5N OBF .
Not minding the complication with the mixing of Cartesians and spherical functions in the aCD basis, Hellmann et al. 136 applied aCD with only spherical functions to integrals of the long-range part of the range-separated Coulomb operator in time-dependent density functional theory (TD-DFT).Hellmann et al. 136 observed that the aCD basis set was consistently smaller than the recommended general-purpose Coulomb RI basis set by up to one order of magnitude for some molecular systems, without any loss of accuracy to excitation energies.Single-point energies were, however, found to be prone to the errors introduced by the lacking handling of the mixed Cartesian/spherical auxiliary functions used in the aCD scheme, in agreement to the findings of Lehtola. 114However, it is interesting to compare the findings of Hellmann et al. to the recent work of Zhou et al., 137 who found that TDDFT excitation energies of organic molecules can be captured to 0.06 eV error with just a single(!) s-type auxiliary basis function per atom.This suggests that TDDFT calculations are surprisingly insensitive to the accuracy of the auxiliary basis set.
Before concluding this section a word of caution is in order though.Auxiliary basis sets autogenerated from small orbital basis sets will lack high-angular-momentum functions which are necessary to describe products arising from functions on two atoms.As a rule of thumb, the one-center approximations involved in LCAD, 1C-CD and the automated methods for generating auxiliary basis sets discussed above require at least a polarized triple-ζ orbital basis set to afford auxiliary basis sets that are flexible enough to describe two-center orbital products that otherwise lead to large RI/DF errors.It is also known that the high-angular-momentum functions contained in the orbital product basis generated by first-principles automatic algorithms are important for controlling the error in local fitting and, therefore, should not be eliminated. 138Local fitting is often utilized to accelerate reduced-scaling or linear-scaling algorithms, and accuracy can be improved with the use of robust and variational fitting which corrects the first-order error made in the fit. 139,140Robust and variational fitting replaces Equation (25) with It is easy to see that this approximation is quadratic in the fitting error, 141 and that it thereby allows flexibility for choosing the fitting coefficients C, including using other metrics than the Coulomb operator.Equation ( 28) reduces to the traditional expression of Equation ( 26) when the coefficients are determined by Coulomb fitting in the full auxiliary basis with Equation (24).
Robust local fitting, however, breaks the positive-definiteness of the ERI tensor and, hence, may lead to attractive electron-electron interactions.Moreover, the Hamiltonian may become unbounded from below, leading to variational collapse.This problem was first described by Merlot et al., 142 and was later discussed in detail by Wirz et al. 143 to which we refer for the full analysis.In short, Wirz et al. showed that with robust local fitting, Equation (28), the Coulomb term is unbounded from below but the exchange term remains bounded.Conversely, with non-robust local fitting, Equation (26), the Coulomb term is bounded from below, while the exchange term is unbounded.Using either robust or non-robust fitting for both terms may thus lead to variational collapse.On the other hand, as first noted by Manzer et al., 144 the problem of variational collapse can be circumvented rather easily by using robust fitting for the exchange term only.[147]

| CONCLUSION
We have reviewed the use of the Cholesky decomposition (CD) in electronic structure theory, focusing on the CD of the electron repulsion integrals that was first discussed by Beebe and Linderberg. 1 Our review was divided into two parts.The first part (Section 2) presented the general CD algorithm, and then proved the connection to Gram-Schmidt orthogonalization that can be used to unite the CD and RI/DF approaches as well as to formulate the efficient two-step CD approach in Section 2.1.We reviewed the development of CD approaches in the literature in Section 2.2: the CD technique enables computationally efficient approaches with reduced memory demands for a large variety of ab initio methods, and we presented several cases in which this technology offers reductions in computational time by at least an order of magnitude.The second part of this work (Section 3) discussed the RI/DF approach, and continued with the review of auxiliary basis set approaches in Section 3.1.The tight relationship between the CD procedure and RI/DF technology has been employed in the literature for automatically generating auxiliary basis sets for RI/DF methods with the CD technique, offering a robust approach that can easily be applied on the fly to any orbital basis set.Indeed, atomic CD procedures are the only way to derive compact auxiliary basis sets with strict error control.It is our view that the developments of the last 20 years demonstrate empirically that auxiliary basis sets which are consistently accurate, robust and of general purpose are best derived in an atomic Cholesky procedure.Moreover, the tight relationship between CD and RI/DF is nowadays obvious also in the literature, as CD and RI/DF are handled in many programs within a single implementation of these two efficient inner projection methods.
It is an interesting coincidence that the previous review 6 was published in 2011 close to 100 years after the invention of the CD by A.-L. Cholesky in 1910.Similarly, this work also marks almost 100 years since the original publication of the CD method 2 in 1924.
We conclude this review with another interesting fact.The 2011 review by Aquilante et al. 6 concluded by noting that the original work of Beebe and Linderberg 1 did not gain many citations in the first 30 years after its publication, and that most of the citations it had accrued by the end of 2010 were from articles published in the preceding 3 years (2007-2010); we note here that the article of Beebe and Linderberg had been cited a total of 97 times by the end of 2010.Aquilante et al. opined that the CD procedure had much more to offer for quantum chemistry in the future.As documented in this article, the CD procedure has been central to the development of efficient and robust computational strategies by many research groups.Today, in September 2023, the paper by Beebe and Linderberg has been cited a total of 398 times according to the Web of Science.We expect this strong growth of the use of CD techniques in quantum chemistry to continue in the upcoming decades.

F I G U R E 2
Retinal bound to rhodopsin with active retinal was used by Folkestad et al. to exemplify the capacity of their CD implementation in e T .In this calculation, the one-center version of the CD of the ERIs was used in association with an aug-cc-pVTZ basis set, with N AO ¼ 79,420 AO basis functions.(Reprinted from Reference [67] with permission from AIP Publishing.)

F
I G U R E 3 A representation from the benzene molecule of the errors in reproducing the ERIs.The maximum (left) and rootmean-square (RMS, right) errors of the AO ERIs are presented employing the RI-J, RI-C, and two aCD auxiliary basis sets for the SVP orbital basis set.For the two aCD basis sets the thresholds of 10 À2 E h and 10 À3 E h were used.(Adapted from Reference [115].)F I G U R E 4 Illustration of the sparsity in the primitive Gaussian basis associated to an acCD basis set, compared with that arising from the original orbital basis and the aCD basis sets, exemplified by the s shell of the ANO-RCC basis for the Tc atom.Primitive exponents in (a) the orbital basis (b) the resulting aCD basis (τ ¼ 10 À4 E h ) and (c) the resulting acCD (τ ¼ 10 À4 E h ) basis set.Note the use of a logarithmic x axis.(Reprinted from Reference [116], with the permission of AIP Publishing.)