Comparison of many-particle representations for selected-CI I: A tree based approach

The full configuration interaction (FCI) method is only applicable to small molecules with few electrons in moderate size basis sets. One of the main alternatives to obtain approximate FCI energies for bigger molecules and larger basis sets is selected CI. However, due to: (a) the lack of a well-defined structure in a selected CI Hamiltonian, (b) the potentially large number of electrons together with c) potentially large orbital spaces, a computationally and memory efficient algorithm is difficult to construct. In the present series of papers, we describe our attempts to address these issues by exploring tree-based approaches. At the same time, we devote special attention to the issue of obtaining eigenfunctions of the total spin squared operator since this is of particular importance in tackling magnetic properties of complex open shell systems. Dedicated algorithms are designed to tackle the CI problem in terms of determinant, configuration (CFG) and configuration state function many-particle bases by effective use of the tree representation. In this paper we describe the underlying logic of our algorithm design and discuss the advantages and disadvan-tages of the different many particle bases. We demonstrate by the use of small exam-ples how the use of the tree simplifies many key algorithms required for the design of an efficient selected CI program. Our selected CI algorithm, called the iterative configuration expansion, is presented in the penultimate part. Finally, we discuss the limitations and scaling characteristics of the present approach.


| INTRODUCTION
The exact full configuration interaction (FCI) approach to the solution of the Schrödinger equation of molecules in a given finite basis set has many attractive characteristics. One of the most important is that it represents a systematic and well established algorithm for the solution of the molecular Schrödinger equation. [1][2][3][4] However, due to the factorial growth of the Hilbert space with the basis set dimension, an exact FCI approach is feasible only for small molecules such as NH 3 , N 2 and C 2 H 2 . [5][6][7][8] Motivated by this observation, attempts were made as early as the 1960's to search for approximate FCI approaches. Initial investigations in this direction led to the realization by a number of researchers in the 1960s that a judicial choice of the configurations in the Hilbert space could produce a reasonably accurate approximation to the FCI wavefunction. [9][10][11] First attempts at a systematic approach toward approximate FCI was proposed by Bender and Davidson 12 followed by Whitten and Hackmeyer 13 with the idea of limiting the CI to the most important singly-and doubly-excited configurations from a rationally chosen 0th order configuration. However, the first multi-reference selected-CI algorithm going beyond singlesand doubles-CI was that of Huron et al. 14 who proposed in the early 70's the configuration interaction by perturbation with multiconfigurational zeroth-order wavefunction Selected by Iterative processes (CIPSI) approach. This was the first systematic multi-reference (MR) selected-CI approach where configurations were automatically selected via a perturbative criterion. A closely related approach was independently proposed about the same time by Buenker and Peyerimhoff in 1974 where an individual selection of configurations based on perturbation theory was devised to systematically extrapolate to the FCI energy. 15,16 Despite many favorable results, the approach did not find large-scale use in the broader community of computational chemists, mainly due to the lack of efficient computers in the 70's and 80's. In later years, size-consistent approaches based on many body perturbation (MBPT) or coupled cluster (CC) theory were vigorously pursued and selecting CI methods became a niche application. [17][18][19] In recent years there has been a resurgence of the selected-CI approaches along the lines of the original CIPSI proposal. Significant variations in the core idea as well as in the algorithm have been proposed, partially for algorithmic reasons and partially in order to adapt the approach to the powerful computational hardware of current times. The new variations to the approximate FCI problem can be separated into three broad categories as follows: 1. Stochastic and semi-stochastic approaches such as the full-CI quantum Monte Carlo (FCIQMC) of Alavi et al. [20][21][22] The FCIQMC method has also been recently formulated using a spin-adapted basis based on the GUGA. 23 The heat-bath CI approach, which is a semi-stochastic determinant based approximate FCI method [24][25][26] and is closely related to the selected-CI approach of the Scemama et al. [27][28][29] The FCIQMC has also been combined with the coupledcluster method by Piecuch et al. 30,31 In this hybrid approach a stochastic estimation of the Coupled-Cluster (CC) amplitudes via an FCIQMC run is performed in order to use these approximate amplitudes to perform a deterministic coupled-cluster (FCIQMC + CC) calculation.
2. Purely variational approaches such as the iterative CI 32 33 3. Size-consistent approximate FCI approaches such as full coupledcluster reduction (FCCR) of Xu et al. 34 and the many body expansion FCI (MBE-FCI) of Eriksen, Gauß. [35][36][37] The CC based exponential parametrization of the wavefunction in the FCCR method ensures its size-consistence, however, it is no longer variational. The MBE-FCI on the other hand also uses a non-linear parametrization where the approximation involves a truncation of the orbital space of excitations for the cluster operators. It has the advantage of being applicable in weak and strongly correlated regimes, however it is also not variational.
A different philosophy which follows the wavefunction renormalization group (RG) idea but uses the RG flow applied to the Hamiltonian instead of the wavefunction is the driven similarity renormalization group (DSRG) method of Evangelista et al. [38][39][40][41] Such an approach has also been used in the multi-reference setting. Finally, we would like to point out collaborative efforts for the comparison of many-body methods by the Simons collaboration 42 and also the recent blind challenge to access the various selected CI implementations by Eriksen et al. 43 As evidenced above, an extremely rich set of methods and ideas have recently emerged for the solution of the approximation FCI problem. One of the reasons for this variety is the absence of a wellestablished general and optimal algorithm for the selected CI method in contrast to the FCI method. 1,44,45 One of the key differences between selected CI and FCI is that in selected CI a set of configurations is imposed by the chemistry of the molecule and hence precludes any a priori knowledge of structure in the Hamiltonian. It is this absence of structure in selected CI that renders it challenging to device an optimal algorithm giving rise to a rich terrain of possibilities.
Following this spirit, we present an Ansatz and algorithm which is based on the idea of representing many particle basis functions in terms of a "configuration tree". In our nomenclature, we refer to the distribution of a given number of electrons among the available spatial orbitals (irrespective of spin) as a "configuration" (CFG). A CFG basically consists of a set of occupation numbers with values 0, 1 or 2 for the available orbitals of the system. Given a desired total spin S, a given CFG always contributes all linearly independent configuration state functions (CSFs) that are consistent with the given total spin. A single CSF, on the other hand, represents just one specific spin coupling among the singly occupied orbitals that leads to the desired total S. CSFs can be, but do not have to, represented by a linear combination of slater determinants (DETs) in which each singly occupied orbital is occupied either by a spin-up or a spin-down electron. As will be discussed in detail in the main body of the text, each of these three types of many-particle objects can be represented by a tree structure.
We would like to remark that this paper is the first in a series of three manuscripts in which we describe, compare and demonstrate the tree based selected CI approach for DETs, CFGs and CSFs. This first paper is organized as follows; First, we describe how to build the tree, that is, how the three types of many-particle basis function (MPBFs) are represented in the tree. Second; We demonstrate how to traverse the tree in order to search, add or expand the tree with new MPBFs, for example, to generate the first order interacting space (FOIS) relative to a given set of generating MPBFs. Second, we describe the algorithm for the calculation of the coupling-coefficients (CCs) between pairs of DETs, CFGs, and CSFs and how the tree representation greatly facilitates this task. CCs are the matrix elements between pairs of MPBFs and a string of excitation operators that are required to evaluate Hamiltonian matrix elements as shown in Equation (1).
Here, jΨi is a many particle wavefunction, h pq = pjĥjq D E and g pq, rs = (pq| rs) are one and two-electron matrix elements over the usual non-relativistic many particle Hamiltonian (note that two-electron integrals are denoted in chemists's notation [11|22] where "1" and "2" refer to the coordinates of electron 1 and 2 respectively). The labels p, q, r, s refer to a set of orthonormal one-particle functions and E pq = a + pβ a qβ + a + pα a qα are the spin-traced orbital replacement operators (generators of the unitary group) with a + pσ , a qσ (σ = α, β) being standard Fermion creation and annihilation operators.
Third, we present our algorithm for tree based approximate FCI calculations that we call the iterative configuration expansion approach (ICE). A CFG based version of the latter has been publicly available in a preliminary form since 2015 to the users of the ORCA program package [46][47][48] where it has already been used to good effect in a number of studies. [49][50][51][52] Our present work greatly extends the flexibility, efficiency and applicability of this algorithm.

| TREE REPRESENTATION OF MANY-PARTICLE BASIS FUNCTIONS
At the heart of the present work, is the organization of the MPBFs in terms of a tree data structure. This design facilitates the construction of an efficient CI algorithm in multiple ways: 1. Addressing of MPBFs in a lexicographical fashion. 5. Calculation of coupling CCs in terms of partial CCs that contribute to multiple final CCs. As discussed below, this is similar to the graphical unitary group (GUGA) algorithm, although we do not make recourse to advanced group theoretical techniques.
The advantage of using the tree data structure is particularly favorable for operations of type 2 and 4 since the tree provides an optimal way of searching a list of MPBFs. As to points 4 and 5, we will show that a given spin-coupling scheme can mapped onto the vertical dimension (i.e., height) of the tree in a rather natural way, which greatly facilitates the recursive evaluation of the coupling-coefficients (point 5) as will be described in Section 5.
The general design of the configuration tree is analogous for the three MPBFs. The tree has a root-node from which "branches" can evolve. In our construction, each of the available orbitals constitute a vertical level of the tree. Each orbital constitutes a "node." From each node structure there can either emerge 2 (DETs), 3 (CFGs) or 4 (CSFs) branches. In a nutshell, the different nodes refer to the possible orbital occupation numbers and spin couplings in the different MPBF constructions. The details will be discussed below. To each branch, there is another node connected. The tree terminates at a "leaf." The leaf is another node structure that contains the final node, after the required total number of electrons has been reached. For example, for a given number of electrons N and a total spin S, this can happen the earliest at level N/2 of the tree (as for the "Aufbau" closed shell determinant if the orbitals are ordered according to orbital energy). This structure optimizes the memory footprint of the tree since any given node will, in general, contribute to multiple MPBFs. The general construction of the tree is shown in Figure 1 which shows a tree consisting of three electrons and three orbitals and spin S = 1/2.

| Determinant tree
DETs are the simplest MPBFs. Since DETs are constructed with α and β spin-orbitals, they can be stored in separate αand β-trees respectively (split-determinant representation 53 ). Therefore, the αand β-trees contains N α and N β electrons respectively. The MPBF is then written as a product of the α and β DETs and the wavefunction is a linear combination of these functions as given in Equations (2) and (3) below.
jΨ i = X Here, i and i represent the α and β spin-orbitals respectively and C I are the coefficients for the I th DET Note that the DET jΦ I i only has a well-defined M s = (N α − N β ) quantum number but is not an eigenfunction of the total spin squared operator (Ŝ 2 ) and consequently does not have a well-defined spin-quantum number S.
Thus, the total wavefunction expressed in terms of DETs jΨi also does not necessarily have a well-defined spin S, unless special care is taken to ensure that jΨi is an eigenfunction ofŜ 2 , that is, The α tree consists of all selected α DETs and is a binary (e.g., "two-level") tree since the α spin-orbitals can have an occupation of either 0 or 1 electron. Similarly, the β tree is made up of all selected β DETs. In the case of the DET representation of the MPBFs, an additional data structure is required for the book-keeping of which specific products of α and β Slater-determinants ( Φ α I Φ β I i ) are actually present in the selected space. Further discussion of this aspect will be presented in the following sections. A schematic of the split-DET tree with the constituent α and β trees for the simple case of (5e,3o) with spin S = 1/2 is shown in Figure 2 below.

| Configuration tree
Next we turn to the CFG MPBFs. As noticed in the introduction, organizing the MPBFs in terms of CFGs requires that for a given CFG, all CSFs with the desired total spin S are kept. Their number is given by the Weyl formula: where N is the number of unpaired electrons in the given CFG. Thus, the spin-indices are implicitly accounted for in this representation and the CFG basis only requires the storage of occupation number for each orbital. All relevant spin-dependent logic is performed separately using a prototype scheme once the maximum number of singly occupied orbitals (SOMOs) and the total spin S are known. Since the occupation of a single orbital can only be one of 0, 1, or 2, the CFG tree is a trinary ("three-level") tree with at most three branches for a given node as shown for a prototypical case of (3e,3o) in Figure 3.
The wavefunction is then written in terms of the CSFs Φ S I,k of the given configurations Φ S I (M S = S is implied throughout). In our particular implementation, the CSFs are built from bonded-functions 54 F I G U R E 1 A schematic representation of the tree data structure with a description of the nodes which are its basic ingredients. This particular tree for example shows all the configuration (CFGs) for a S = 1/2 (3e,3o) Hilbert space The split-determinant tree with the constituent αand β trees for a simple case of (5e,3o) and spin S = 1/2. The Hilbert space is made up of three α electrons and two β electrons occupying three orbitals F I G U R E 3 The configuration tree for the case of (3e,3o) and S = 1/2. Only the occupations at each level are kept with the spindependent part decoupled which are themselves made of DETs in terms of spin-orbitals as shown in Equation (5) through (8) below.
The  spin is low (e.g., S~0). We emphasize that the CFG representation is by no means bound to the use of bonded functions as underlying CSF basis. Any convenient representation will serve the same purpose.

| CSFs tree
The CSFs are the most complicated MPBFs considered in this work.
They take into account both the occupation and the spin-coupling information of each electron with all other unpaired electrons. In the present formulation, the CSFs are constructed via the genealogical coupling scheme of Grabenstetter et al. 55 where a given CSF is built by adding electrons one by one sequentially. In this scheme, the spincoupling information for an electron at SOMO k is specified with respect to all other SOMOs i < k. The occupation/spin-coupling for each SOMO k is specified by ξ k which can be 2, ±1, or 0 depending on whether the k th electron is a DOMO (ξ = 2), SOMO with parallel (ξ = + 1) or anti-parallel (ξ = − 1) coupling with all the SOMOs before it, or an unoccupied orbital (ξ = 0) respectively. In this manner, all the CSFs can be specified with the pattern of occupation along with the sign ± for the SOMOs as given in Equation (9) below.
Consequently, a given node in the CSF tree can have up to four branches ("four level" tree) depending on the combined occupation/ spin coupling index being 2, 0, + 1 or −1 ( Figure 4). Note that since negative spins are not allowed, the first MO has only three possible occupations 0, 1 and 2 as a −1 at level 1 will imply a negative coupling with vacuum which is meaningless.
The CSF j Φ S I has a somewhat complicated form in the DET basis in terms of the Clebsch-Gordan coefficients (CGCs) which can be derived following the genealogical coupling scheme. 55 The wavefunction in terms of individual CSFs j Φ S I is given in Equation (10) below.
where each CSF j Φ S I can be written in the DET basis as shown by The factors o Mi + 1 M i ,m i represent the CGCs defined below in Equation (12). Here, the CGCs are calculated using the three angular momenta represented by the intermediate spins at MOs i, i + 1 given F I G U R E 4 The configuration state function (CSF) tree for the case of (3e,3o) and S = 1/2 which is represented as a quaternary tree where the possible branches represent empty orbitals (0), doubly occupied orbitals (2) and single occupied orbitals that are coupled parallel (1) or antiparallel (−1) to the intermediate total spin. Note that since any molecular orbital cannot have a −ve spin value, the −1 node at level 1 is disallowed by S i , S i + 1 and the electron spin-angular momentum ½ as well as their S z components M i , M i + 1 and m i .
The anti-symmetrized (℘) sum of products of Slater-determinants Π i j i, m i i are normalized by the factor 1 N! , where i and m i represents the MO and the S z component of the electron spin (±1/2) at that orbital respectively. D is the phase factor related to the paired states defined in Equation (13), where δ(i, j) represents the Dirac delta function.
A detailed description of the notation will be presented in Sec- CCs are calculated directly using only the information of the sequence of couplings given by ξ i 's for hΦ S J j and j Φ S I i. Therefore, this representation is particularly useful for the case of molecules containing a large number of SOMOs with a low total spin.
The details of the representation and a graphical notation for the CGCs and their derivation will be discussed in detail in a forthcoming publication.

| Branching diagram representation
The CSF representation as presented in Equation (11) and Figure 5 may be considered to be somewhat convoluted. A more elegant and illuminating representation is afforded via the well-known branching diagram representation for the CSFs. 54 The branching diagram is a visual representation of the spin-coupling involving the SOMO part of the CSF, that is, the MOs with ξ = ± 1. As an example, consider the two CSFs for the simple (3e,3o) case with S = 1/2 and three SOMOs shown in Figure 4 above and given in terms of determinants in Table 1. The S = 1/2 spin subspace contains two CSFs with three SOMOs whereas the S = 3/2 subspace consists of the sole CSF given by [1,1,1]. The two CSFs for the S = 1/2 and one CSF with S = 3/2 can be simply visualized by the branching diagrams shown in Figure 6.
Note that both the CSFs of the S = 1/2 spin subspace belong to the same CFG [1,1,1], therefore demonstrating that a single CFG can contain multiple CSFs differing in their spin-coupling. The branching diagrams shown in the Figure 6A

| Comparison with alternative representations
A similar but slightly more involved graphical representation has been used by the pioneers of the graphical unitary group approach. The tree representation described above and shown in Figure 4 can be alternatively represented using the Shavitt graph 56 as shown in The graph is constructed using the Paldus array representation [57][58][59][60]73 for the CSFs which uses an array of vectors p i 's which represent the occupation pattern at the ith orbital. Each occupation pattern vector at the ith orbital (p i ) keeps track of the number of electrons N i and spin S i up to that orbital. This is achieved using three variables and c i keeps track of the number of 0 0 s thus satisfying the rule a i + b i + c i = i. For a given F I G U R E 5 A graphical representation of the configuration state function (CSF) given algebraically in the Equation (11). See the text for a detailed description of the notations T A B L E 1 The CSFs and their notation in slater-determinant basis for the simple case of (3e,3o) Note: Only the CSFs belonging to the CFG [1, 1, 1] are shown for simplicity. Abbreviation: CSF, configuration state function.
fixed number of MOs, each row vector p i can therefore be uniquely represented by the pair (a i , b i ) and once a CSF is given, the Paldus array can be constructed using the pairs (a i , b i ) as shown in Table 2 for the [2,1,0] CSF as an example. The Shavitt graph is constructed using the Paldus array for all the CSFs such that each path represents the series of p i 's for the CSF belonging to that path.
There is an indirect connection between the Shavitt graph and the branching diagram representation in that the horizontal axis of the Abbreviation: CSF, configuration state function.

Algorithm 1. FindString
The search for a given many-particle basis function (MPBF) in the tree in the case of a two-level tree. We search for the address of a given externally provided string contained in variable STR that is a vector of occupation numbers that are either 0 or 1. P is a pointer to a node structure that by itself contains two pointers O0 and O1 that point to the next level in the tree in case of occupation numbers 0 (O0) or 1 (O1) respectively. P also contains a field "addr" that either has the value "−1" or the actual address of a given string if the node is an actual leaf. If the electron counter iel has reached the total number of electrons in the system, the present node must contain an address which is returned in line 6. Otherwise, the algorithm continues to traverse the tree.
The above algorithm can be trivially adapted to the CFG and CSF case by taking care of all possible occupation numbers (and spin couplings in the case of CSFs). Note that this is independent of the type of MPBF since the depth of the tree is always at most N MO irrespective of the type of MPBF. A similar algorithm can be used to obtain a MPBF (i.e., in occupation number representation) from its address by storing the pointer to the leaf nodes corresponding to the address of each MPBF in a list. For a given address the MPBF can be obtained, using the pointer to the corresponding leaf node, followed by traversing the tree in the opposite direction, that is, from the leaf to the root node.

| Discussion of the tree approach for determinants
A subtle point concerns the sparse representation of the Determinant.
Note that due to the split-determinant representation, the α and β trees alone are not sufficient to represent a determinant which is a product of an alpha and a beta determinant (α I , β J ). An additional data structure is required for the bookkeeping of the allowed pairs while preserving the sparsity of the representation. In order to exploit this sparsity, we have adopted a flexible matrix data structure (hereafter called the FlexMatrix) which is designed to avoid storage of O (N α × N β ) size tables. This is achieved by keeping two tables, one for each α-strings and one for the β-strings. Each

| FOIS (EXPANDING THE TREE)
One of the crucial operations during any configuration interaction procedure is the generation of the FOIS starting from a given set of initial HamiltonianĤ. Once a tree with a list of |Φ 0 i's has been initialized as explained in the previous section, one can perform the single and double-replacement operationsÊ pq andÊ pqÊrs on each MPBF in jΨ 0 i in order to generate the FOIS. Since a given singly-or doubly-excited This implies checking the presence of a newly generated j Φ q p E or j Φ qs pr E before assigning it an address. This operation is most efficiently and naturally carried out in our tree data structure as explained below.

| Determinants
The determinant MPBF is conceptually the simplest of all three in the sense of the ease of calculating the matrix-elements forÊ pq andÊ pqÊrs . The application ofÊ pq with a given p and q on a determinant | Φ I i can generate at most two other determinants jΦ J1 i, jΦ J2 i f gas described in Equations (14)- (16): where the application ofÊ pq on | Φ I i has been broken down into an α with o pq α and o pq β containing the coupling-coefficients for the α and β excitations respectively. This implies that the α and β determinants belonging to the FOIS can be generated independently by applying the single-and double-excitation operators on the alpha and beta trees separately. The growth of the tree by the successive application of the single-and double-excitation operatorsÊ pq andÊ pqÊrs is shown in Figure 8.
One can see that of all the products Φ q constitute only a small subset. Therefore, after the generation of the α and β determinants belonging to the FOIS, one still has to consider those products such as Φ J1,α Φ q Both α and β DETs have their own separate FlexMatrix data structure.
The FlexMatrix data structure for the α DETs has N α rows with each row containing the address of the connected β DETs as described in The generation of the first order interacting space (FOIS) for the α determinant | Φ I i (whose path is indicated in brown) is depicted in terms of the growth of the tree containing the newly generated FOIS α determinants. The blue paths represent the singly-excited α determinants j Φ α J and the purple path indicates the sole doubly excited α determinant j Φ α k F I G U R E 9 The first order interacting space (FOIS) for the configuration (CFG) many-particle basis function (MPBF) with a small Hilbert space with (4e,4o) and S = 0. The I th CFG jΦ S I indicated in brown is [0,0,2,2] and the singly-and doubly-excited CFGs are indicated in blue and purple respectively Section 3.1. This data structure is unique to the DET representation and is a direct result of the loss of inherent structure in the FOIS as opposed to FCI. Finally, we would like to point out that the matrix- is a vector of dimension at most 2 for a give p and q. As we shall see below, this simple operation becomes somewhat involved for the CFG and CSFs.

| Configurations
As described in Section 2.2, a given spatial configuration j Φ S I is composed of CSFs j Φ S I,k enumerated by the irreducible representation labels (k). Therefore, the logic required for the application of theÊ pq operator becomes much simpler in the sense that one configuration Thus, the following relations hold: and jΦ S With these constraints, one can already filter most of the CSFs before their generation. However, this simple rule is not sufficient to ensure the exact number of FOIS functions. In order to further iden- which have a non-zero coupling with the parent CSF jΦ S I , the following relation known as the triangle relation must be satisfied: Note that the above identity relates the k th coupling ξ I k in the bra CSF Φ S

| CALCULATION OF COUPLING COEFFICIENTS (RECURSIVE TREE WALK)
Once the tree has been filled with a set of variational MPBFs and the FOIS has been generated, the next step is the evaluation of the for each interacting I, J pair of MPBFs. In order to efficiently carry out this step while avoiding a search operation that scales quadratically in the number of selected MPBFs, two operations must be carefully implemented: 1. An efficient algorithm for the identification of those set of J MPBFs for a given I for which the matrix-element Φ J jĤjΦ I

D E
is not trivially zero because they differ by more than one-or two-orbital occupation mismatches. Thus, one must be able to efficiently generate and access the singly-and doubly-excited Φ J MPBFs for a given Φ I MPBFs.

| Determinants
The calculation of coupling-coefficients in the determinant basis is the simplest and consequently can be heavily optimized. As and find all paths which differ from this path at a maximum of two orbitals. This operation is at where N singles represents the number of singlyexcited determinants. An example of such a path for a (4e,4o) case with S = 0 is shown in Figure 11 above. The path represented in the α Tree in Figure 11 shown in brown is the parent j Φ α  (26):

| Configurations
A given spatial configuration j Φ S I is composed of CSFs j Φ S I,k enumerated by the irreducible representation labels (k) as described in  Note that the total number of bonded functions jΘ S c is the same as the total number of CSFs for a given number of SOMOs.
In order to identify the singly-excited configurations jΦ S J , similar to the case of determinants, we do not need to traverse the full configuration tree. To generate all the singly-excited configurations with respect to a given configuration Φ S I j shown as the brown path in Figure 12 above one only needs to traverse those paths which differ from the parent Φ S I j by at most two differences. This is illustrated for a simple case of (4e,4o) in Figure 12B above with the parent Φ S I j generating a singly-excited configuration jΦ S J for a given p and q orbital.
Contrary to the determinant representation, in the case of configurations there can be only one singly-excited configuration jΦ S J for a given p and q as shown in the Figure 12B by the blue path. The

| Configuration state functions
The main important difference for CSFs ME evaluation as compared to DET and CFG MPBF is as follows; In the CSF basis, the application of the The difference tree in the case of CSFs is of fundamental importance for an on-the-fly evaluation of coupling-coefficients. The beauty and efficiency of the angular momentum coupling approach for the calculation of coupling-coefficients is clearly seen in our tree representation. As an example, consider the case of (4e,4o) shown in Figure 13 above, the par- of these coupling-coefficients will be described in the following.

| Explicit equations for coupling coefficients
The derivation of the coupling coefficients C pq IJ which are required for the evaluation of the application ofÊ pq on jΦ S I (Ê pq jΦ S begins by the graphical representation of the CSF shown in Figure 5.
Using the graphical representation of the CSF and the difference tree shown in Figure 13 above, one can draw a graphical representation of the coupling-coefficient as shown below. Here The final expression for the calculation of the coupling-coefficient o pq IJ can then be written in terms of the above three factors as given below (for simplicity we assume q > p + 1): We note that Equation (18) given in paper I contains a typo where the phase factor (−1) p − q has been left out. The correct equation with the phase factor (−1) p − q is given in our Equation (32). A detailed derivation of the above formula for all cases including (p = q + 1) and the formulae for double-excitation operators (ê pqrs ) will be given in a forthcoming publication. It is important to emphasize here that the calculation procedure derived above does not require the determinant representation or the branching diagram representation of CSFs at any point.
There are many appealing features of the above formulation for an on-the-fly calculation of the coupling-coefficients which are as follows: , and jΦ S Jb E are the same, they differ in their spin-coupling F I G U R E 1 4 A schematic representation of the coupling coefficient o pq IJ in terms of the angular momentum coupling graph and the related factors A q , T i and B p 3. The advantage of a recursive formulation is that once any intermediate factor T i is zero (due to angular momentum conservation for e.g.), all subsequent paths are terminated and the recursion terminates. Consequently, one is automatically only traversing those paths which result in non-zero coupling-coefficients Φ S I jÊ 1,3 jΦ S Ja D E and Φ S I jÊ 1,3 jΦ S Jb D E .
Next, we shall describe how the above operations are combined to build a tree based ICE algorithm.

| Efficiency considerations
Motivated by the comments of a referee to this paper, we wish to briefly comment on the efficiency of the tree algorithm compared to algorithms based on hash tables. All considerations in this section apply equally to CSFs, DETs and CFGs. The numerical performance data in Table 3 compares a hash table-based algorithm that we implemented in response to the comments by a referee to the CSF-tree based algorithm. It is evident that on average for a CAS (12,12) space, the Tree algorithm is about 40% faster than the hash table algorithm and for CAS (14,14) the speedup up is nearly 60%.
The Min. and Max. data in Table 3  Thus, the name "ICE" with a tree based representation of MPBFs.
In the following sections, we shall describe each step of the ICE algorithm in detail, emphasizing at each point the most notable differences between the determinant, configuration, and CSF MPBFs.

| Initial input wavefunction
The beauty of CIPSI is its ability to automatically find the electronic ground state jΨi from an arbitrary initial wavefunction jΨ (0) i provided that the overlap hΨ| Ψ (0) i is non-zero (e.g., the initial and desired state have the same symmetry and multiplicity). Nevertheless, convergence can be greatly improved if the initial wavefunction jΨ (0) i is chosen with chemical intuition of the problem at hand.
In our implementation, there are three ways to choose the initial guess which are as follows: 1. An Aufbau configuration respecting the total spin can be requested. In all the above cases, we have an initial set of MPBFs according to Equation (33).

A complete active space (CAS) CI wavefunction j Ψ
Where we can ensure that the overlap in each case with the ground state full CI wavefunction hΨ| Ψ (0) i is non-zero through a judicial T A B L E 3 Comparison of timings for finding all the singly-excited CSFs using the tree representation and hash choice of the reference MPBFs jΦ 0 The first two types of guess wavefunctions can be automatically generated whereas the third type of guess is input by the user. Note that using the options 2 and 3, a guess may result that is orthogonal to the ground state, that is, hΨ| Ψ (0) i = 0 but is non-orthogonal to an excited state hΨ 1 | Ψ (0) i > 0. Such a guess can be used to converge the ICE wavefunction to a chosen excited state (e.g., of a different spatial or spin symmetry.) A final crucial point concerns the DET-ICE variant, where the input of the initial guess is performed in terms of CFGs. Since the initial guess is supposed to have a large weight in the final wavefunction, all the determinants belonging to the input guess configuration are generated. This ensures that the input guess can be given in a simple fashion on one hand and on the other hand also ensures that the initial wavefunction is a spin-eigenfunction since we satisfy S 2 |Ψ 0 i = S(S + 1)|Ψ 0 i by generating all DETs belonging to the input configuration.
Given an initial guess wavefunction jΨ (0) i, we can then proceed with the subsequent steps of the algorithm.

| Generation of new MPBFs
Once an approximate wavefunction at the k th iteration jΨ (k) i has been obtained, the next step is to expand the tree with the most important configurations from the FOIS with respect to jΨ (k) i. The basic principles for the generation and representation of the FOIS for each of the three types of MPBFs has been described previously in Section 3 and here we further analyze some algorithmic aspects of this part. The generation of new MPBFs for the 0 th iteration differs considerably from succeeding ones and therefore we separate these into two parts.

| Initial iteration
Given an initial set of 0 th order MPBFs Ψ 0 , the FOIS is generated from the full list of initial MPBFs. The first important thing to consider is the issue of correct addressing of the newly generated FOIS configurations. Since a given new singly or doubly-excited MPBFs j Φ 1 ð Þ J has previously been generated. This is a performance critical part of the operation and in our tree-based representation this operation is optimally carried out since the time complexity of searching for an element in a tree is given by O N MO ð Þas described in Section 3.

| Subsequent iterations
Next, we describe how the k + 1 th set of MPBFs are generated from a given approximate wavefunction |Ψ (k) i at the k th iteration. In the original CIPSI formulation, at each step k all the determinants at that step were used to build the k + 1 th order wavefunction. In this respect the "generator" determinants consisted of all the determinants in the variational space at iteration k.
In the present ICE approach, for the second iteration onwards, we follow the three-step CIPSI philosophy of Evangelisti et al. 69 where only a small subset of the k th configurations are chosen as "generators" and allowed to generated those candidate configurations that would be included at the k + 1 th iteration. The "generator" configurations which are allowed to bring in their progeny are chosen according to the following rule: Here we introduce the first threshold, T gen , which is used to distinguish the generators among all the configurations at the k th iteration. All those configurations j Φ Next, we shall proceed to the selection of the new MPBFs from the generated perturbers.

| Selection of important MPBFs
Given a set of perturber MPBFs jΦ E 's are found using the same idea as the original CIPSI, that is, via second-order Epstein-Nesbet perturbation theory (PT2). 70,71 The expression for the PT2 energies is different for the initial step and for the subsequent iterations as we shall describe below.

| Initial iteration
In the initial step with user defined configurations, we perform an "un-contracted selection," since the coefficients of the guess as of yet unknown.
During the un-contracted selection, first the PT2 energies of each Φ 1 ð Þ J are calculated according to Equation (38).
where Δ −1 JI is the energy denominator given by: Thus, the selection process involves the interaction of each FOIS function with each of the generator MPBFs. Following Equation (

| Subsequent iterations
From the second iteration onwards, where we have an estimate of the k th order wavefunction |Ψ (k) i, we perform a "contracted" selection.
By "contracted" we refer to the generator part of the variational wavefunction with renormalized coefficients obtained from the variational procedure. The renormalized wavefunction is prepared by diagonalizing the Hamiltonian in the subspace of the generator configurations as follows: Using the renormalized wavefunctionΨ where the state-specific denominator Δ − 1 JN is given by: Such that the state-specific PT2 contribution E n ð Þ PT2, k + 1 ð Þ J is evaluated for each root n. Note that here we follow a state-averaged selection, that is, once the PT2 energies for each state n are calculated, we sum over to get the final PT2 contribution for each perturber as follows: can be a useful indicator of the convergence of the wavefunction. If this quantity is sufficiently small, it can also be added to the total energy in order to improve the total energy estimate.
Finally, the last step of the procedure is to diagonalize the variational space at k + 1 th step and form the new wavefunctions |Ψ (k + 1) i, this procedure is described in the next section.

| Diagonalization of the variational space
The final part of each iteration is the formation of the k + 1 th approximation jΨ (k + 1) i to the exact wavefunction. In order to obtain jΨ (k + 1) i, we need to diagonalize the Hamiltonian in the basis of the chosen MPBFs at the k + 1 th iteration given by Equation (46): The lowest energy eigenvector of the HamiltonianĤ i. Similarly, we can obtain the k + 1 th approximation to the lowest n-roots via the diagonalization ofĤ k + 1 ð Þ var . The diagonalization is carried out via the modified Davidson algorithm commonly used in the configuration interaction method. The main task of the algorithm involves the calculation of the Sigma-vector σ n,i at teach step of the iteration i, the equations for the Sigma-vector are given below in Equation (48): where σ n,i is the Sigma-vector for the n th root at the i th iteration of the WithÊ pq representing the usual one-electron operator defined aŝ E pq = P γĉ † p,γ Áĉ q,γ with γ ∈ α, β spin indices. The one-electron part of the Hamiltonian defined in Equation (49)  rithms is the use of the following Identity given in Equation (51): where the functions jKi form a complete linearly independent basis for the operatorÊ pq as derived in Equations (52) and (53): where we have used the identities P j i. Therefore, the functions jKi form a linearly independent set of basis functions such that the following identity is satisfied:Ê Implying that: This is known as the resolution of identity (RI) and exchanges the direct calculation of the two-electron part of the operation This expression is useful for cases where the direct calculation of rs Φ n,i j i is either too time consuming or memory consuming. In such cases, we adopt the RI expression show in Equation (56) instead. It is important to note that the dimension of the RI space jKi does not need to be that of FCI space it only needs to satisfy Equation (56), that is, be closed under the application ofÊ pq .
In the following sections, we shall describe the details of the Sigma-vector operationĤ k + 1 ð Þ var Φ n,i j i for each of the three MPBF to highlight the differences in each case.

| Determinants
The DET MPBF is notoriously the most complicated of the three due to the absence of any structure in the form of the wavefunction as indicated in the Introduction. The wavefunction in split-DET basis is made up of products of α and β DETs j Given a list of α and β determinants j Φ α I i , j Φ β I i , one has to keep track of allowed pairs Φ α I j Φ β I i in an auxiliary data-structure such as the FlexMatrix. This complicates the task of incorporating efficient BLAS operations during the Sigma-vector algorithm. The algorithm implemented in ICE is similar to that published by Povill et al. 72 in the 90's and we refer to the reader to that paper for a detailed description. Here we shall only provide an outline of the algorithm. The calculation of the σ =Ĥ j Ψi vector is divided into three parts, first and second parts given by σ αα 1 and σ ββ 2 consist of the αα and ββ contributions and a third part, which is the most expensive, σ αβ 3 consists of the αβ contribution as shown in Equation (57). The algorithm for the αα and ββ parts are very similar and given in Algorithms 2 and 3 contains the procedure for the σ αβ 3 part which is usually the most expensive.
where the three parts σ αα 1 , σ ββ 2 , and σ αβ 2 are defined as follows: The final algorithm for the ICE is now discussed in the next section followed by the scaling analysis.

| Iteration and convergence conditions
The Final algorithm of the tree-based ICE approach is given in The prefactor C is of the order of 10 3 and depends on the ratio of number of electrons N el and number of MOs N MO and therefore is system dependent.
The scaling of each step with the total number of molecular orbitals (N MO ) is shown in Note: The number of wavefunction parameters for the DET, CFG, and CSF are given by N DET , N CFG , and N CSF respectively. The DMRG bond-dimension is given by M. The pre-factors T FS and T FSD and the various types of data structures are described in the text. Here N α=β gen refers to the α or β generators and similarly for N β=α var refers to the β or α determinants in the variational space. Abbreviations: CFG, configuration; CSF, configuration state function; FOIS, first order interacting space; ICE, iterative configuration expansion; MPBF, many-particle basis function.
1. The scaling of the algorithms that use the RI approach is at least N 2 MO times more expensive than those without it. Use of the RI can therefore result in a N 6 MO dependence of the algorithm on the number of orbitals. This can be prohibitive with systems with a large number of unoccupied orbitals.
2. Note that although the scaling for the RI approach is N 2 MO larger, the large prefactor C in front of T FSD means that the value of N MO for which the cross-over occurs is of the order of N MO ≈ 30.
3. The algorithm without the RI approach in the CSF based ICE and DET based ICE approach shows a similar scaling with N MO as the DMRG algorithm. In both cases the algorithm scales as N 4 MO with the MO size.
The memory consumption during the calculation is also dependent on the ICE variant used. The Table 4 also describes the most important data structures stored in memory that are necessary for the algorithm.
The basic data structure present in all three variants is the Tree (DET, CFG or CSF) that holds the variational space (CITree). Note that in the DET-ICE case, the Tree implies the α and β Tree together. The memory required for the storage of the tree is dependent on the chemistry of the molecule and the thresholds used. In case of T gen = 0 and T var =0, the Tree attains its maximal size of 2 NMO + 1 −1 × M node , where M node is the amount of memory required to store one node of the tree. The DET-ICE additionally requires the storage of the FlexMatrix data structure, described in Sections 3.1 and 4.1, the size of which is problem dependent and maximally can reach a size of a matrix of dimension (N α × N β ) where N α and N β are the number α and β determinants respectively. For the CFG-, and CSF-ICE, additional to the Tree that contains the variational parameters, a second tree containing the generated singly and doubly excited MPBFs called the SDTree is required during the perturbative selection step. This tree can become significantly larger than the variational Tree in some cases. However, its size decreases with a decrease in the total number of new generators as the iteration proceeds. Finally, for the CSF-ICE where the RI algorithm is used during the Sigma-vector calculation, the Tree required to hold the RI space (RITree) needs to be stored in memory. The size of the RITree is essentially related to the FOIS of the variational space and can become large for cases where N el ≈ N MO /2. In such cases, the version without RI (NoRI) can be an efficient alternative.

| CONCLUSION
In this paper, we have described and discussed our ICE algorithm for DET, CFG, and CSF basis representations. The emphasis throughout the paper is on simplicity and clarity. The algorithm and the implementation of such complex representations such as CFGs and CSFs can sometimes be exceedingly convoluted. Here we have shown by using the notion of a configuration tree, how part of the complexity can be reduced. We have also compared the similarities and differences in the three types of MPBFs DET, CFG and CSFs. The main points concerning the implementation of DET-, CFG-, and CSF-ICE are summarized as follows.
1. The basic algorithm for ICE is the same for the DET, CFG, and CSF MPBF. With the exception that the DET representation requires additional care in the initial step to ensure that the wavefunction remains a spin eigenfunction.
2. The algorithms for the calculation of CCs for the DET, CFG, and CSF MPBF are quite different and have their own strengths and weaknesses. The DET-ICE has the simplest form of CCs whereas the calculation of one-, and two-particle CCs for the CSF-ICE can be quite complicated. The calculation of CCs for the CFG-ICE is simpler compared to the CSF-ICE, however, it necessitates a prototyping scheme for an efficient implementation.
3. The Sigma-vector step for the CFG-ICE can be formulated to use extremely efficient Level 3 BLAS operations ensuring a fast algorithm. However, the Sigma-vector step for the DET-ICE and CSF-ICE does not render itself to such a formulation via matrix-matrix product operations. This is the major obstacle for obtaining an efficient algorithm in these cases.
4. The memory footprint for the CFG-ICE is the largest, however, an efficient prototyping scheme can somewhat reduce this memory requirement. Similarly, the DET-ICE requires memory storage for bookkeeping of the connected α,β pairs. The CSF-ICE on the other hand, has no requirement of additional buffers and data structures and therefore is the most frugal in memory terms. Consequently, it is also the algorithm that is best adapted to a massively parallel implementation.
5. The scaling with respect to the number of molecular orbitals for the CSF-ICE (without the RI) is the same as that of the standard DMRG algorithm. This makes it possible for the use of ICE for all kinds of applications involving small and large systems with and without spin symmetry.
In our opinion, the tree-based construction provides an elegant solution to efficiently construct and store very incomplete configuration spaces such as they arise in selected CI approaches. By construction, the tree consumes considerably less memory than N CI × N MO or N CI × Nel (N CI = size of the CI space, N MO = number of orbitals, Nel = number of electrons) since each node contributes to a potentially large number of CSFs (CFGs, DETs). The tree representation leads to extremely efficient recursive search algorithms, in which all CSF (CFGs, DETs) that interact through the Hamilton (in principle any Hamiltonian) with a given CSF (DET, CFG) can be obtained. In fact, the search length is at most N MO , which (depending on the exact nature of the CI space), is typically many orders of magnitude less than N CI . At the same time, the one recursive search leads to the automatic construction of all necessary coupling coefficients and hence to an efficient construction of the Sigma-vector in iterative CI procedures. However, while we have not yet carried out a detailed comparison to existing Full CI algorithms, we point out that an algorithm that is as general as ours cannot be expected to outcompete algorithms that were specifically constructed for Full CI spaces. The strength of the tree-based approach is rather that it provides a compact, efficient, and extremely flexible approach for incomplete CI spaces.
In the forthcoming papers, we shall provide benchmarking analysis, extrapolation schemes, and detailed case studies. Through such an analysis we shall demonstrate the strengths and weaknesses of the DET-, CFG-, and CSF-ICE for the different problems relevant to quantum chemistry.