Advances in vibrational configuration interaction theory ‐ part 2: Fast screening of the correlation space

For larger molecules, the computational demands of configuration selective vibrational configuration interaction theory (cs‐VCI) are usually dominated by the configuration selection process, which is commonly based on second order vibrational Møller‐Plesset perturbation (VMP2) theory. Here we present two techniques that lead to substantial accelerations of such calculations while retaining the desired high accuracy of the final results. The first one introduces the concept of configuration classes, which allows for a highly efficient exploitation of the analogs of the Slater‐Condon rules in vibrational structure calculations with large correlation spaces. The second approach uses a VMP2-like vector for augmenting the targeted vibrational wavefunction within the selection of configurations and thus avoids any intermediate diagonalization steps. The underlying theory is outlined and benchmark calculations are provided for highly correlated vibrational states of several molecules.


| INTRODUCTION
Vibrational configuration interaction (VCI) theory allows for the accurate calculation of state energies, [1][2][3][4][5] but usually at the cost of considerable computational demands. In part 1 of this article series (cf. Ref. [6]) we reported on the acceleration of configuration-selective VCI calculations arising from a rigorous exploitation of the antisymmetry of the spectroscopic ζ-constants, the analogs of the Slater-Condon rules in finite basis vibrational structure theory, and the introduction of subspace diagonalizations for obtaining physically meaningful and reliable start vectors for the iterative determination of vibrational eigenstates. These concepts led to considerable speed-ups with respect to the overall computation times and thus paved the way for even larger VCI calculations, which allow for highly accurate state determinations. In the second part of this series, further accelerations are presented, originating from two technical refinements that mainly concern the configuration selection process. The first one introduces the concept of configuration classes, which allows whole blocks of configurations to be processed instead of individual ones. This results in a significantly enhanced screening of the correlation space and can be applied to both the configuration selection process and the evaluation of the VCI matrix. The second technique exploits state vectors obtained from an expression related to second order vibrational Møller-Plesset perturbation (VMP2) theory, which replace VCI state vectors and thus make it possible to avoid intermediate diagonalizations in the iterative selection process. The benefits of this approach arise mainly for highly correlated vibrational states, as observed in regions of high state density, which require many configurations and thus expensive diagonalizations for their proper representation.
In order to be consistent with the first part of this article series, the same set of benchmark molecules, namely B₂H₆, C₃H₄, and C₂H₅F, or, more precisely, the same set of potential energy surfaces (PES) has been used, which rely on n-mode expansions truncated after the 4-mode coupling terms. However, we have augmented this set of molecules by the quasi-linear HCCNCS molecule, which shows a Champagne bottle potential with strong quartic contributions. This system, whose potential has also been truncated after the 4-mode coupling terms, shows exceptionally long diagonalization times and thus behaves differently from the other three systems. Details about these have been presented in separate publications, cf. Refs. [7][8][9]. In addition, we also present benchmarks obtained by applying the refinements presented in both parts of this article series in order to demonstrate the overall performance that can be achieved in calculations aiming at high accuracy.
While the first technique presented in this contribution can be rigorously applied within any implementation of VCI theory, the second one is more specific and requires a VMP2-like selection criterion. We explicitly note here that different approaches for the selection of configurations have been presented in the literature, [10][11][12] all of which have their pros and cons. However, the VMP2-like criterion is applicable to any type of wavefunctions and potentials and is used in several implementations. [13,14]

2 | VCI PROGRAM STRUCTURE - CONFIGURATION SELECTION
In the following, we briefly outline those aspects of our VCI implementation that are relevant for a general understanding, which is indispensable for revealing the computational bottlenecks. For additional and detailed information concerning our VCI algorithm, we refer to Refs. [15][16][17].
In order to be able to handle large systems, we employ an iterative configuration selective VCI algorithm based on the commonly used Watson Hamiltonian. [18] The VCI wavefunction $|\Psi^{\mathrm{VCI}}\rangle$ is given as a linear combination of Hartree products, that is, the wavefunction is expanded in terms of singly (S), doubly (D), triply (T), ... excited configurations $|\Phi\rangle$ with respect to the reference $|\Phi_0\rangle$, which corresponds to the vibrational self-consistent field (VSCF) solution. The configurations themselves may be either products of one-dimensional harmonic oscillator functions or, as used in this work, one-dimensional functions (modals) obtained by solving the VSCF equations. The $n_i$-th modal in coordinate $q_i$ will be denoted by $\varphi_i^{n_i}$ in the following, that is, a configuration is given as $|\Phi\rangle = |\prod_i \varphi_i^{n_i}\rangle$. Note that in our implementation of VCI, we exclusively use real basis functions. For further information about handling non-Abelian systems in this framework see Ref. [17].
Initially, a correlation space is generated, which is restricted by (a) the number of excitations within a single mode, (b) the maximal number of modes being excited, and (c) the total number of quanta within the configuration. In order to subsequently reduce this initial configuration space, we apply the aforementioned VMP2-based selection criterion, Equation (2), [15,19] for the purpose of selecting configurations that are expected to contribute non-negligibly to the total energy of the state of interest.
If $\varepsilon^{(a)}_{AJ}$ is larger than a certain threshold, the tested configuration $|\Phi_J\rangle$ is included in the configuration space, otherwise it is not. The VCI wavefunction of a state A in the a-th iteration step is thus given as
$$|\Psi_A^{(a)}\rangle = \sum_{K \in \{a\}} c^{(a)}_{AK}\, |\Phi_K\rangle \tag{3}$$
with $\{a\}$ being the set of configurations selected up to the a-th iteration. Within the configuration selection, the reference state, cf. Equation (3), is chosen to be the eigenstate of the last iteration step having the largest overlap with the harmonic counterpart. [17,20] This ensures that the configuration space is appropriately chosen to describe the target state. Note that in the first iteration step, the VSCF reference configuration (in case of molecules belonging to Abelian point groups and canonical normal coordinates) or a wavefunction in a meaningful subspace is used; for details see Ref. [6]. Subsequently, the correlation space is iteratively increased until convergence by using Equation (2). In order to determine the respective eigenvalue of the intermediate VCI matrix, we employ an iterative eigenvalue solver based on the RACE algorithm, [21] which minimizes the residual norm. This algorithm has been shown to surpass the performance of the commonly used Jacobi-Davidson algorithm. Within the selection iterations, it has to be guaranteed that the correct state is tracked, which is realized by a physically meaningful start vector. [17]
Typically, many of the coefficients $c^{(a)}_{AK}$ of the eigenvector in Equation (3) are very small. In order to reduce the computational effort within the configuration selection via Equation (2), which includes a sum over all $K \in \{a\}$, we adjust the eigenvector for insignificant coefficients, that is, configurations belonging to very small coefficients are skipped in the selection process. Note that these configurations are not removed from the current correlation space; they are simply not considered in Equation (2).
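The selection step with coefficient skipping can be illustrated with a minimal Python sketch. Since Equation (2) is not reproduced in this text, the sketch assumes the form $\varepsilon_{AJ} = |\sum_K c_{AK} \langle\Phi_K|H|\Phi_J\rangle|^2 / |\varepsilon_A - \varepsilon_J|$, which is an assumption and not necessarily the authors' exact expression. All names (`vmp2_like_contribution`, `select`, `coeff_cutoff`) and all numbers are hypothetical.

```python
def vmp2_like_contribution(coeffs, h_elem, e_state, e_conf, candidate,
                           coeff_cutoff=0.0):
    """Assumed criterion: |sum_K c_K <K|H|J>|^2 / |E_A - E_J|.

    Degenerate denominators (E_A == E_J) are not handled in this sketch.
    """
    coupling = 0.0
    for k, c in coeffs.items():
        if abs(c) < coeff_cutoff:
            # insignificant coefficients are skipped, not removed from the space
            continue
        coupling += c * h_elem(k, candidate)
    return coupling ** 2 / abs(e_state - e_conf[candidate])


def select(coeffs, h_elem, e_state, e_conf, candidates, threshold,
           coeff_cutoff=0.0):
    """Return the candidates whose assumed contribution exceeds the threshold."""
    return [j for j in candidates
            if vmp2_like_contribution(coeffs, h_elem, e_state, e_conf, j,
                                      coeff_cutoff) > threshold]


# Toy data (made up for illustration only).
coeffs = {"K0": 0.99, "K1": 0.1, "K2": 1e-6}
h = {("K0", "J0"): 20.0, ("K1", "J0"): 5.0, ("K2", "J0"): 100.0,
     ("K0", "J1"): 0.5}
e_conf = {"J0": 1500.0, "J1": 3000.0}


def h_elem(k, j):
    # matrix elements not listed are taken to vanish
    return h.get((k, j), 0.0)


selected = select(coeffs, h_elem, 1000.0, e_conf, ["J0", "J1"],
                  threshold=0.01, coeff_cutoff=1e-4)
print(selected)
```

In this toy case the tiny coefficient of `K2` is skipped without changing which candidate is selected, which is the behavior the paragraph describes.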

| Method and implementation
There are two computationally demanding parts within the program structure in which the calculation of matrix elements is needed, namely the set-up of the VCI matrix and the configuration selection step. Both parts suffer from large configuration spaces. Quantitatively, the scaling is the following:
• During the configuration selection, all configurations not included in the current configuration space need to be tested regarding their energy contribution via the VMP2-like energy expression (2). The upper bound of the sum within Equation (2) …
Obviously, in order to reduce the number of elements, the analogs of the Slater-Condon rules, which exploit the orthogonality of the basis functions with respect to the order of the operator considered, can be applied to sort out vanishing integrals without explicitly calculating them. Note that this corresponds to the concept of active and passive terms in vibrational coupled-cluster theory. [22] Nevertheless, a large computational effort remains, especially within the configuration selection process due to the scaling with $N_{\mathrm{conf,tot}}$. As shown below, by grouping configurations having the same properties with respect to the reference configuration and driving the loop structure by these "classes" instead of single configurations, significant reductions in CPU time can be achieved.
The VCI wavefunction in a specific iteration step is expanded in terms of configurations according to Equation (3). Technically, within our configuration selective implementation of VCI, we employ two lists, namely
i. the list of (binary coded) configurations in the initial configuration space, which is used in order to determine the energetic contribution of a single configuration via Equation (2), and
ii. the list of configurations already selected via Equation (2), which is used in order to generate the VCI wavefunction (cf. Equation (3)) and the VCI matrix and to apply the VMP2-like criterion again in the next iteration step.
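The text only states that configurations are binary coded; the actual encoding is not given. As a purely hypothetical illustration of what such a coding could look like, the following Python sketch packs the quantum numbers of a configuration into a single integer using a fixed number of bits per mode. The field width `BITS_PER_MODE` and the function names are assumptions.

```python
BITS_PER_MODE = 3  # assumption: sufficient for up to 7 quanta per mode


def encode(occupation):
    """Pack a tuple of quantum numbers into one integer (hypothetical scheme)."""
    code = 0
    for mode, quanta in enumerate(occupation):
        if not 0 <= quanta < (1 << BITS_PER_MODE):
            raise ValueError("quantum number does not fit into the bit field")
        code |= quanta << (mode * BITS_PER_MODE)
    return code


def decode(code, n_modes):
    """Recover the tuple of quantum numbers from the packed integer."""
    mask = (1 << BITS_PER_MODE) - 1
    return tuple((code >> (mode * BITS_PER_MODE)) & mask
                 for mode in range(n_modes))


# The two lists of the text, here with toy content.
initial_space = [encode(occ) for occ in [(0, 0, 0), (1, 0, 0), (1, 0, 2)]]
selected = [initial_space[0]]  # e.g. the reference configuration
print(initial_space, decode(initial_space[2], 3))
```

A compact integer representation of this kind makes comparisons and sorting of configurations cheap, which is one plausible motivation for binary coding.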
In order to maximally exploit the analogs of the Slater-Condon rules and thus to reduce the number of matrix elements to be evaluated in (a) the matrix set-up and (b) the configuration selection, we presort the configurations in the initial configuration space. Subsequently, we combine certain configurations into blocks, the aforementioned (configuration) classes. The criterion for sorting is the following: Let us consider the set of modes in which two configurations $|\Phi_I\rangle$ and $|\Phi_J\rangle$ differ in their respective quantum numbers. All configurations $|\Phi_K\rangle$ sharing the same set with respect to the reference configuration $|\Phi_0\rangle$ are elements of the same class. The reason is obvious: When building the VCI matrix, and especially within the configuration selection via Equation (2), the respective integral vanishes if two configurations $|\Phi_I\rangle$ and $|\Phi_J\rangle$ differ in more modes than the order of the operator considered.
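The grouping into classes can be sketched in a few lines of Python. This is only an illustration under the assumption that a class is identified by the set of modes in which a configuration differs from the reference. The function names and the toy configurations are hypothetical.

```python
from collections import defaultdict


def differing_modes(occupation, reference):
    """Set of mode indices in which a configuration differs from the reference."""
    return frozenset(i for i, (n, n0) in enumerate(zip(occupation, reference))
                     if n != n0)


def build_classes(configurations, reference):
    """Group configurations by their set of differing modes (one class per set)."""
    classes = defaultdict(list)
    for occupation in configurations:
        classes[differing_modes(occupation, reference)].append(occupation)
    return dict(classes)


reference = (0, 0, 0)
configurations = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (0, 1, 0),
                  (1, 1, 0), (2, 1, 0), (1, 0, 3)]
classes = build_classes(configurations, reference)
for modes, members in classes.items():
    print(sorted(modes), members)
```

All members of one class behave identically in any test that depends only on which modes are excited, so such a test needs to be carried out once per class rather than once per configuration.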

| Configuration selection
Let $\mathcal{K}_{\mathrm{tot}}$, $|\mathcal{K}_{\mathrm{tot}}| = N_{\mathrm{conf,tot}}$, be the set of configurations within the initial configuration space and $\mathcal{K}_{\mathrm{sel}}^{(a)}$ the set of configurations that are included in the configuration space in iteration step a, $|\mathcal{K}_{\mathrm{sel}}^{(a)}| = N_{\mathrm{conf,sel}}^{(a)}$. Generally, using the analogs of the Slater-Condon rules, the VMP2-like criterion (2) can be rewritten as Equation (4), with P being the maximum order of the coupling terms present in the PES and $J \in \mathcal{K}_{\mathrm{tot}} \setminus \mathcal{K}_{\mathrm{sel}}^{(a)}$. Note that $H_0$ does not include any vibrational angular momentum (VAM) contributions. Equation (4) refers to our previous implementation, which does not use the concept of classes introduced here but applies the Slater-Condon rules to each configuration separately. All reference CPU times shown in the following are based on Equation (4).
We now introduce (configuration) classes mathematically. As a class $C_{Mm}$ we define, via Equation (5), a set of configurations $|\Phi_I\rangle$ sharing the same set of modes in which they differ from the reference configuration. With the definition (5), the sets $\mathcal{K}_{\mathrm{sel}}^{(a)}$ and $\mathcal{K}_{\mathrm{tot}} \setminus \mathcal{K}_{\mathrm{sel}}^{(a)}$ can be written as unions of classes, Equations (6) and (7), with $N_{C,\mathrm{sel}}^{(a)}$ and $N_{C,\mathrm{nsel}}^{(a)}$ being the number of classes in the set of the selected configurations in iteration step a and the non-selected ones, respectively. Employing the concept of configuration classes via Equations (6) and (7), we can rewrite Equation (4) as a sum over classes. Note that a specific element $\varepsilon^{(a+1)}_{AJ}$ is only calculated if the pair of classes involved passes the Slater-Condon-type screening, with $\tau \in C^{\mathrm{sel}}_{Mm}$ and $\rho \in C^{\mathrm{nsel}}_{Mm}$. Subsequently, we generate a list of the contributing pairs and calculate the elements from these classes.
Thus, regarding efficiency, there are two aspects: First, the summation over the configurations in the current configuration space in iteration step a is replaced by the much shorter one over the configuration classes. Only if a configuration class renders a non-zero contribution according to the Slater-Condon rules are the matrix elements belonging to this class explicitly evaluated. Second, we screen the classes of the remaining non-selected configurations. Taken together, a loop structure consisting of two large loops is broken down into four smaller ones. It is important to note that the initial configuration space can be used much more efficiently for screening via the Slater-Condon rules, as its classes are much larger than those in the list of selected configurations.
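The class-driven screening can be sketched as follows. The sketch assumes that each class is labeled by the set of modes in which its members differ from the reference. Two configurations from classes τ and ρ then differ at least in every mode that belongs to exactly one of the two sets, so a pair of classes can be skipped as a whole if this symmetric difference already exceeds the maximum coupling order P. This is one plausible reading of the screening and not necessarily the authors' exact condition. All names are hypothetical.

```python
def contributing_pairs(selected_classes, nonselected_classes, max_order):
    """List the class pairs that cannot be discarded as a whole.

    The size of the symmetric difference is a lower bound for the number of
    modes in which any two members of the two classes differ.
    """
    pairs = []
    for tau in selected_classes:
        for rho in nonselected_classes:
            if len(tau ^ rho) <= max_order:
                pairs.append((tau, rho))
    return pairs


def n_differing(occ_i, occ_j):
    """Exact number of differing modes; still needed inside surviving pairs."""
    return sum(a != b for a, b in zip(occ_i, occ_j))


selected_classes = [frozenset(), frozenset({0}), frozenset({0, 1})]
nonselected_classes = [frozenset({2, 3}), frozenset({1, 2, 3, 4}),
                       frozenset({2, 3, 4, 5, 6})]
pairs = contributing_pairs(selected_classes, nonselected_classes, max_order=4)
print(len(pairs), "of", len(selected_classes) * len(nonselected_classes))
```

Within a surviving pair, individual configurations may still differ in more than P modes (they can also differ in shared excited modes), so the per-configuration check remains necessary there.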

| Set-up of the VCI matrix
For building the VCI matrix, the same technique as described for the configuration selection is applied, but for this case only set (6) is relevant. As described before, in every iteration we solely calculate the missing matrix elements regarding the configurations of the last step.
Thus, the list of new configurations is relatively small compared to the number of elements occurring within the configuration selection. Therefore, there are fewer elements in a specific class and the resulting computational saving is expected to be significantly smaller.

| Results
In order to demonstrate the CPU time savings arising from configuration classes, and since the impact of classes strongly depends on the order of the operator, we performed different sets of tests: Regarding the computational cost for evaluating the VCI matrix, we considered potential energy surfaces up to 3- and 4-mode couplings, whereas VAM terms were either entirely neglected or included up to 0D. Note that within the configuration selection according to Equation (2), VAM terms are always neglected, but are considered within the (intermediate) VCI matrix set-up. As restrictions for the initial configuration spaces, we used a maximum sum of quantum numbers of 15 with at most six modes being simultaneously excited. For allene, a maximum excitation of seven per mode has been utilized, six for B₂H₆ and five for C₂H₅F. As reference, CPU times without using classes have been determined.
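How the three restrictions shape an initial configuration space can be illustrated with a small enumeration in Python. The sketch assumes an all-zero reference (vibrational ground state) and ignores any symmetry considerations. The function and parameter names are hypothetical, and the toy parameters are far smaller than those used in the benchmarks.

```python
from itertools import combinations, product


def initial_configurations(n_modes, max_per_mode, max_excited_modes,
                           max_total_quanta):
    """Enumerate occupation vectors relative to an all-zero reference."""
    configurations = [tuple([0] * n_modes)]  # the reference itself
    for k in range(1, min(max_excited_modes, n_modes) + 1):
        for modes in combinations(range(n_modes), k):
            # every chosen mode carries at least one quantum
            for quanta in product(range(1, max_per_mode + 1), repeat=k):
                if sum(quanta) > max_total_quanta:
                    continue
                occupation = [0] * n_modes
                for mode, q in zip(modes, quanta):
                    occupation[mode] = q
                configurations.append(tuple(occupation))
    return configurations


space = initial_configurations(n_modes=3, max_per_mode=2,
                               max_excited_modes=2, max_total_quanta=3)
print(len(space))
```

Because the enumeration loops over sets of excited modes first, configurations sharing the same set of excited modes are generated contiguously, which is the kind of ordering that class-based processing benefits from.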
As can be seen in Figure 1, the computational savings are much larger for the configuration selection process (last two groups of bars on the rhs) than for the matrix set-up (four groups of bars on the lhs). The red bars depict the reference calculations without classes, the blue ones show the CPU times employing this new concept. Note that the scaling of the first y-axis is logarithmic. The lines refer to the second y-axis on the right hand side and provide the mean and total savings.

| Configuration selection
Since the VAM contributions are not considered within Equation (2), the inclusion of the VAM terms has only an indirect effect on the CPU time needed for the configuration selection. Figure 1 clearly illustrates that merging configurations into classes instead of checking individual configurations leads to a tremendous computational advantage.
Naturally, using a 3D potential energy surface, which includes couplings of up to three modes, leads to larger savings than employing a 4D PES, because many more integrals vanish due to the analogs of the Slater-Condon rules. Additionally, driving the configuration selection by classes leads to larger blocks of configurations that can be skipped in advance without evaluating the respective matrix elements. Moreover, considering a specific PES, the effect of saving CPU time by using classes grows even larger if the size of (a) the system and/or (b) the correlation space increases. Both in turn lead to larger blocks that can be neglected and therefore the total CPU time decreases.
We would like to emphasize here that the ordering of the configurations in the initial correlation space plays a major role regarding the savings. Expression (4) has to be evaluated for all configurations $\Phi_J$ not being included in the current configuration space, which may be a large number (i.e., $N_{\mathrm{conf,tot}} - N_{\mathrm{conf,sel}}^{(a)}$). By using the ordering described above, the initial configuration space can be divided into a minimal number of classes, with each class having maximum size. For example, within the calculation of the vibrational ground state of allene, there are 30,926,490 configurations present in the initial configuration space.
FIGURE 1 Comparison of CPU times for building the VCI matrix (abbreviated as "mat.") and determining the correlation space using the VMP2-like criterion (abbreviated as "sel.") in the cases of (A) using the concept of configuration classes (blue bars) and (B) without (red bars).
The other systems investigated (see Table 1) show qualitatively the same behavior. On average (all three systems considered), savings are as large as 66% in the first case and about 12% in the three other ones. In any case, the savings are much smaller than for the configuration selection procedure. Essentially, this behavior results from the technical framework of our implementation. The efficiency of our VCI algorithm benefits from a specific ordering of the selected configurations from the previous iteration step, that is, the configuration list is sorted by iterations. In this way, we are able to reuse information and transfer it to the next iteration step in order to avoid recalculation to a great extent. Within the list of already selected configurations, we conserve the original sorting, which renders the most efficient structure regarding classes, only within configurations from the same iteration step. Thus, the list is split into many sections. Since the VCI matrix is generated from already selected configurations, the resulting loop leads to much smaller classes that may be skipped compared to the case of configuration selection.
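The effect of the list ordering on the block structure can be made concrete with a toy example in Python. The sketch assumes that each selected configuration carries the iteration in which it was selected and a class label. Both the data and the names are hypothetical.

```python
from itertools import groupby


def contiguous_blocks(class_keys):
    """Lengths of consecutive runs of identical class labels in a list."""
    return [(key, sum(1 for _ in run)) for key, run in groupby(class_keys)]


# (iteration in which the configuration was selected, class label)
selected = [(1, "A"), (1, "B"), (2, "A"), (2, "B"), (3, "A"), (3, "B")]

# List sorted by iteration first: classes are contiguous only within a step.
by_iteration = [label for _, label in sorted(selected)]
# List sorted by class only, as is possible for the initial space.
by_class = sorted(label for _, label in selected)

print(contiguous_blocks(by_iteration))
print(contiguous_blocks(by_class))
```

In the iteration-sorted list every class is split into one section per iteration step, so each block that can be skipped at once is correspondingly smaller.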
Although the savings are comparatively moderate, for larger systems the benefit from the alternative loop structure (classes vs. no classes) will increase and the savings will grow. It is important to note here that, if the VCI implementation were non-iterative and/or the VCI matrix were generated as a whole from a list ordered in the optimal way, the savings would be significantly larger. In general, the computational savings that can be achieved by using classes increase with a growing number of modes and/or the size of the correlation space.

| ELIMINATION OF INTERMEDIATE EIGENPAIR DETERMINATIONS
The computational effort within configuration selective VCI calculations is dominated by the last few iteration steps, when the VCI matrix has already reached a certain size. Consequently, it must be the primary goal to reduce the CPU time in these last steps and thus we will focus in the following on two situations:
• Although the method of prediagonalizing subspaces presented in … Table S2 in the supporting information), 24 and 18 iterations within our (former) algorithm are necessary to reach converged energy eigenvalues. Of course, this is an indication that small numerical effects play a major role for such critical states, since the main physically relevant information is already covered by the first iteration steps. Thus, small uncertainties within the dynamic correlation may sum up during the iteration process, leading to a large number of steps.
• Usually, the configuration selection itself dominates the total CPU time of the calculation (see for example Figure 1).

| Method and implementation
In order to save CPU time in the case of computationally demanding eigenvalue determinations and to further improve the convergence behavior of our algorithm, we modified the criterion for the configuration selection. Note that the starting point for the following considerations is again Equation (2), which is related to the second order energy of perturbation theory, Equation (10); the corresponding first order wavefunction is given by Equation (11). In order to estimate the (energy) correction that an arbitrary configuration contributes to the total energy of the state $|\Psi_A\rangle$, the basis function $|\Psi_n^{(0)}\rangle$ in Equation (10) is replaced by the wavefunction $|\Psi_A^{(a)}\rangle$ in the iteration step a given by Equation (3). This yields the VMP2-like energy expression (2), which is used to decide whether a certain configuration should be included in the correlation space or not. The energy value itself, cf. Equation (2), does not provide a physically meaningful energy correction in the sense of perturbation theory.
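For orientation, the textbook expressions of non-degenerate Rayleigh-Schrödinger perturbation theory for the second order energy and the first order wavefunction are given below. They are standard results and are shown only as a reminder; the notation and normalization of Equations (10) and (11) in the original article may differ.

```latex
E_n^{(2)} = \sum_{m \neq n}
  \frac{\left| \langle \Psi_m^{(0)} | \hat{H} - \hat{H}_0 | \Psi_n^{(0)} \rangle \right|^2}
       {E_n^{(0)} - E_m^{(0)}},
\qquad
| \Psi_n^{(1)} \rangle = \sum_{m \neq n}
  \frac{\langle \Psi_m^{(0)} | \hat{H} - \hat{H}_0 | \Psi_n^{(0)} \rangle}
       {E_n^{(0)} - E_m^{(0)}} \, | \Psi_m^{(0)} \rangle
```

Each term of the energy sum can be read as the contribution of a single zeroth order function, which is what makes expressions of this type usable as a configuration-wise selection measure.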
In analogy to this, we define a corresponding VMP2-like wavefunction of first order, Equation (12), obtained by one-to-one comparison with the wavefunction (11). We want to emphasize here that the requirements of perturbation theory are formally not fulfilled for the wavefunction (12). The existing configuration selection scheme is now modified as follows:
1. Initially, the configuration selection via criterion (2) is used.
2. If the difference between the energy eigenvalues obtained from two consecutive iteration steps falls below a certain threshold $E_{\mathrm{diff,thres}}$, we replace criterion (2) by criterion (13), which is based on the VMP2-like wavefunction (12), with $\varepsilon_A^{(a),1}$ being the respective energy.
3. We define the correlation space to be converged if the sum of "energy corrections" (13) is smaller than a certain threshold $E_{\mathrm{corr,thres}}$.
4. Finally, the VCI matrix in the converged configuration space is diagonalized and the state of interest is identified.
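The control flow of the four steps above can be sketched in Python as follows. This is only a schematic reading of the scheme: the three callbacks are hypothetical placeholders for the diagonalization, the selection via criterion (2), and the augmentation via criterion (13), and the toy callbacks at the end merely produce numbers that make the loop terminate.

```python
def run_selection(space, diagonalize, select_with_eigenvector,
                  augment_with_vmp2_vector, e_diff_thres=2.0,
                  e_corr_thres=1.0, max_iter=50):
    """Two-phase selection; callbacks are hypothetical placeholders.

    diagonalize(space)                      -> (energy, vector)
    select_with_eigenvector(space, vector)  -> new configurations
    augment_with_vmp2_vector(space, vector) -> (new configs, vector, corr_sum)
    """
    # Steps 1 and 2: criterion (2), one diagonalization per iteration,
    # until two consecutive energies differ by less than e_diff_thres.
    energy, vector = diagonalize(space)
    for _ in range(max_iter):
        space = space + select_with_eigenvector(space, vector)
        new_energy, vector = diagonalize(space)
        switch = abs(new_energy - energy) < e_diff_thres
        energy = new_energy
        if switch:
            break
    # Steps 2 and 3: criterion (13), no intermediate diagonalization,
    # until the summed "energy corrections" drop below e_corr_thres.
    for _ in range(max_iter):
        added, vector, corr_sum = augment_with_vmp2_vector(space, vector)
        space = space + added
        if corr_sum < e_corr_thres:
            break
    # Step 4: a single final diagonalization in the converged space.
    energy, vector = diagonalize(space)
    return energy, vector, space


calls = {"diag": 0}


def toy_diagonalize(space):
    calls["diag"] += 1
    return 1000.0 + 16.0 / len(space), None


def toy_select(space, vector):
    return [len(space)]


def toy_augment(space, vector):
    return [len(space)], None, 4.0 / len(space)


energy, vector, final_space = run_selection([0], toy_diagonalize,
                                            toy_select, toy_augment)
print(calls["diag"], len(final_space))
```

In the toy run, the second phase adds configurations twice without any diagonalization, which is where the scheme is meant to save time for states with expensive eigenvalue determinations.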
It is important to note that the criterion (13) is only employed once the major part of the physical information, that is, the static correlation, is already covered by the current correlation space. This is ensured by the threshold regarding the energy difference of two consecutive iteration steps. In the last iterations the main concern is to achieve convergence … Note that the norm of the wavefunction (12) is given as … As noted above, many of the coefficients of the eigenvector in Equation (3) were found to be very small. For this reason and in order to reduce the computational effort within the configuration selection, in the original algorithm these insignificant coefficients were set to zero. When replacing the genuine VCI wavefunction (3) by the VMP2-like expression (12), this adjustment is no longer possible. Consequently, the computational cost for the configuration selection rises, but this is overcompensated by the savings of the proposed method.

| Results
We benchmarked the method of employing a VMP2-like wavefunction (12) within the configuration selection for all test molecules presented above. In all cases, the PESs have been truncated after the 4-mode coupling terms and 0D VAM contributions have been included within the VCI calculations. In order to restrict the initial configuration space, we used $n_{\mathrm{ex,init}} = 6$, $n_{\mathrm{max,init}} = 6$, and $n_{\mathrm{sum,init}} = 15$ for B₂H₆, C₃H₄, and C₂H₅F; for HCCNCS we utilized $n_{\mathrm{ex,init}} = 5$, $n_{\mathrm{max,init}} = 5$, and $n_{\mathrm{sum,init}} = 15$.
CPU time savings with respect to the total computational time required are shown in Table 3. The reference calculation refers to our former VCI implementation using Equation (2) for configuration selection only, the CPU saving refers to the use of the criterion (13) when $E_{\mathrm{diff,thres}} < 2.0$ cm⁻¹. The threshold for convergence has been set to $E_{\mathrm{corr,thres}} = 1.0$ cm⁻¹. In order to obtain the statistical data presented in the table, we calculated the six BH-stretching (CH-stretching) fundamental transitions of B₂H₆ (C₂H₅F), the four CH-stretching fundamental transitions of C₃H₄ as well as its first overtones 2ν₆, 2ν₂, and 2ν₇, and the four highest lying fundamentals of HCCNCS.
As shown in Table 3, on average, the mean absolute energy difference between the results obtained with our former algorithm and the new one employing Equation (13) is 0.3 cm⁻¹ and the maximum deviation is 1.1 cm⁻¹. These results show that the final energy eigenvalues obtained by the new method do not differ significantly from the results generated within the former algorithm. Since deviations of these magnitudes can arise from many error sources within the entire calculation (quality of the potential energy surface, quality of the polynomial fit of the PES, size of the correlation space in VCI, choice of start vector for the diagonalization, thresholds for convergence, ...), the error must be considered to be small. This behavior was to be expected, because, as mentioned before, the main effect of the late iterations (regarding $E_{\mathrm{diff,thres}}$) is to fine-tune the correlation space, whereas all relevant physical information about the state of interest is already covered within the early ones. This corresponds to the concept of static and dynamic correlation in electronic structure theory. Consequently, the actual energy eigenvalue will not depend substantially on individual configurations, that is, the difference between using Equation (2) or (13) will not be very large.
On the other hand, using this technique, the total CPU time can be tremendously reduced. The data given in Table 3 show that using the modified algorithm leads, on average, to total CPU time savings of 61.9% for the respective states and an average mean CPU time saving of 54.5% per state. Regarding the individual systems considered, the mean saving per state varies between 31.5% for B₂H₆ and 73.9% for C₂H₅F, that is, there may be large differences regarding the possible savings, which depend on the behavior of the systems considered within the calculation. In the following, we will discuss different situations exemplified by our benchmark molecules.

| C₃H₄
A detailed depiction of the computational demands for the calculated vibrational states of allene is shown in Figure 2 … Tables S1-S4 in the supporting information). Thus, the CPU savings are generated by improving the convergence behavior, because the total CPU time is dominated by the configuration selection while the diagonalization steps amount to a minor contribution. Additionally, the final correlation spaces are usually smaller employing our new algorithm. For example, regarding the state ν₁ the configuration space is reduced to almost 50%, whereas the energy difference with respect to the reference value is only 0.5 cm⁻¹.
TABLE 3 CPU savings (w.r.t. total computational time) and energy deviations arising from using the VMP2-like vector (12) instead of the VCI eigenvector (3) obtained by diagonalization of the intermediate VCI matrix. The configuration selection has been carried out based on Equation (2) and $E_{\mathrm{diff,thres}} > 2.0$ cm⁻¹. Subsequently, for further augmentation of the correlation space Equation (13) has been used ($E_{\mathrm{corr,thres}} = 1.0$ cm⁻¹). For details regarding single states see the supporting information.
Obviously, state ν₅ constitutes an exception to this overall tendency. Since the number of iterations stays unchanged in this case and the diagonalization does not dominate the total CPU time, we lose performance by using Equation (13), because negligible configurations cannot be discarded in this formalism. This is in contrast to our former implementation based on Equation (2).
FIGURE 2 … For every state shown, the bar on the left hand side depicts the CPU time for a reference calculation using the VCI eigenvector obtained by diagonalization of the VCI matrix in order to apply Equation (2) for selecting configurations. The bar on the right hand side refers to an implementation using a VMP2-like vector of the form (12) instead as long as the thresholds described in the text are reached. The corresponding differences between the energies obtained are given above the bars, the dimensions of the associated correlation spaces are depicted in red. The second y-axis on the right hand side provides the number of iterations required.
(see Figure 3, …)

| OVERALL PERFORMANCE
Here we combine the methods for accelerating iterative VCI calculations based on a configuration selection scheme presented in both parts of this article series. We modified the following steps within the algorithm: …
The initial configuration space has been restricted by $n_{\mathrm{max,init}} = 6$, $n_{\mathrm{sum,init}} = 15$, and $n_{\mathrm{ex,init}} = 6$ (5) for B₂H₆ and C₃H₄ (C₂H₅F). In order to define the application of the criterion (13) instead of Equation (2) for configuration selection, we used the thresholds $E_{\mathrm{corr,thres}} = 1.0$ cm⁻¹ and $E_{\mathrm{diff,thres}} < 2.0$ cm⁻¹. For all systems investigated, the CH (BH)-stretching fundamentals have been evaluated. The respective results can be found in Table 4.
In summary, the mean CPU time saving is 89.8% for B₂H₆, 86.8% for C₃H₄, and 94.7% for C₂H₅F, that is, the performance of the algorithm has been increased tremendously. In the following, we discuss the results using allene as an example.
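To relate the quoted percentage savings to speed-up factors, a saving of s percent corresponds to a factor of 100/(100 - s). The following short Python sketch applies this conversion to the three mean savings given above. It treats each mean saving as if it applied to a single run, which is a simplification, and the function name is hypothetical.

```python
def speedup_factor(saving_percent):
    """Convert a CPU time saving in percent into a speed-up factor."""
    return 100.0 / (100.0 - saving_percent)


for molecule, saving in [("B2H6", 89.8), ("C3H4", 86.8), ("C2H5F", 94.7)]:
    print(f"{molecule}: {saving}% saving = factor {speedup_factor(saving):.1f}")
```

The resulting factors lie roughly between 7.5 and 19, that is, around one order of magnitude.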
In Figure 4, the CPU times and savings for the calculation of the CH-stretching fundamentals of allene are shown. The y-axis on the lhs is logarithmic and shows the total CPU time for the calculation of the individual states. It can clearly be seen that our new implementation, which unifies the optimization techniques presented in both parts of this article series, leads to significant CPU time savings.
We explicitly emphasize here that Figure 4 shows total CPU times and not single steps within the calculation. For allene, the mean CPU time saving is 86.8% with respect to our former implementation. On the other hand, as shown in Figure 4, the energy eigenvalues match the former results, that is, there is almost no loss of accuracy due to the optimizations.
The total CPU time saving is a sum of the following aspects: With respect to our former implementation, we gain a factor of (i) 12 … Consequently, the optimizations presented in this work lead to a substantial increase in the performance of our VCI algorithm. Since the saving nearly amounts to a whole order of magnitude in the necessary computational time, these improvements make larger and more complicated systems accessible than before. Of course, rovibrational calculations will also profit substantially from a faster and more stable VCI implementation.

| SUMMARY
Accurate VCI calculations suffer from a significant computational effort, which increases with the size of the molecule of interest. Much work has already been devoted to tackling this problem. In this work, we presented four new technical aspects leading to a significant reduction of the computational cost of configuration selective VCI calculations:
i. In part 1 of this article series (cf. Ref. [6]) we provided analytical unrolled equations leading to a fast and efficient evaluation of the vibrational angular momentum terms. The scaling of the computational effort with the number of modes has been reduced by at least one order, and some expressions are even independent of the size of the system. Thus, the computational effort has been reduced substantially. Roughly, for the evaluation of zeroth order terms, we gain a factor of 10 within the set-up of the VCI matrix with regard to our former, but already optimized, implementation.
ii. Also in part 1 (cf. Ref. [6]) we presented an improvement of the convergence behavior within the iterations by defining appropriate subspaces of configurations and prediagonalizing them. Combined with our state picking scheme, the obtained eigenvector is used to take into account resonance information from the very first iteration step, which improves the convergence behavior of the algorithm. Approximately, this modification generates a saving of 20% of the total CPU time.
iii. Within this article, we introduced so-called classes of configurations in order to exploit the Slater-Condon type rules in a more efficient manner. In particular, the configuration selection can be sped up substantially, because the number of matrix elements to be evaluated during the iterations is significantly reduced. This leads to a considerably increased efficiency of the algorithm, that is, we roughly gain a factor of 6 within the configuration selection process (which dominates the total CPU time in most cases).
iv. We modified our former configuration selection scheme utilizing …

Open access funding enabled and organized by Projekt DEAL.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.