Single‐reference coupled cluster methods for computing excitation energies in large molecules: The efficiency and accuracy of approximations

While methodological developments in the last decade made it possible to compute coupled cluster (CC) energies including excitations up to a perturbative triples correction for molecules containing several hundred atoms, a similar breakthrough has not yet been reported for excited state computations. Accurate CC methods for excited states are still expensive, although some promising candidates for an efficient and accurate excited state CC method have emerged recently. This review examines the various approximation schemes with particular emphasis on their performance for excitation energies and summarizes the best state‐of‐the‐art results which may pave the way for a robust excited state method applicable to molecules of hundreds of atoms. Among these, special attention will be given to exploiting the techniques of similarity transformation, perturbative approximations as well as integral decomposition, local and embedding techniques within the equation of motion CC framework.

considerably slower. Although benchmark studies on small molecules often use the iterative approximate CC singles, doubles and triples (CC3) method 4 as a reference due to the fact that it describes the effect of triples on excitation energies (EEs) very well, carrying out even a lower scaling EOM-CCSD calculation still remains a challenge for large systems. Thus, for the purposes of this review, we will focus our attention on techniques that allow for accurate calculations to be carried out at a lower cost than canonical EOM-CCSD. The latter method already provides a high level of accuracy and making it available for computing the properties of molecules containing a hundred atoms would be a breakthrough in itself. However, this does not mean that triples are unimportant for excited states, merely that current state-of-the-art excited state CC implementations that aim for efficiency strive to reach a more modest goal. Indeed, since excited states are described by single excitations to zeroth order, correlation effects enter only with higher order excitations. This already indicates the importance of triples. 5 While it is expected that EOM-CCSD performs worse for excited states with a significant double excitation character than it does for singe excitation dominated states, 6,7 the early work of Bartlett and Watts 8 showed that EOM-CCSD may already be quite inaccurate for some single valence excitation dominated states of polyenes, 9 benzene, and heterocyclic compounds. 10 In all these cases, the EOM-CCSDT method of Bartlett and coworkers and several approximations to it 5 perform better than EOM-CCSD. The importance of triples is also reinforced by benchmark studies using the Thiel test set 11 and is emphasized again in a more recent overview by Bartlett on the hierarchy of CC and EOM methods. 12 Nevertheless, the present review will be limited in scope to methods with a better scaling than EOM-CCSD, since such methods have been the main focus of research in recent years. Furthermore, since we will be chiefly concerned with computing EEs, we need not distinguish between response and EOM formulations, and we will sometimes simply refer to EOM-CCSD as (excited state) CCSD.
In a review from 1977, Monkhorst 13 discussed the theory behind CC property calculations, anticipating later formulations of response theory, including Dalgaard and Monkhorst,14 and the standard coupled cluster linear response (CCLR) implementation of Koch and Jørgensen. 15 In the same year, Harris summarized the prospects of computing excited states using CC theory. 16 Assuming that the ground and excited state operatorsT g andT x , define an operatorŜ =T g −T x , he expressed the excited state wavefunction through the action of the exponential operator eŜ on the ground state CC solution. This ansatz has been abandoned in favor of the approach of Emrich from 1981 which uses a linear excitation operator to define the excited states, 17 and the origins of the EOM formulation within the CC framework are usually traced back here. Nakatsuji and coworkers also reported the extension of the conceptually similar symmetry-adapted cluster configuration interaction (SAC-CI) approach to excited states around that time. 18, 19 Sekino and Bartlett 20 and especially Stanton and Bartlett 21 describe the standard implementation of the EOM-CCSD equations. For open shells, the equations derived by Stanton and Bartlett can be used with restricted and unrestricted Hartree-Fock (HF) reference functions for closed and open shell systems, and as the work of Szalay and Gauss shows, it can also be ensured explicitly that the EOM eigenstates have the correct expectation values for the square of the total spin operator. 22 The spin-flip formulation of Krylov 23,24 is a generalization for open shell molecules, which essentially consists of removing an alpha electron from the system and adding a beta electron to it in all possible configurations and also considering spin-conserving excitations on the resulting functions up to the desired truncation level. While regular EOM-CCSD is capable of describing multiconfigurational excited states so long as the ground state is dominated by a single determinant, the spin-flip approach extends the applicability of EOM methods to a wider range of multireference systems and phenomena such as bond breaking. For further details on the EOM framework, the reader is referred to the appropriate chapters of the book of Shavitt and Bartlett. 25 Since we will focus here on the nonrelativistic closed shell single reference case, the recent review of Krylov on open shell calculations 26 and that of Lischka et al. on multireference excited state calculations 27 is also recommended. And because the scope of this review is limited to CC theory, the popular density functional theory (DFT) methods, and their extensions to excited states through time-dependent density functional theory (TD-DFT) will not be discussed much either, except in combination with wavefunction methods. The review of Ghosh et al. summarizes the latest results in this respect. 28 While it is not our task here to provide a comprehensive summary of the further development of CCSD for excited states, a few important advances must be mentioned. Although until now, we have been focusing on computing EEs, the same framework also allows for the computation of ionization potentials (IP) as discussed first by Nooijen and Snijders 29 and later by Stanton and Gauss, 30 and for computing electron affinities (EA) as shown by Nooijen and Bartlett. 31 This requires only a simple modification of the working equations. Musiał and Bartlett went on to derive the IP 32 and EA variants 33 of EOM-CC including full triples. The EE/IP/EA approaches were further exploited in a variety of contexts by the group of Krylov and of Piecuch. The efforts of the former include investigating redox potentials 34 and core ionized states, 35 while the latter not only reintroduced some of the novel techniques into nuclear physics, 36 but also developed a Monte Carlo sampling technique to solve the problem of prohibitive scaling arising with higher order excitations in the EOM-CC approach. 37 The work of Köhn and Tajti investigated the performance of EOM for computing conical intersections and proposed a simple modification to avoid the arising problems. 38 Among the more recent advances is the work of Jagau on complex scaling 39 based on the work of Bravaya and Krylov 40 and that of Stopkowitz on properties in strong magnetic fields. 41 We also look forward to a rigorous mathematical treatment of the excited state CC problem by the group of Kvaal in the spirit of their earlier work of the ground state problem 42 and anticipated by the recent study of Pedersen and Kvaal on time-dependent CC theory using the bivariational theorem. 43 An essential feature of CC and EOM is the use of a similarity transformed Hamiltonian. As will be discussed later, this explains many of the properties of EOM-CCSD which are hard to interpret from a response theory point of view. However, similarity transformation is a more general technique. In the context of Fock-space coupled cluster (FSCC) theory, the variant proposed by Stolarczyk and Monkhorst 44,45 uses similarity transformations in such a way that certain blocks of the Hamiltonian become zero. The transformation operator of the form eŜ, whereŜ has noncommuting components, leads to a cumbersome formulation which makes the normal ordered exponential form eŜ n o introduced by Lindgren 46 preferable. This has issues of invertibility and as a result, the connectivity of the transformed Hamiltonian comes into question. Nooijen resolved these issues, and also introduced an active space into the transformation process to reduce the cost. 47 Nooijen and Bartlett then used the resulting similarity transformation technique to decouple the singles from the doubles in the excited state CC problem which allowed for obtaining accurate EE values from diagonalization solely in the space of single excitations. The resulting method is called similarity transformed EOM (STEOM) 48 which also makes use of the fact that the FSCC equations determining the transformation operator are equivalent to the IP/EA-EOM equations, as shown by Mukherjee and coworkers. 49,50 This means that instead of solving a numerically unstable nonlinear problem, solving the IP/EA eigenproblems is sufficient. Thus, the STEOM approach has many potentials to reduce the scaling of the excited state problem, which will be discussed in due course. The EOM-CCSD method provides results of good accuracy for EEs at the prohibitive cost of a formal scaling of O(N 6 ) with some measure of the system size N. The organizing principle of the following discussion will be the way the various methods attempt to reduce this scaling. Modifications to EOM may then be substantive, in the sense that as a result a new method arises, or algorithmic, that is, the resulting method can be reduced to the parent method by setting some parameters to an appropriate limit. We propose to discuss perturbative approximations and similarity transformation techniques in the first group, and integral decomposition, localization, and embedding techniques in the second. These divisions are practical, and as we will see, many overlaps are possible. Indeed, the art of developing an efficient and accurate method lies in combining the appropriate approximations applied at various stages.

| THE EOM-CCSD METHOD
In order to develop the framework of excited state CC theory, we need the notion of a linear excitation operator that removes one or more electrons from occupied orbitals (i,j,k,l,…) and places them on virtual ones (a,b,c,d,…). Here and in the following, these notations correspond to a closed-shell restricted Hartree-Fock (RHF) reference, although the unrestricted Hartree-Fock case can also be obtained by a simple modification of our theory. We will not deal with spin adapted open-shell references, although in principle this is also possible, if a bit more cumbersome. A general linear excitation operatorD is then defined aŝ including a constant term D 0 . Summation is implied between upper and lower repeated indices. The notationÊ I contains the composite index I denoting any operator string belonging to either one of the single (I = S), double (I = D), etc., excitation classes, which can also be denoted more specifically asÊ S Ê a i ,Ê D Ê a iÊ b j , and so on. It will be sometimes convenient to consider excitation operators of a given excitation level, such asD 1 = D SÊ S ,D 2 = D DÊ D , and so on. These can be summed up to yield the total excitation operator, that is, D IÊ I = D SÊ S + D DÊ D + …. Note that for repeated upper and lower composite indices I, summation is also implied over all excitation classes. The spin summed generators of the unitary group (Ê p q ) are defined asÊ for arbitrary orbitals p,q,r,s,… using creation (â p α ) and annihilation (â q α ) operators. Note the significance of covariant and contravariant labels, since an adjoint is defined through index raising and lowering,Ê I À Á † =Ê I .
in whichL l andR k are the left and right solutions and ω k = E k − E 0 is the EE of state k. For excited states, k, l 6 ¼ 0, although this formulation also allows for treating the ground and excited states formally on an equal footing. 51 For the excited state amplitudes (R I k ) of the operatorR k = R 0 k + R I kÊ I , we thus obtain the following matrix equation, with the Jacobian and ω l k δ l k ω l . In response theory, the matrix A is obtained as the derivative of the CC Lagrangian Equation (6) with respect to the parameters inL 0 andT. In EOMCC, the usual way to arrive at these equations is through parameterizing excited states as j Ψ k i =R k eT j 0i, and sinceT andR k commute under the assumptions discussed here, the commutator form in Equation (7) follows from taking the difference between the excited state Schrödinger equation and the ground state Schrödinger equation multiplied from the left byR k and projecting the result to the left states. It should be noted that derivative techniques are also used by EOM practitioners and that perhaps the more relevant distinction is that the EOM approach outlined above allows for a clear definition of the excited state wave function, which is not easily done using response theory. 1 The latter works with the eigenfunctions of the Jacobian, which can also be recovered within the EOM framework. On the other hand, response theory has some advantages in computing various properties. Again, since the focus of this review is on energy calculations, the left states and property calculations in general will not be discussed further here. On the other hand, since we will be dealing with EOM-CCSD in the following, we finish this section by writing down the explicit form of the excitation operator at this level of theory for EEs, for IP,R and for electron attached states,R Here, the spin labels in the creatorâ a and the annihilatorâ i have been suppressed. Usually, i β and a α are assumed, although for closed shells in the absence of a perturbation coupling spin states (e.g., a magnetic field) it does not matter which spin is removed or added.

| Perturbative techniques
For further discussion, the Born-Oppenheimer HamiltonianĤ will be partitioned intoĤ =F +Ŵ, with the Fock operatorF having a diagonal representation in the canonical orbital basis considered here. The fluctuation potentialŴ is the difference between the two body interactionĜ and the effective one body Fock potentialV =Ĵ −K defined through the Coulomb and exchange operatorsĴ andK. We must now investigate how the form of the ground state and excited state CCSD residual equations reflect this partitioning, as this will be central to the various perturbative approaches.
At the matrix level, Löwdin's partitioning technique 52 will also be useful in subsequent discussions. Writing Equation (8) explicitly for a single state in terms of single and double excitations, the doubles contribution may be eliminated from the equation to yield This formulation allows for introducing perturbative approaches in itself by treating the inverted quantity in some approximate fashion, which can also lead to savings in terms of storage and efficiency.
Next, we introduce the notationÕ for an operator similarity transformed using the singles only, and reminding the reader of the Baker-Campbell-Hausdorff (BCH) commutator expansion of similarity transformed operators, in whichÔ and the expansion terminates at N = 4 for a two body operator in the closed shell case. Since H = e −T 2H eT 2 , where we have used the fact that all higher order commutators ofF withT are zero. Using this expression, the energy equation, Equation (3) can be written as while the residuals for single and double excitations can be obtained from Equation (4) as The EOM-CCSD Jacobian matrix A is thus simply equal to the matrixÃ defined by these elements, whose blocks we will refer to asÃ SS ,Ã SD ,Ã DS , andÃ DD . It will be useful to distinguish the case when the singles vanish, that is, whenÕ =Ô for all operators in the above equations. We will denote the corresponding matrix blocks as A SS , A SD , A DS , and A DD . SinceÃ consists of a single term, we will use the alternative notationW S 1 D 2 , denoting the corresponding matrix block asW SD . Finally, the leading terms of the last two blocks also deserve a special notation, with the corresponding matrix elements denoted asW DS and F DD . Again, W SD and W DS denote the matrix blocks without transformation withT 1 , while F DD is the diagonal matrix of orbital energy differences. Figure 1 arranges the various methods to be discussed here in terms of these quantities indicating their relationship as well. At this stage, it is worth noting that Tajti and Szalay discuss many more possibilities of discarding various terms in the Jacobian than the more popular ones mentioned here. 53 One of the simplest approximations possible involves the perturbative expansion of the similarity transformed Hamiltonian.
Once the untransformed HamiltonianĤ is partitioned intoĤ 0 ½ =F andĤ 1 ½ =Ŵ in the way described above, the amplitudes may themselves be expanded in a series and the contributions H n ½ can be identified order by order, Denoting as H n ð Þ the sum of all contributions up to n, the corresponding perturbative approximation to EOM-CCSD is labeled as EOM-CCSD(n) or EOM-MBPT(n), although Stanton and Gauss have reservations using the latter since diagonalization is involved. 54 The working equations were first derived by Nooijen and Snijders 55 for the IP case and were discussed more generally by Stanton and Gauss. 54 For our purposes, the EOM-CCSD(2) method is the important one and as only firstorder amplitudes survive in this case, this amounts to replacing the CC amplitudesT in the EOM-CCSD equations by the second-order Møller-Plesset (MP2) ones. In the EE case, this does not lead to large savings in computational cost, although the terms involving the single T amplitudes do vanish. To indicate the absence ofT 1 , the notation A is used instead ofÃ in Figure 1. On the other hand, the ground state CC problem is still a costly step in EOM-CCSD, although for a sufficiently large number of excited states, the costs of solving the EOM equations will dominate the calculation. However, assuming a single EOM root and the same number of iterations in the ground and excited state parts, the ground state CC problem is the bottleneck and removing it noticeably reduces the prefactor in the resulting method which is still O(N 6 ) scaling. The situation is different for IP and EA calculations, where the overall scaling of EOM-CCSD(2) is reduced to O(N 5 ). Various modifications of the IP scheme within the EOM-CCSD(2) framework were investigated by Dutta, Vaval, and Pal 56,57 to improve the accuracy of this cost-effective method. Here, we are mostly concerned with the EE case. In general, EOM-CCSD(2) is expected to fail whenever MP2 breaks down.
The philosophy behind the CCn methods is quite different. If n is the highest excitation level, only amplitudes at this level are treated perturbatively, the others are not affected. More precisely, CCn retains the lowest order (i) nonvanishing contributions from the amplitudesT i ½ n at the highest excitation level n. For CC2, 58 only the first-order doubles are retained and the SS and SD blocks of the Jacobian remain unchanged, while the lowest orderW and F appear in the DS and DD blocks, see Figure 1. The presence of the diagonal DD block reduces the overall scaling of the method to O(N 5 ) and allows for the usage of the effective Jacobian technique formalized in Equation (17). If beyond the CC2 truncation scheme for doubles, the single amplitudes are also truncated at first order as in EOM-CCSD(2), the SS and SD blocks of the Jacobian are also affected, and the resulting method is the CIS(D ∞ ) approach of Head-Gordon et al., 59 which for these reasons is referred to as MP2-CC2 by Tajti and Szalay. 53 It is possible to apply Equation (17) in a single iteration starting with the configuration interaction singles (CIS) solutions and this yields the noniterative CIS(D) perturbative approach. 60 Head-Gordon and coworkers also considered other approximations to CIS(D ∞ ) by expanding the denominator in Equation (17) in a series to enable the treatment of degenerate cases as well. 59 Although derived from a Green's function-based formulation, the second-order algebraic diagrammatic construction (ADC(2)) approach 61,62 is formally obtained by making the CIS(D ∞ ) Jacobian Hermitian. 63 The symmetric problem is less liable to numerical problems because it does not give rise to complex eigenvalues, and thus, ADC(2) certainly has advantages. A further method that makes use of the propagator formalism is the second-order polarization propagator approach (SOPPA), 64,65 and its noniterative variant, the random phase approximation with doubles correction, RPA(D). 66 The latter are related to orbital optimized approaches 67 and will not be considered further. A useful discussion of second order methods is provided by Hättig, 67 Tajti and Szalay, 53 and on the ADC methods in particular in the review of Dreuw and Wormit. 68 Despite the variety of second-order methods, they all remain formally at least O(N 5 ) scaling without further approximations.
Finally, the partitioned EOM or P-EOM approach 69 can be seen as a perturbative approximation in terms of the excited stateR amplitudes using Löwdin's partitioning technique. The expansion space is partitioned into the singles and doubles, and again, the singles amplitudesR 1 are left unchanged, while obtaining the simplest approximation from Equation (17) requires that the DD block be diagonal. This is simply achieved by replacingÃ DD with F DD , although Bartlett and coworkers also explored other choices in their earlier work. 70 This choice reduces the scaling of the excited state problem to O(N 5 ), but now the problem is the opposite of that in the EOM-CCSD(2) case. While the formal scaling of the excited state equations remains unchanged in EOM-CCSD(2), P-EOM-CCSD leaves the ground state equations unchanged. It therefore makes sense to combine the two approaches into a P-EOM-CCSD(2) method which scales as O(N 5 ) for both the ground and excited states. Again, the singles contributions vanish, as seen in Figure 1.

| Similarity transformation techniques
Similarity transformation is a very general technique, 47 which relates two operatorsÂ andB through the action of a thirdP aŝ In most practical cases of electronic structure theory, the representation of the operatorÂ in a certain basis is known and we can also perform the transformation formally and represent the result in the same basis. To determine the parameters P involved inP, it is usually required that some block IJ of the matrix representation ofB should vanish, As this is a function of P, these parameters can be determined by solving these equations. Conversely, the technique can be used to put selected blocks ofB to zero. For matrices, as long as the transformation is invertible, the eigenvalues ofÂ and B are identical, and their eigenvectors are related by a change of basis defined byP. However, in quantum chemistry, we do not usually projectB to the full subspace, and hence these properties do not hold strictly. Further complications may arise related to the boundedness of operators, as discussed elsewhere. 71 Finally, the use of normal ordered operators simplifies the manipulation of operator strings and although similarity transformation techniques are possible without it, in the following we will assume normal ordered operators throughout.
With these stipulations, similarity transformation still remains generally applicable. One may for example build a representation ofĤ in the basis of Slater determinants constructed using some guess orbitals. In this case, choosing a unitaryP that mixes the ground state and singles and requiring that the 0S block should vanish corresponds to the fulfillment of the Brillouin condition (see the vanishingĤ 0S andĤ S0 in Figure 2), although usually it is formulated differently. One may similarly start from the untransformed HamiltonianĤ and obtain H through the exponential parameterization, as customary in CC theory. In this case, the conditions in Equation (32) are equivalent to Equation (4), that is, we require that the 0S and 0D blocks of H vanish. This defines the usual CC similarity transformation, as shown in the first step of Figure 2. Note also that the integrals of the vanishing 0S and 0D blocks are precisely those that appear in the correlation contribution to E 0 , contracted with the appropriate amplitudes. Furthermore, the fact that H D0 = 0 also means that the only surviving contributions to the TS block of the normal ordered H are small three-particle terms. This explains why EOM is much better for excited states than CI, which excludes a largeĤ TS block from diagonalization when truncated at singles and doubles. It also accounts for the fact that EOM-CCSD is good for single excitation dominated states, but not for those dominated by double excitations, for which the corresponding block of H is still large. 72 These properties of EOM-CCSD are not obvious from a response theory perspective.
Nooijen and Bartlett 48 exploited the similarity transformation technique to decouple the single and double excitations in the Hamiltonian, which also implies that the singles are also decoupled from the triples since a number of other blocks inĜ become small, 72 see Figure 2. It will be the description of the resulting STEOM approach which will occupy the rest of this section. In terms of Löwdin's partitioning scheme, we may say that while second order methods approximate A DD in Equation (16), the similarity transformation aims at eliminating the A DS block without neglecting its effect on the SS block.
Once the singles are decoupled, it will be enough to diagonalize the SS block to determine the EEs of singly excited (and single excitation dominated) states, while at the same time, the effect of doubles is taken care of by the similarity transformation. The STEOM Hamiltonian can be defined asĜ = eŜ where the curly braces {} denote normal ordering with respect to the HF reference state. A technical difficulty presents itself at this stage, since eŜ n o , which makes the evaluation of the inverse problematic. Nooijen and Bartlett circumvent the problem by back substitution in their formal derivation of the working equations. To understand how specific blocks ofĜ can be targeted, we must understand the significance of normal ordering in STEOM. We may writeĜ in the form where we have reverted to the spin-orbital notation. It is important that similarity transformation described here leaves the CC conditions unchanged, 47 that is, g 0 = E 0 , g a i = g ab ij = 0. Consider now the following expression and use the generalized Wick theorem to evaluate it, since in a product of normal ordered operators we need to consider only the contractions that connect different terms, all other contractions vanish. Thus, using normal ordered operators allows us to target individual matrix elements ofĜ. In the above case, if we can assume that j 6 ¼ i then the above expression reduces just to g j i apart from a minus sign. Of course, we do not The process of successive similarity transformation used in similarity transformed EOM (STEOM)-coupled cluster (CC). E HF is the RHF energy, while E 0 denotes the ground state CC energy wish to eliminate the diagonal element g i i , however, we can ensure that a certain subset of occupied orbitals considered important for some reason should become disconnected from the rest. This is achieved by dividing the set of occupied ({i}) and virtual ({a}) orbitals into active and inactive parts such that {i} = {i 0 }[{m} and {a} = {a 0 }[{e}, where the labels i 0 and a 0 denote the inactive while m and e the active orbitals in their corresponding orbital subspace. Coming now to our real target, the SD block, we note that it consists of the matrix elements g cj ab and g ij kb . Since we have introduced an active space, we cannot make all of these disappear, but we can enforce that the following conditions are satisfied, These serve to remove the effect of label permutation in two-body terms to enable selecting the appropriate matrix element rather than its antisymmetrized variant. The functions Φ are Slater determinants obtained as Φ b ji =â bâ jâi j 0i, remembering that the stringâ bâ jâi is already normal ordered. Note also that for one-body terms the distinction between active and inactive labels is strictly observed to avoid canceling the diagonal elements, while for the two body terms, it is possible to lift this restriction inΦ so that all its labels run over the full occupied or virtual set. The reason for keeping m and e for the ket will become apparent, once the last ingredient of the STEOM procedure is also considered, namely the definition of the operatorŜ. Inspecting the above equations reveals that the determinants produced by the normal ordered operator strings are ionized states. Indeed, Nooijen and Bartlett relate these equations to the IP/EA equations of the EOM formalism based on H with the ionization operators defined in Equation (13) and Equation (14).
TheŜ operator is defined asŜ =Ŝ + +Ŝ − withŜ Note the resemblance to the matrix elements we wish to eliminate. Finally, the relationship to the amplitudes of the EA-EOM (R + ) and IP-EOM (R − ) operators can be expressed as where λ labels the IP or EA states. The underlying assumption is that we compute one IP or EA state for each active label m or e which is dominated by the amplitude R λ m or R e λ that describes ionization from or attachment to that active label. Thus, R λ m and R e λ define a square matrix which is diagonally dominant or can be rearranged to be a diagonally dominant matrix. This also means that IP/EA roots dominated by double excitations are not accepted for the purposes of similarity transformation. Having completed the IP and EA calculations, these matrices are cut out of the full IP/EA singles and according to Nooijen and Bartlett, their inverse provides the necessary conversion factor to get theŜ amplitudes. The singles S e a 0 and S i0 m are then obtained from the remaining elements of the IP/EA amplitudes, although Nooijen and Bartlett show that they do not change the energy and hence eventually discard them from the similarity transformation. Since the conversion matrix is defined by an inverse, it needs to be a square matrix, and since we usually do not compute all singles roots, one active label must be kept in the resulting doubles S ej ab and S ij mb . Thus, not all elements in the SD block ofĜ are zero, although one may hope that the coupling between active and inactive blocks is small, especially if the difference in orbital energies is large. This allows for the automatization of the STEOM active space selection procedure by using state averaged CIS natural orbitals with large occupation numbers. 73,74 One caveat is that partitioning the orbital spaces into active and inactive parts leads to the loss of unitary invariance, but the impact of this is usually negligible or can be removed by computing more roots. [73][74][75] OnceŜ is obtained, the G SS block can diagonalized which is an operation that scales as O(N 4 ). However, the ground state step is O(N 6 ) scaling, the IP/EA steps are O(N 5 ), and the construction of G SS integrals scales as O(N act N 5 ), where N act is the number of active labels. Unlike the methods discussed in the previous subsection, STEOM is not an approximation to EOM-CCSD. STEOM is closely related to the FSCC method of Stolarczyk and Monkhorst 44 while both STEOM-CCSD and EOM-CCSD can be derived from the so-called dressed EOM equations of Meissner and Bartlett. 76,77 This contains an additional triples term not present in EOM-CCSD, but which is retained in STEOM and makes it charge transfer (CT) separable. This means that for two noninteracting closed shell subsystems the EE of an excitation from one subsystem to the other is equal to the sum of a corresponding IP value in the first subsystem and an EA value in the other. Denoting the result of the final STEOM diaginalization asĈ, this feature is most easily seen by substituting the final STEOM state, eT eŜ n oĈ j 0i, into the projected Schrödinger equation, and expanding the result in terms ofŜ. After identifying the EOM-CCSD-like terms, a furtherŜ 2 n oĈ term survives, which is the triples term in question. 48 It should also be mentioned that STEOM has been extended to include the doubles space in the diagonalization manifold. 78 Despite its advantages, STEOM needs further algorithmic improvements to be competitive with second-order methods.

| Integral decomposition techniques
As quantum chemistry involves the manipulation of high rank tensors, it is natural to think of tensor decomposition schemes as candidates that might bring significant improvement as far as computational cost is concerned. In the next section, we will discuss schemes that can be understood as decomposition of amplitudes, while in the present one, we will focus on integral decomposition schemes. One of these is the resolution of identity (RI) or density fitting scheme, 79-81 which involves expanding the product of two Gaussian functions (κ and λ) in the basis of an auxiliary Gaussian basis set {α}, Thus, a four-index atomic orbital (AO) integral will have the form The coefficients can then be determined by projecting the expansion in Equation (44) to the auxiliary basis (in the Coulomb metric, although other choices are possible), The coefficients can be expressed by inverting the two-index Coulomb matrix to yield α 0 α α and V αβ = (α|β). This then is a symmetric decomposition in the sense that a four-index quantity is decomposed into two three-index ones. The Cholesky decomposition (CD) 82,83 also belongs to this formal category. It must be added that generating the four-index integrals directly scales as O(N 4 ) while constructing them using RI/CD scales as O (N 5 ) and thus we must next investigate under what circumstances it is beneficial to use these techniques. For one-index transformations, we have in general in which p, q,… now stand for any molecular orbital (MO) or AO label, and t t p is some set of coefficients. Carrying out the one-index transformation without RI/CD scales as O(N 5 ), while transforming the three-index integral (pq|α 0 ) instead is an operation of O(N 4 ). Furthermore, the storage requirements are reduced from O(N 4 ) to O(N 3 ) when three-index quantities are stored instead of four-index ones. The saving due to the latter is particularly significant if transformation is from the AO to the MO basis, which is a common scenario in CC and other correlation methods. Thus, while constructing the eventual four-index quantities comes with an overhead, the savings due to storage and integral transformation costs still makes the RI/CD technique advantageous. However, the true advantage of these methods comes when two-index transformations are considered, such as the contraction with D α 0 = (α 0 |rs)P rs , and P rs may be referred to as the pseudodensity. In this case, the contraction on the left-hand side scales as O(N 4 ), while on the right-hand side, the construction of D α 0 and its contraction with the three-index integral both scale as O (N 3 ). Unfortunately, this is not always the case, where D qr α 0 = α 0 jqs ð ÞP rs . Here, the left-hand side scales again as O(N 4 ), but so do the two steps on the right. In EOM-CCSD, the most expensive external exchange term, T ij cd acjbd ð Þis of this type, and since the contraction is carried out for each ij pair, it scales as O(N 6 ). There may still be a significant advantage in applying these techniques due to lower storage requirements and, in the MO basis, owing to a smaller prefactor. As the EOM-CCSD and CC2 models contain both types of terms, the overall scaling is not reduced, but the efficiency compared to the canonical implementation is still significantly improved as shown by the investigations of Krylov and coworkers applying the RI and CD schemes to EOM-CCSD 84 and especially by the work and Hättig and Weigend on RI-CC2. 85 To address the shortcomings of RI in Equation (50), it is possible to evaluate the integrals in a seminumerical fashion, leading to the chain of spheres (COS) algorithm, 86,87 itself related to the earlier pseudospectral scheme of Friesner. 88 The integral decomposition in this case is in the AO basis (μ,ν,κ,λ,…) and takes the form where This formula corresponds to evaluating the integration over one set of electron coordinates in a two-electron integral numerically over a grid. Thus, r g is the position of the grid point and w g is the weight factor associated to that grid point. As it can be seen, the original four-index quantity is now decomposed into two two-index ones and one three-index quantity. Constructing the integral, this way thus scales as O(N 5 ) with a large prefactor due to the fact that the number of grid points is usually much larger than the number of basis functions (this is not so in the pseudospectral case, but then the grid points must be specifically chosen). Again, the real advantage comes when the integrals are contracted with other quantities. Revisiting Equation (50), we find where F g λ = X g κ P κ λ and G g ν = A νλ g F g λ . All these operations scale as O(N 3 ) with a large prefactor as opposed to the RI scheme, which scales as O(N 4 ) with a small prefactor. Thus, the RI scheme is favored for smaller systems and COS for larger ones. The COS algorithm was also explored for EOM-CCSD, 89 but only for the external exchange term, which can be successfully reduced to an O(N 5 ) scaling one. This amounts to reducing the prefactor of the overall method. Better results would follow from using RI for terms of the type in Equation (49) and COS for terms of the type Equation (50), but this has not yet been attempted so far.
Finally, the tensor hypercontraction scheme (THC) 90,91 uses the PARAFAC algorithm to decompose the four-index integrals into four two-index quantities and an intermediate "weight matrix" For the case in Equation (50), the THC algorithm behaves similarly to COS, leading to an improved scaling of O(N 3 ) and it has also been applied to excited state CC2 yielding the O(N 4 ) scaling THC-CC2 method. 92

| Localization techniques
The first local treatment of EOM-CCSD is by Korona and Werner from 2003, 93 who had adapted their local CC algorithm [94][95][96][97] for excited states. The LCC approach spans the occupied space using localized orbitals (LMOs), while the virtual space is spanned by the so-called projected atomic orbitals (PAOs) of Pulay 98,99 which are obtained from atomic orbitals using the following projector Here, C L is the LMO coefficient matrix, I is the unit matrix, and S is the AO overlap matrix. It is then possible to construct domains for each LMO i which will contain all important PAOs for that i. Since PAOs correspond uniquely to AOs, it is enough to identify the important AOs and because these are centered on atoms, this means the identification of a list of significant atoms. Once an atom is chosen, all PAOs associated with it are considered part of the domain. To achieve this, the Löwdin orbital charges are calculated for a given LMO and a given atom, and the atomic contributions are summed up until a given threshold is reached. For excited states, this scheme has to be modified since CT excitations significantly modify the domains. To avoid an iterative update of domains during the EOM calculation, a CIS calculation is carried out for the same state. A set of relevant LMOs is then chosen based on the diagonal elements of the occupied block of the CIS density matrix. Since these add up to one if all LMOs are summed over, the LMOs with the largest contributions are easily selected. Next, a new orbital ϕ * i is defined as where R a i is a selected CIS coefficient and ϕ a are virtual orbitals. The ground state domain construction process is then repeated for these orbitals, and the final domain of i is the union of the ground state domain and this new set of PAOs. The domain for the pair ij can then be obtained by the union of the domains of i and j. As for the ground states, some pairs are considered strongly interacting, others less so, and the latter are either discarded or an MP2 estimate is used based on various criteria. The authors conclude that the number of significant amplitudes is larger than in the ground state and the approach is not linear scaling, although improvements in efficiency are still significant. A local CC2 approach was also developed by Kats and was also combined with RI-CC2. 100 The later implementation introduced and adaptive adjustment of excitation domains and also made use of the Laplace transform to manipulate the energy denominator. 101,102 Perhaps the most popular localization technique these days is the one based on pair natural orbitals (PNOs). [103][104][105][106] The idea itself goes back to the seventies and it was revived again in 2009 by Neese and coworkers 107,108 and has since been taken up by the group of Werner [109][110][111] and a number of other groups whose activities will concern us later regarding excited states. The local pair natural orbital (LPNO) formulation again starts off with LMOs for the occupied space, while the virtual space is spanned by PNOs which are obtained from some estimate of the pair density (D ij ab ), here defined as The simplest choice for the amplitudes T ij ab is the MP1 ones, with K ij ab = iajjb ð Þand ε being the orbital energies. The PNOs are then obtained by diagonalizing the pair density, Here, the matrix d ij aã ij transforms the canonical virtual orbitals (a) into PNOs (ã ij ). The significance of this transformation lies in the fact that there are only a few PNOs for which the PNO occupation number n ij a ij is large. Thus, the LPNO algorithm first selects the important pairs ij based on an estimate of the pair contribution to the correlation energy and then for the surviving pairs, it constructs a list of PNOs based on predefined thresholds. This approach yields the most compact representation of the virtual space for a given occupied pair. The various quantities can then be transformed into the PNO basis and thus the scaling of the CC equations is reduced. It should be mentioned here that the possibility of associating to each LMO a list of orbital specific virtuals (OSVs) obtained, for example, from the singular value decomposition of MP2 amplitudes has also been investigated. 112,113 There are other choices to define the amplitudes and the densities as well. 107 However, as it stands, the construction of PNOs in the above way and the transformation of the various quantities into the PNO basis still constitute a bottleneck, and the transformation of integrals is only made feasible by the RI approach. Since the PNOs are still defined in terms of delocalized canonical virtual orbitals, the approach can be further improved by expanding the PNOs in terms of PAOs, giving rise to the domain-based LPNO or DLPNO algorithm. 114,115 This also allows for various prescreening mechanisms and the construction of the pair density (and hence the PNOs) using a local basis. A recent implementation using sparse maps has made the scheme genuinely linear scaling. 116,117 However, progress for excited states has been slower. The pioneering study in this regard is due to Helmich and Hättig, who pointed out that a large number of ground state PNOs are necessary for an accurate description of excited states and proposed a state-specific PNO approach instead. 118 The necessary pair densities were defined using Equation (57) with CIS(D) amplitudes, in which C i a and ω are the CIS eigenvectors and eigenvalues, respectively. This approach requires the computation of PNOs for each state multiple times during the iterations. Although this increases the cost, only a few PNOs survive for local excitations, which make the computation less costly for these than the ground state. For CT states, initial findings showed that the number of surviving PNOs can be significantly larger, but the problem was resolved in the later work of Helmich and Hättig using the Laplace transform. 119 The state-specific PNO approach 118 was developed by Helmich and Hättig for CC2, 119 ADC (2), 119 and more recently for EOM-CCSD by Frank and Hättig, 120 resulting in an O(N 3 ) implementation. Another approach was recently proposed by Valeev and coworkers relying on a state averaged PNO approach for excited states. 121 For IP computations, the frozen natural orbital (FNO) approach, which relies on the diagonalization of the one-body density, has been implemented with success. 122 State-specific natural orbitals are also used by Kállay and coworkers who have recently published their CC2 123 and ADC(2) 124 implementations. In their work, they also make use of a related natural auxiliary function approach which reduces the size of the auxiliary basis set needed in the RI schemes. 125 Other groups have experimented with natural transition orbitals obtained from decomposing the CIS state vector, [126][127][128][129][130] or simply neglecting some of the canonical virtuals in the so-called reduced virtual space (RVS) approach. 131 Last, our group followed an approach in which the PNO quantities are transformed back after the ground state DLPNO calculation and are forwarded to the EOM or STEOM code (bt-PNO). 74,132 Reducing the scaling of the ground state step, still leaves us with an O(N 6 ) scaling bt-PNO-EE-EOM code, but the overall scaling of the bt-PNO-IP/EA-EOM code is now reduced to O(N 5 ), which allows for their use with much larger systems. As a consequence, bt-PNO-STEOM-CCSD will also scale as O(N 5 ), which makes EE calculations more affordable for dozens of roots as well. Using ground state PNOs in the IP/EA/STEOM cases can be pushed a step further. A DLPNO implementation of the IP-EOM 133 is straightforward and requires only a few ground state intermediates in the ground state PNO basis to reach near linear scaling. The EA-EOM 134 equations are more problematic because of the truncation of the leading term, as already noted by Krylov and coworkers. 122 Since the leading Fock term cannot be associated to any occupied orbital, it has to be kept in the canonical basis, while for terms with a single occupied label, it is advantageous to introduce a new set of ground state PNOs. There are called "singles PNOs" and belong to diagonal pairs, but they are generated using a much tighter threshold than the usual ground state cutoff. Using the various orbital spaces allows for an efficient and accurate DLPNO-EA implementation. The significant savings in the IP/EA calculations also reduce the cost of the STEOM computation to O(N 4 ), apart from the construction of STEOM integrals. Once the scaling of the latter step is also reduced, the resulting DLPNO-STEOM method will be genuinely O(N 4 ) scaling, which is the same scaling as that of the CIS/TDDFT diagonalization step.

| Embedding techniques
Another way to overcome the prohibitive scaling of wavefunction-based methods is to partition the system into a chemically important part and its environment and treat the latter in some more cost-efficient way. Defining more than one subsystem also allows for the identification of interaction terms among them. This field is in itself an area of active research and we will only discuss the most important recent developments in the special case when the central system is treated using some excited state CC method. The interested reader is referred to the comprehensive review of Goez and Neugebauer for further details. 135 When the central fragment and the environment are both treated at the quantum mechanical (QM) level, we speak of QM/QM methods. Among these, the most popular method is frozen density embedding (FDE). 136 The starting point of FDE is the fact that the subsystem densities are additive and in Kohn-Sham theory, it is possible to express these in terms of one electron orbitals. Minimizing the resulting energy functional with respect to the orbitals of a given fragment leads to the usual Kohn-Sham equations for that fragment in which the Hamiltonian contains an additional term called the embedding potential that describes the effect of the other fragments on the central one. In FDE, the Kohn-Sham problem of the central fragment is solved while the densities of all other fragments are kept frozen. While FDE lends itself naturally to density functional theoretical methods, it has also been extended to wavefunction methods. For excited states, the embedding potentials may be generated in a state-specific manner as done by Daday et al., 137 or using a state universal embedding and formulate the statespecific response of the environment using response theory. The latter approach was taken up by Höfener, Visscher, and coworkers 138 in developing a CC-in-CC 139 and a CC-in-DFT 140 FDE scheme. The same authors also extended the response formulation of Neugebauer 141,142 to RICC2 allowing the treatment of excitations coupling different subsystems. 143 Some years ago, an FDE version of the ADC(2) method was also proposed. 144 The groups of Neugebauer and Filippi have recently analyzed practical aspects of WFT-in-DFT calculations for excited states using their state-specific embedding method. 145 The projector-based embedding of Manby et al. 146 has also been extended recently to embedded EOM-CCSD calculations. 147 This approach generates localized orbitals for each fragment which makes it easy to define a projector to make the orbitals of the subsystems mutually orthonormal and thus guarantee that the nonadditive kinetic energy terms in the embedding potential should vanish. Some fragmentation approaches 148,149 have also been extended to excited states. The generalized energy-based fragmentation (GEBF) 150 breaks large systems into fragments, capping the dangling bonds with H atoms, and computes the total energy from subsystem energies. GEBF has been recently extended to local excitations in combination with CC2. 151 Hirata's fast electron correlation 152 method is based on the multipole expansion and has been successfully applied to excited states of weakly interacting clusters. It is also possible to define multilayer QM-based approaches using local techniques 153 like the DLPNO approach in which the difference between fragments lies in the cutoff thresholds of the local approximation. 154 This makes the interaction between fragments easy to describe. The method has also been extended to the computation of IP within the DLPNO-EOM-IP framework. 155 Another popular approach is to combine the QM approach with molecular mechanics (MM). The QM/MM methods are either subtractive or additive. The former requires an MM calculation for the whole system to which the difference between a QM and an MM calculation on the central fragment is added. An example of such a scheme is Morokuma's "our own n-layered integrated molecular orbital and molecular mechanics" (ONIOM) approach, 156 which has also been applied with success in combination with EOM-CCSD calculations. 157 Additive schemes treat the central fragment at the quantum level and use MM to describe the environment, while the interaction is introduced through a coupling term. Based on the treatment of the interaction, QM/MM schemes can be divided into three classes: 158 mechanical, electrostatic, and polarizable embedding. While this classification is in principle also applicable to subtractive schemes, we will focus on additive schemes in the following. In mechanical embedding, a point charge representation of the QM region is created for the MM calculation, while in electrostatic embedding, the MM point charges are represented via an additional term in the QM Hamiltonian. For mechanical embedding, this would mean in the excited state context that different point charge representations of the QM region would have to be derived for different states. Partly for this reason, electrostatic embedding is the most popular approach, as it delivers reliable results without exhibiting a complexity similar to polarizable embedding approaches. However, most recent excited state QM/MM approaches belong to the latter category, which also accounts for the polarization of the MM atoms in the presence of the QM density. The effective fragment potential (EFP) 159,160 method was originally designed to describe solvation in water and treated the solvent molecules as environmental fragments described by effective potentials. The Hamiltonian was updated with a number of interaction terms with the environment and the same interactions were also accounted for in the MM region. These terms are the multipole terms representing the Coulomb interaction, the induced dipole terms to describe polarization and a repulsion term. The latter was split into an exchange and a CT contribution in later variants in which a dispersion term was also added. 161 The work of Krylov, Slipchenko, and Ghosh on combining EFP with EOM-CCSD [162][163][164][165] as well as recent work on EFP-ADC 166 must be mentioned here. The polarizable embedding (not the same as the QM/MM type above) model of Olsen et al. 167 is a similar method but it treats the environmental response in a fully self-consistent manner. Since it relies on response theory, it is easily applied to excited state problems. The method has a CC variant, 168 including PERI-CC2 169 and PE-ADC(2), 170,171 and also a QM/QM/MM variant called polarizable density embedding (PDE) in which a third inner region is inserted between the core region and the MM environment. 172,173 Finally, it is also possible to regard the environment as a continuum surrounding the quantum region. The first variational formulation of a CC model involving a continuum goes back to the work of Christiansen and Mikkelsen 174 and was later extended to the polarizable continuum model (PCM). 175 PCM-EOM was proposed by Cammi 176 and in a slightly different formulation by Caricato. 177 The latter also reported a mixed implicit/explicit EOM solvent model 178 as well as benchmark results on PCM-EOM. 179 Caricato's recent review is also a useful guide to the field. 180 It should be noted that when PCM is added, the EOM and linear response treatments yield different results. 181,182 In particular, the latter depends on the transition density of the solute and can be recommended for the description of transition properties, while in the former approach the solvent responds to the excited state density of the solute and thus it is also called state specific since for each state it provides a better description of the mutual polarization between solute and solvent. 180 A different implicit solvent model, the conductor-like screening model (COSMO) 183 was also implemented for the ADC(2) method recently. 184,185 While implicit solvent models are much cheaper than explicit ones, they fail to capture specific interactions with the environment, such as hydrogen bonds between solute and solvent. For the latter type of systems, explicit or mixed implicit/explicit models are therefore recommended.
The nature of the condensed phase is another factor to be considered when selecting a method. Most of the above techniques have been used to describe liquids, and solvent-solute interactions in particular. Regarding excited state properties of solids, embedding approaches are most suited to systems with a substantial band gap. For such systems, STEOM-CCSD has been shown to provide reliable predictions for band gap energies. 186 For metallic systems, periodic approaches are the method of choice and here the pioneering work using the EOM-CCSD framework is the Gaussian-based periodic implementation of Chan and coworkers. 187,188 The two approaches can be contrasted for the computation of band gaps of semicondustors. The embedding approach allows for the direct application of methods developed for molecules such as STEOM to solids. This means that localization techniques, in this case, bt-PNOs, can be exploited without additional effort. In contrast, the work of Chan relies on canonical orbitals and translational symmetry. Thus, it is cheaper to converge the calculations with respect to cluster size and basis set size using the embedded bt-PNO-STEOM method. As a consequence, bt-PNO-STEOM can be used to predict the bandgaps of semicondutors to an accuracy that was not possible before. Lange and Berkelbach have also recently published an interesting discussion on the relationship between EOM-CCSD and Green's function-based approaches, with potential consequences to applications in the solid state. 189 Furthermore, Booth, Grüneis, Kresse, and Alavi have demonstrated that the standard hierarchy of quantum chemical methods and in particular the CC ansatz is also valid for solids. 190 For a review on the uses of CC methods in material science, see the review of Zhang and Grüneis, 191 for a discussion on local correlation approaches for solids, see the work of Usvyat, Maschio, and Schütz. 192 5 | BENCHMARKS

| Accuracy
Until now, we have been studying the formal properties of various methods with the expectation that the methodological hierarchies will be reflected in the performance of these methods. However, performance is an empirical matter which may be influenced by a number of factors other than formal properties. These include various forms of error cancellation, the choice of the benchmark set, the basis set, and the reference method. Thus, our judgment of performance may change as better and better benchmark sets become available, whereas formal relationships remain the same. For example, it may be that based on the best of our current benchmark results, CC2 performs better than CCSD under some specific circumstances. The same benchmark data also show that CC2 does much worse for different types of problems, and we may say in this case that the formal superiority of CCSD is reflected in its more balanced treatment of a wider range of excitation types. Nevertheless, it is of practical value to know when CC2 performs especially well. This example serves to illustrate that formal and empirical considerations must be both kept in mind when interpreting benchmark data. It also highlights the fact that benchmark data need to be carefully analyzed. In particular, when we talk about "specific circumstances," we mean that the benchmark data are subdivided into various physically meaningful subclasses (e.g., singlet or triplet excitations, CT excitations, vide infra). This prevents certain features to be lost by averaging over a large number of physically dissimilar results. It is also worth keeping in mind that such categories are not always known in advance, and thus, the analysis of the raw data may lead one to introducing novel ones. The benchmark studies cited here all contain more detail than allowed by the scope of the present review. Here, Gaussian curves and the associated mean error and standard deviation values will illustrate some of the major findings of benchmark studies, but it has to be noted that there is no reason to expect the error distribution of quantum chemical methods to follow a Gaussian curve. We have also provided mean and maximum absolute error values wherever available. The latter may serve as an indication how bad the method can be, and thus provides information about the tail region of the error distribution. Taken together, these parameters provide a good impression of the accuracy of the methods discussed here, while references to the major benchmark studies will hopefully guide the more interested readers to explore them in more detail.
In this section, we will first summarize the major benchmark studies for the methods shown in Figure 1. We will attempt to draw some conclusions afterward about their relative accuracy before moving on to summarize STEOM benchmarks and compare them to the methods in Figure 1. The effect of the various algorithmic approximations on the accuracy of EEs will be discussed last. It will be useful to distinguish results for valence states from those for Rydberg states, the latter being states in which the electronic distribution is very diffuse. Within valence states, it would also be useful to differentiate between local excitations and CT states. In the latter case, the transition occurs between states in which the electron density is localized on different parts of the molecule, as opposed to local excitations. However, no benchmark set has so far been explicitly designed for CT states, and thus we will have to be content with a few illustrative calculations. We will also mostly focus on singlet states dominated by a single excitation as a major target for single reference methods. The accuracy for triplet states is usually much better than for singlets, and we will only mention cases when this is not so.
In their thorough benchmark paper from 2008, Thiel and coworkers 193 not only proposed a test set that has now become standard for investigating vertical EEs, but they also provide a useful summary of available results comparing some of the methods discussed here to full configuration interaction (FCI) calculations on small molecules. In particular for EOM-CCSD, singles dominated singlet excitations exhibit a mean (maximum) error of 0.12(0.23) eV while for singles dominated triplets the same value is 0.12(0.26) eV compared to FCI. We may thus regard a 0.1-0.2 eV or larger deviation as significant when comparing various results. Thiel and coworkers also provided theoretical best estimates (TBE) for vertical EEs based on a survey of the literature and their own results. The latter was based on CASPT2 results for singlet and CC3 results for triplet states computed using MP2/6-31G* geometries, first using the TZVP basis and subsequently publishing results with the aug-cc-pVTZ basis, partly using extrapolation techniques. In their earlier work, 11,193 Thiel and coworkers used the TZVP basis and reported statistics not only on the whole test set but separately for singles dominated states (above 90% singles contribution) and n ! π* states, for which the methods examined are expected to perform better. On the other hand, their later work 194,195 also took the effect of diffuse functions into account using the aug-cc-pVTZ basis set. The relevant methods discussed in these papers are CCSD and CC2, while ADC(2) data are supplied by Dreuw and coworkers for Thiel's test set. 196 The work of Goings et al. 197 also provides a comparison between some of the second-order methods discussed here. Kánnár and Szalay revisited Thiel's work, adding some missing CC3 results and CCSDT reference data for the small molecules using the TZVP basis. 198,199 In two distinct studies, the group of Szalay also addressed the effect of diffuse functions separating valence and Rydberg states in their discussion, since the proper description of the latter is not possible without the presence of diffuse functions. In one of these studies, results obtained for the entire Thiel test set with various methods using the aug-cc-pVDZ basis are compared to CC3, 53 while in the other, the CCSDT reference was used on a subset of small molecules using a number of basis sets. 200 Considerations on the importance of diffuse functions have recently led Jacquemin and coworkers 201 to define new TBE based on augmented basis sets. Thus, they use geometries obtained at the CC3/aug-cc-pVTZ level and define the reference energy using an extrapolated FCI procedure. Using a test set of 18 small molecules, the CC3, CCSDT, and CCSDTQ values reported in this study with the aug-cc-pVTZ basis set show a negligible deviation with respect to their TBE.
Finding a common ground to compare all the methods discussed so far is not easy and there are several concerns to be kept in mind. First, an ideal comparison would rely on comparing all methods to the same reference method using the same basis, preferably at the complete basis set limit. Second, the basis set should contain diffuse functions to ensure the correct distinction of Rydberg states, and separate statistics on valence and Rydberg states should be available. Third, the test set should contain large enough molecules, as the effect of size has been noted several times in the discussion of the differences between various benchmark results. Based on these criteria, we will use the results of Tajti and Szalay 53 since their calculations are carried out on the singles dominated excitations of the entire Thiel test set comparing all methods to CC3 using the aug-cc-pVDZ basis. Furthermore, they analyze five out of eight methods in Figure 1 using an augmented basis distinguishing between valence and Rydberg states. However, before discussing their results, a few general observations must be made. Regarding the effect of diffuse functions, the Thiel group observed that TZVP and aug-cc-pVTZ results for the various methods may differ with as much as 0.2 eV on average. 194,195 The aug-cc-pVDZ basis does not provide fully converged results either, although the effect is not equally large for all methods. EOM-CCSD(2) seems particularly sensitive based on the available data which suggest a deviation of 0.1 eV between aug-cc-pVDZ and aug-cc-pVTZ results. 200 The reason we still chose the aug-cc-pVDZ data for comparing methods is because they are also available for large molecules, the primary interest of this review. Indeed, it has been noticed in the literature that there is a distinctive size effect for the various methods. To estimate this effect, we may compare the results of the Szalay group obtained using the aug-cc-pVDZ basis for a subset of small molecules in Thiel's test set on one hand 200 and the entire set on the other. 53 This comparison reveals that including the larger molecules leads to a shift in the average error of about +0.1 eV for CCSD and −0.1 eV for CC2, for both valence and Rydberg states. For EOM-CCSD(2), the shift is −0.3 eV for valence states, and only half as large for Rydberg states. This has to be kept in mind, since as a result the relative accuracy of methods can be different for small and for large molecules, as also observed by Jacquemin and coworkers. 201 Table 1 and Figure 3 summarize the results of Tajti and Szalay, to which we added some indicative results for CT states. The five methods included are (EOM-)CCSD, (EOM-)CCSD(2), P-(EOM-)CCSD(2), CC2, and CIS(D ∞ ). The missing ADC (2) method is expected to be close to CC2, while CIS(D) is an approximation to CIS(D ∞ ), and hence it is expected to perform somewhat worse. Finally, P-EOM-CCSD is missing, but its CCSD(2) variant is featured among the results. Based on the data in Table 1, the P-EOM-CCSD(2) method shows the worst performance for valence states, although it performs much better for Rydberg states, in accordance with the findings of Goings et al. 197 The remaining methods can be assigned to two groups based on the similarity of the results, EOM-CCSD and EOM-CCSD(2) on one hand and CC2 and CIS(D ∞ ) on the other. EOM-CCSD(2) and CIS(D ∞ ) are obtained from EOM-CCSD and CC2 by replacing the ground state solution with MP2 amplitudes (and hence they also lack the singles transformation in the excited state). Thus, using MP2 ground state amplitudes has a relatively mild effect on the results, although this may well change in cases where MP2 breaks down. Furthermore, what distinguishes the two groups is that EOM-CCSD and EOM-CCSD(2) both retain the full DD block of the Jacobian (except again for singles in EOM-CCSD(2)) while CC2 and CIS(D ∞ ) approximate it with the diagonal Fock term. The first group treats valence excitations much better than Rydberg states, although EOM-CCSD can be said to offer a balanced treatment of both. The members of the second group outperform the first for valence states but perform much worse for Rydberg states. For CT states, EOM-CCSD exhibits significant errors while EOM-CCSD(2) offers a mild improvement on those. However, the CT errors of these methods are reasonably close to the errors they make for valence and Rydberg states and are much  (2), and P-CCSD(2) stand for equation of motion (EOM)-CCSD(2) and P-EOM-CCSD (2). The charge transfer results were obtained in our group using CCSDR(3)/aug-cc-pVDZ as a reference. 74 smaller than CC2 CT errors. Thus, we may conclude that EOM-CCSD has a more balanced performance than CC2 since CC2 outperforms CCSD for valence states in large molecules but falls short of it for small molecules and for Rydberg and CT states in general. Much fewer benchmarks are available for STEOM, although Nooijen and coworkers have carried out test calculations using the Thiel test set using the TZVP basis. 75 Compared to CC3, singles dominated singlet roots, here defined as those with a singles contribution larger than 87%, exhibit a mean error of −0.02 ± 0.08 eV as opposed to EOM-CCSD with 0.20 ± 0.09 eV. Thus, STEOM outperforms EOM for singlet states. Comparing the STEOM results for singlets and triplets, STEOM turns out to be worse for triplets, as opposed to any other method for which there are data available. Still, the error is only −0.11 ± 0.17 eV. Based on the data of Jacquemin and coworkers shown in Table 2, EOM and STEOM show comparable performance for the small molecules involved in the test set. 201 Jacquemin's data also allows for a comparison with other methods and indicates that STEOM and EOM perform similarly and both outperform CC2 for valence and Rydberg states. Considering larger molecules than those in Jacquemin's may change this picture as indicated by Nooijen's data on the entire Thiel test set. The results discussed so far indicate that CC2 and STEOM perform better than EOM for larger molecules, although in the case of CC2, this is only valid for valence excitations, not for Rydberg states. The results available for STEOM suggest that it performs about as well for Rydberg states as EOM does. The scant data available on CT states clearly favor STEOM over EOM and both methods over CC2.
The question of accuracy comes up in a slightly different context for algorithmic approximation techniques. These introduce other parameters into the theoretical problem, be it the size and quality of an auxiliary basis set, a molecular grid, an orbital domain, or PNO truncation threshold. These parameters can then be chosen such that the error with respect to the parent method is below some desired limit, usually at most 0.1 eV, preferably less. The usefulness of an approximation then depends on the amount of cost reduction it brings for a given accuracy. This leads to the introduction of standards that guarantee a certain accuracy. For RI and DF methods, this means the design of appropriate auxiliary basis sets that guarantee a very good accuracy for excited state methods. For RI-EOM, the RI errors is reported to F I G U R E 3 Error distributions for the various second-order methods for valence and Rydberg states based on the data of Tajti and Szalay. 53 CC singles doubles, CCSD(2), and P-CCSD(2) stand for equation of motion (EOM)-CCSD(2) and P-EOM-CCSD (2). The Gaussian curves are illustrative, since error distributions of quantum chemical methods are not guaranteed to be Gaussian T A B L E 2 Comparison of CC singles doubles (CCSD), CC2, and STEOM based on the data of Jacquemin, 201 using their own test set with the aug-cc-pVTZ basis and with respect to their theoretical best estimates. be about 1 meV, 84 while RI-CC2 is shown to reproduce the CC2 EE within a MAE of 1.3 meV even when augmented basis sets are used. 202,203 For COSX, the error depends on the grid and it is somewhat larger using standardized setups, but it is still lower than 0.01 eV on average. 89 TCH-CC2 is also known to yield errors lower than 0.02 eV. 92 The local EOM-CCSD of Korona and Werner also yields errors below 0.06 eV with appropriate standards. 93 Since state-specific domains are constructed, this is also valid for Rydberg and CT states. Similar conclusions hold for the state-specific PNO implementation [118][119][120]204 as well as for the state averaged PNO EOM implementation of Valeev. 121 Our own STEOM implementation depends on ground state PNOs, for which standardized parameter sets guaranteeing a certain accuracy are available, 205 and it has been shown that the error in the ground state computation and in the subsequent IP/EA calculations can also be kept below similar limits. 74,[132][133][134]206 The question of accuracy is most complicated for multilayer embedding methods. Here, the size of the various layers and whether the excitation crosses the layer boundaries are the factors that determine accuracy. Hence, these studies begin with an initial investigation and/or calibration of the parameters involved to ensure that the method is reliable. The more thorough treatment of this question is outside the scope of the present review.

| Efficiency
It remains to demonstrate recent advances in the field by surveying some of the breakthroughs in terms of efficiency. Beyond giving an idea on the system sizes available for treatment with the various methods and on the corresponding cost, a few representative examples will also indicate the type of chemical problems where a CC treatment may be beneficial. Some of these examples will illustrate the fact that the goal of developing more efficient methods is not simply to enable chemists to carry out larger single point calculations, but also to make a qualitative difference in their models, for example, by accounting for environmental effects explicitly. Canonical EOM-CCSD implementations usually cannot handle systems containing more than a few tens of atoms using common computer architectures although the use of RI and COS can make the treatment of 20-30 atoms more comfortable. 84,89 On architectures with less than 16 cores, it seems that the limit of canonical calculations is about 700 basis functions, although the use of RI and NTOs can make systems with around 1,000 basis functions tractable by lowering the number of active orbitals. 84,127 In our own experience, removing the costly ground state calculation either by replacing it with MP2 or using back-transformed PNOs have made it possible to compute two roots of the 11-cis-retinal protonated Schiff base using about 700 basis functions on four cores in about 9 days. 132 The largest linear alkane chain computed on 16 cores using the O (N 3 ) scaling PNO-CCSD implementation of Frank and Hättig contained 48 carbon atoms and nearly 1,800 basis functions. 120 Recent work using the projector-based embedding model also allowed for computing accurate EEs for the acrolein molecule. 147 While EOM-CCSD was employed on up to 19 orbitals, the number of basis functions treated at least at the HF level was also close to 1,800.
Much larger systems are available for treatment using second-order methods, especially CC2 and ADC (2). Already the first serial RI-CC2 implementation was capable of handling an alkane chain of 1,100 basis functions. 85 However, resent research has focused more on local techniques, which may allow for calculations on systems with more than 100 atoms with several thousand basis functions. A local CC2 approach using the Laplace transformation was used to compute an EE of the pyrene-phenothiazine-isoalloxazine triad with more than 1,300 basis functions taking a week on six CPUs. 101 A Laplace transformed variant of scaled opposite spin ADC (2), which also makes use of the reduced virtual approach made it possible to compute EEs of photosynthetic chlorophyll clusters which may be as big as 200 atoms. 208 A more proper treatment of the virtual space is achieved in the PNO-CC2 implementation of Helmich and Hättig. 119 Figure 4 illustrates a local excitation on the PNP − anion surrounded by water molecules. A calculation requesting six states for this system containing 100 water molecules and altogether 2,560 basis functions takes about 13 days on 12 CPUs. This result corresponds to the O (N 3 ) implementation discussed in the thesis of Helmich-Paris. 207 CT states are somewhat more expensive than local excitations but they should not be more expensive than a ground state computation. Kállay and coworkers have also applied their reduced cost approach to CC2 and ADC (2) and succeeded in performing calculations on molecules up to a 100 atoms and containing around 3,500 basis functions. 123, 124 Höfener has reported results using FDE in combination with RI-CC2 which facilitates the treatment of solvated systems containing up to 300 water molecules. 209 Finally, Figure 6a demonstrates the concept of the PDE QM/QM/MM embedding method. Although the figure chosen here for its conceptual clarity corresponds to a DFT-based calculation, similar computations are possible using the CC2, CCSD, and even CCSDR(3) methods. Such a treatment allowed the authors to describe the absorption spectra of the PNP − anion surrounded by 400 water molecules. 172,173 F I G U R E 4 A local π ! π* excitation of the PNP − anion surrounded by water molecules using pair natural orbital (PNO)-CC2. Before turning to state-of-the-art results with the STEOM-CCSD method, we briefly mention those obtained with the IP/EA-EOM method, since the efficiency of STEOM depends crucially on the performance of these. Figure 5a demonstrates the largest IP-EOM-CCSD calculation obtained with the near-linear scaling DLPNO-IP method. 133 Calculating two IP roots of the DNA decamer depicted here using more than 10,000 basis functions takes somewhat more than 9 days on a single core. The IP method was also used successfully by the Krylov group in combination with the EFP approach to describe the explicit solvation of thymine, 210 an example also used for the testing of the multilayer DLPNO-IP approach. 155 The EA implementation is much less efficient 134 due partly to the fact that the leading singles terms must be kept in the canonical basis to avoid truncation issues. 122,134 However, leaving some terms in the canonical basis and using multiple PNO cutoffs, it is possible to achieve a near-quadratic scaling implementation. The Pheophytin A molecule involved in photosynthesis as a secondary electron carrier contains more than 100 atoms, see Figure 5b, and the computation of the first EA value with a basis set of more than 4,600 basis function takes about 4 days on four cores using normal DLPNO cutoffs. In a DLPNO-STEOM calculation, this is one of the rate-limiting steps. Since building the STEOM integrals is at the moment carried out in the canonical basis, the DLPNO-STEOM process is less efficient. Computing five roots of the BODIPY dye depicted in Figure 5c with tight DLPNO settings takes 18 days on 4 cores using more than 1,900 basis functions. This makes the method already suitable for productive calculations, as our recent work on fluorescent organic dyes shows. 211 While these are already good results, once the dressing step is also implemented in the PNO basis, the performance of DLPNO-STEOM is expected to approach that of DLPNO-EA and the method will become genuinely O(N 4 ) scaling. However, already the bt-PNO variant of STEOM 132 can be usefully employed in the computation of band gap energies in the embedding approach illustrated in Figure 6b, using as many as 1,700 basis functions. 186

| CONCLUSIONS
We have surveyed state-of-the-art CC methods for the computation of EEs of large molecules. After introducing second-order and similarity transformed techniques, we have discussed algorithmic approximations that can increase the efficiency of the methods. It transpired that, among the popular second-order methods, CC2 and ADC(2) show a similar performance. CC2 in particular yields better results than EOM-CCSD for local excitations of large molecules based on the currently available benchmark data, although for Rydberg and especially for CT states, CC2 is considerably worse than EOM-CCSD, which overall yields more balanced results. The strength of the STEOM-CCSD method is that it handles all types of states equally well, including CT states, which are problematic for EOM-CCSD and especially for CC2. Thus, for single point energy calculations of closed shell molecules, STEOM outperforms CC2 and this may open up a whole range of potential applications starting from photochemistry to material design. Both CC2 and STEOM-CCSD can be implemented in a highly efficient manner, especially relying on various forms of the PNO treatment. CC2 and other approaches based on response theory have an advantage as far as properties are concerned, although it is also possible to compute many spectroscopic properties with STEOM. Future research will likely concentrate on extending the latter aspects of STEOM.
Regarding various algorithmic improvements, significant progress has been made in recent years to develop efficient local CC approaches for excited states, although progress is slower than for ground states. We expect to see further developments in this regard, in particular, the development of various approaches making use of state-specific, state averaged and ground state PNOs. Integral decomposition schemes also remain an important tool to accelerate CC calculations, and not merely in combination with localization schemes. It is possible to implement O(N 4 ) scaling variants of second-order methods using such techniques, but for the moment THC-CC2 remains unique in this respect. Finally, embedding schemes make it possible to describe the effects of the environment accurately and they may be combined with any of the other methods discussed in this review. It is therefore safe to conclude that much development effort will be directed toward the development of efficient embedding schemes in the foreseeable future.

ACKNOWLEDGMENTS
The author thanks the Max Planck Society for financial support. The following persons have provided useful feedback on an early version of the manuscript: Lucas Lang, Bernardo de Souza, Benjamin Helmich-Paris, Marvin Lechner, Dimitrios Pantazis, Ragnar Björnsson, Giovanni Bistoni, Alexander Auer, and Péter G. Szalay. I thank all of them for their contributions. I also thank Benjamin Helmich-Paris, Romain Berraud-Pache, and Dimitrios Manganas for providing me with some of the figures.