Coupled-Cluster (CC) theory provides the most accurate and reliable quantum chemical description of molecules at present. Its good performance can be attributed to the following facts:
(i) exponential parameterization of the wave function;
(ii) hierarchical construction of the possible approximations, allowing systematic improvement towards the exact solution.
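For orientation, the two features listed above can be written in their standard textbook form. The symbols $\Phi_0$ (reference determinant), $\hat T$ (cluster operator), and $\hat T_n$ ($n$-fold excitation operator) are notation assumed here, not taken from the article:

```latex
\begin{align}
  |\Psi_{\mathrm{CC}}\rangle &= e^{\hat T}\,|\Phi_0\rangle, \\
  \hat T &= \hat T_1 + \hat T_2 + \hat T_3 + \cdots
\end{align}
```

Truncating $\hat T$ after $\hat T_2$ gives CCSD, after $\hat T_3$ CCSDT, and so on, which is the hierarchy the second point refers to.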
CC methods have been reviewed in several articles; see, for example, Refs.[1-3]. Their excellent performance for the electronic ground state is unquestionable. In particular, CC with singles, doubles, and approximate triples (CCSD(T))[4, 5] has become the “gold standard” of quantum chemistry.
Excited states can also be treated within CC theory using the idea first presented by Monkhorst, now known as the equation-of-motion coupled-cluster (EOM-CC) method[7-9] or the coupled-cluster linear response (CC-LR) method. These methods provide accurate results that can hardly be challenged by other techniques[3, 11, 12] and can also be used as benchmarks.
Still, the widespread application of CC methods is limited because they are rather expensive; they are therefore often considered a tool for small molecules only. At the same time, the interest among chemists is turning towards biological systems, where a molecule-level description is often needed to understand basic processes. One of these areas is the investigation of DNA, the carrier of genetic information. Since the classic work by Watson and Crick, we have known the dominant interactions that determine the structure of these macromolecules: (i) base pairs are connected by hydrogen bonds, (ii) the attached sugar and phosphate molecules form the backbone and keep the hydrophobic bases out of water, and (iii) van der Waals type interactions between neighboring sheets determine the parameters of the helix structure. Beyond these, further properties of DNA can also be attributed to its building blocks: all the nucleobases (cytosine, guanine, adenine, and thymine) are chromophores that readily absorb low-energy photons, making DNA sensitive to UV radiation.
The question of what happens to DNA when it is irradiated by UV light is more than academic: is mankind well protected against harmful radiation? It seems that it is. Spectroscopic and theoretical investigations of the last 10 years seem to converge on the conclusion that there is a multilevel protection mechanism: (i) nucleobases have fast ways to return to the ground state (see, e.g., Barbatti et al.); (ii) base pairing largely influences photostability (see, e.g., Miannay et al.); and (iii) stacking interactions also have important effects on the excited state dynamics (see, e.g., Crespo-Hernandez et al.).
New applications are also emerging from the special electronic properties of DNA. For example, DNA nanostructures have been constructed and synthesized for use in molecular computers, and microelectronics based on the electric conductivity of DNA is being investigated (for a review, see Endres et al.).
This latter article shows, however, that the mechanism of DNA conductivity is still not completely understood and that, in addition to a huge experimental effort, theoretical investigations are also necessary to shed light on all the details. Obviously, large DNA oligomers are just too big even for approximate quantum chemical methods. However, the building blocks of DNA already show some of the basic structural properties of the polymer, and investigation of appropriate model systems, such as small units of dimers or pairs of dimers of nucleobases, could give basic information on the electronic properties. In the present perspective article, we show some encouraging results and speculate on how these efforts could be extended toward a better understanding of the electric properties of DNA.
Recent methodological developments of CC theory of excited states
The basic theory of the CC treatment of excited states has been reviewed recently.[11, 12] Within the framework of EOM-CC (CC-LR) methods, a hierarchy of approximations can be defined by proper selection of the excitation manifold: the starting set includes the single and double excitations (EOM-CCSD[8, 9]), which can be extended by triples (EOM-CCSDT) and eventually even by higher excitations. The computational cost of the latter makes them applicable only to small molecules.
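As a reminder of what is being truncated, the EOM-CC eigenvalue problem can be sketched in its usual form. The notation is assumed here rather than taken from the article: $\hat T$ is the ground-state cluster operator, $\Phi_0$ the reference determinant, and $\hat R_k$ the excitation operator of state $k$:

```latex
\begin{align}
  \bar H &= e^{-\hat T}\,\hat H\,e^{\hat T}, \\
  \bar H\,\hat R_k\,|\Phi_0\rangle &= E_k\,\hat R_k\,|\Phi_0\rangle, \\
  \hat R_k &= r_0 + \hat R_1 + \hat R_2 + \hat R_3 + \cdots
\end{align}
```

Keeping $\hat T$ and $\hat R_k$ up to doubles gives EOM-CCSD, and adding $\hat T_3$ and $\hat R_3$ gives EOM-CCSDT. The excitation energy is the difference between $E_k$ and the ground-state CC energy.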
In a recent article, Watson et al. benchmarked EOM-CC methods with approximate inclusion of the triple excitations. They emphasize that, as in the case of ground states, there are two important issues to consider regarding computational cost: (i) are the triple excitation parameters (amplitudes) calculated iteratively or noniteratively, and (ii) is it necessary to store the amplitudes, or can they be included directly in the formulae? For excitation energy calculations, a third aspect should be added: (iii) is the EOM matrix defined in the reduced (SD) space, or must the triple excitation space also be included? The most cost-effective solution is one in which (i) the amplitudes are calculated noniteratively, (ii) they are not stored, and (iii) the EOM matrix is defined in the SD space only. Watson et al. discuss several versions. The iterative methods, which are almost as accurate as the full CCSDT version, are EOM-CCSDT-3 of Watts and Bartlett and CC3-LR of Christiansen et al. Storage of the triples amplitudes can be avoided in these methods, but the matrix is defined in the whole SDT space, so they still suffer from the diagonalization of a matrix of rank ∼n^6. Therefore, noniterative versions have also been put forward: EOM-CCSD(T) and EOM-CCSD(T̃) by Watts and Bartlett, and CCSDR(3)-LR by Christiansen et al. Of these, the former two add a single evaluation of the triples contribution to the energy, analogous to the so-called ΛCCSD(T)[27, 28] for the ground state. By contrast, CCSDR(3)-LR diagonalizes a modified CCSD matrix. Thus, all three methods require an iterative diagonalization of the same size as that of EOM-CCSD, that is, ∼n^4. Hence, they are vastly faster than the iterative triples models. One should note that similar corrections have been introduced by Kowalski and Piecuch in the so-called completely renormalized framework (CR-EOM-CCSD(T)).
However, as pointed out, for example, by Watson et al., these methods are either not fully size-extensive or not invariant to orbital rotations.
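The difference between a matrix defined in the SD space and one defined in the SDT space can be made concrete with a simple counting argument. The following is a minimal sketch, not part of any CC program: the function name and the orbital counts are hypothetical, and spin and spatial symmetry, which reduce the numbers in practice, are ignored.

```python
from math import comb


def excitation_space_sizes(n_occ: int, n_virt: int) -> dict:
    """Count spin-orbital excitations of each rank.

    n_occ and n_virt are the numbers of occupied and virtual spin orbitals.
    An n-fold excitation picks n distinct occupied and n distinct virtual
    spin orbitals, so its count is C(n_occ, n) * C(n_virt, n).
    """
    return {
        "singles": comb(n_occ, 1) * comb(n_virt, 1),
        "doubles": comb(n_occ, 2) * comb(n_virt, 2),
        "triples": comb(n_occ, 3) * comb(n_virt, 3),
    }


# Hypothetical system size, chosen only for illustration.
sizes = excitation_space_sizes(n_occ=20, n_virt=100)
sd = sizes["singles"] + sizes["doubles"]
sdt = sd + sizes["triples"]
print(f"SD space:  {sd:,}")
print(f"SDT space: {sdt:,}")
print(f"ratio SDT/SD: {sdt / sd:.0f}")
```

The doubles count grows as the fourth power of the system size and the triples count as the sixth power, which is why keeping the EOM matrix in the SD space matters so much for large molecules.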
Watson et al. not only analyzed the theoretical construction of these methods but also conducted a detailed analysis of their performance. EOM-CCSDT-3 and EOM-CCSD(T) calculations have been compared on the test set of Schreiber et al., consisting of 121 excited states of 24 molecules. The comparison has been extended by the results of a similar study by Sauer et al., where CC3-LR and CCSDR(3)-LR were included. The results show that the noniterative CCSDR(3)-LR and EOM-CCSD(T) methods perform statistically similarly, both agreeing excellently with CC3-LR and EOM-CCSDT-3. In numbers: CC3-LR, CCSDR(3)-LR, and EOM-CCSD(T), compared to EOM-CCSDT-3, give average absolute errors of 0.05, 0.03, and 0.06 eV, respectively, with maximum errors of 0.15–0.16 eV. Fortunately, the outliers do not include pyrimidine or imidazole structures found in DNA bases. It seems, therefore, safe to use the noniterative methods, such as EOM-CCSD(T), to approximate the triples effect at a cost much lower than that of the iterative methods.
Although not discussed in Ref., it should be noted that the so-called active-space CC methods, like EOM-CCSDt, seem to give results closer to EOM-CCSD than, for example, to CR-EOM-CCSD(T). This indicates that the active-space formulation is not quite adequate for including the dynamic correlation of triples; rather, it is a good tool to correct the wave function if double excitations have a substantial contribution to the target state. This is, however, not the case for the low-lying states of nucleobases.
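The statistics quoted above (average absolute error and maximum error with respect to a reference method) are simple to reproduce once the excitation energies are tabulated. Here is a minimal sketch; the function name is hypothetical and the energies are illustrative placeholders, not data from the cited benchmark sets.

```python
from statistics import mean


def error_stats(reference, approximate):
    """Return (mean absolute error, maximum absolute error).

    Both inputs are sequences of excitation energies for the same states,
    in the same order and the same units.
    """
    if len(reference) != len(approximate):
        raise ValueError("energy lists must have equal length")
    abs_err = [abs(a - r) for r, a in zip(reference, approximate)]
    return mean(abs_err), max(abs_err)


# Hypothetical vertical excitation energies in eV (illustrative only).
ref_ev = [4.50, 5.10, 5.62, 6.05]
approx_ev = [4.54, 5.08, 5.70, 6.05]

mae, max_err = error_stats(ref_ev, approx_ev)
print(f"MAE = {mae:.3f} eV, max = {max_err:.3f} eV")
```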
Finally, with large applications in mind, parallel implementation is of crucial importance. Efficient massively parallel implementations in the program system ACESIII (a massively parallel program for CC and MBPT calculations of molecular structure and spectra) have been reported by Kus et al. for the EOM-CCSD method and by Watson et al. for EOM-CCSD(T). As these results show, ACESIII is able to use thousands of processors, making applications viable not only for nucleobases but also for their complexes, such as Watson–Crick pairs or stacked dimers.[35, 36] Watson et al. even announce calculations on a dimer of Watson–Crick pairs, that is, a complex of four nucleobases. To demonstrate these possibilities, some timing data are shown in Table 1.
Table 1. Parameters and timings of the ACESIII calculations of this study[a].
[a] Calculations have been performed with the aug-cc-pVDZ basis.
The other massively parallel implementation of EOM-CC methods can be found in the program NWChem. Some timing data for calculating triples corrections with this program have been reported by Kowalski et al.; impressive examples include fused porphyrins with up to 270 electrons.
Application of CC theory to DNA fragments
To set up a methodological scheme for studying excitation properties of DNA fragments, accuracy needs to be assessed first; it should be checked not just internally, but also in comparison to experiments. The latter is a critical point for two reasons. First, theoretical studies most often refer to the gas phase, in contrast to experiments in solution or the condensed phase. Second, absorption maxima do not correspond to the theoretically obtained vertical excitation energies; the two can differ by as much as 0.2–0.3 eV. Therefore, in a recent study of cytosine, Bazsó et al. measured the experimental spectrum in a matrix and simulated the complete absorption spectrum, including vibrational structure. The latter is an extra challenge for theory since it requires not only the vertical excitation energies but also information on the potential energy surfaces of both ground and excited states. An often used compromise is the linear vibronic coupling (LVC) model of Köppel et al., which was used here. The parameters of the model were determined from CC calculations: specifically, vertical excitation energies were obtained from CC3-LR, whereas the parameters of the potential energy surfaces were obtained at the CCSD level. In addition, the spectrum was calculated for a mixture of tautomers expected to be present in the matrix. To avoid any bias, the tautomeric ratios were obtained by a combined use of the theoretical and experimental matrix-isolation vibrational spectra. Figure 4 of Ref. shows that all measured maxima and even smaller features of the spectrum are well reproduced. Note, however, that this good agreement required the use of CC3-LR excitation energies, whereas with EOM-CCSD excitation energies the agreement was poorer. It was concluded that CC3-LR, one of the highest levels of theory for excited states, predicts the excitation energies of cytosine to within 0.1 eV.
The next question concerns the somewhat less demanding EOM-CCSD(T) method. As discussed above, Watson et al., studying a large number of excited states, came to the conclusion that EOM-CCSD(T) is as accurate as the iterative EOM-CCSDT-3 and CC3-LR levels. To check this specifically for nucleobases, Szalay et al. compared the performance of EOM-CCSDT-3, CC3-LR, and EOM-CCSD(T) on cytosine. As Table 1 of Ref. shows, all three methods give good agreement for vertical excitation energies, with maximum discrepancies of about 0.1 eV. This means that the computationally less expensive EOM-CCSD(T) is also suitable for the calculation of excitation energies of nucleobases. A similar conclusion has been reached by Epifanovsky et al. for uracil, using the CR-EOM-CCSD(T) method.
With larger molecules in mind, however, EOM-CC with triples, or even with only singles and doubles, seems too expensive. Thus, one may ask: is it really necessary to use these very demanding methods?
Within the CC framework, a less expensive alternative often used to study excited states is CC2-LR (an approximation even with respect to CCSD). The performance of CC2-LR is best represented in Figure 3 of Ref., based on the data from Schreiber et al. and Sauer et al.: it shows that CC2-LR approximates the triples results considerably better than EOM-CCSD does. Unfortunately, this conclusion cannot be completely transferred to nucleobases. For the latter, there is excellent agreement between CC2-LR and EOM-CCSD(T) for the ππ* excitations only, whereas the energies of nπ* states are often underestimated by as much as 0.5 eV. Since this latter conclusion is based on calculations with the aug-pVTZ basis, in Figure 1 and Table 2 we show results on the four DNA nucleobases obtained with the TZVP basis set, the one used in Refs.[30, 31]. It is seen that EOM-CCSD systematically overshoots (points are consistently above the line), whereas CC2-LR gives results closer to EOM-CCSD(T). There are, however, several outliers, all of nπ* character. This failure for nπ* transitions does not really influence the simulation of spectra, since ππ* excitations dominate there. However, in relaxation dynamics, where the order of ππ* and nπ* states plays an important role, misplacing nπ* states may distort the interpretation of the mechanism. Finally, ADC(2), an approximation similar to CC2-LR, seems to give the most irregular results in comparison to EOM-CCSD(T).
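The step of combining tautomer spectra according to their ratios can be illustrated with a short sketch. This is a deliberate simplification and not the procedure of the cited study: each band is given a plain Gaussian line shape as a stand-in for the vibronic structure obtained from the LVC model, and all function names, populations, band positions, and intensities are hypothetical.

```python
import math


def gaussian_band(energy_ev, center_ev, strength, fwhm_ev=0.3):
    """Gaussian line shape with peak height `strength` centered at `center_ev`."""
    sigma = fwhm_ev / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return strength * math.exp(-0.5 * ((energy_ev - center_ev) / sigma) ** 2)


def mixture_spectrum(energy_ev, tautomers):
    """Population-weighted sum of tautomer spectra at one photon energy.

    tautomers: list of (population, [(center_ev, strength), ...]) tuples.
    Populations are normalized internally, so they need not sum to one.
    """
    total = sum(population for population, _ in tautomers)
    return sum(
        (population / total)
        * sum(gaussian_band(energy_ev, c, s) for c, s in bands)
        for population, bands in tautomers
    )


# Hypothetical two-tautomer mixture (all numbers are placeholders).
tautomers = [
    (0.7, [(4.6, 0.05), (5.5, 0.15)]),
    (0.3, [(4.4, 0.08), (5.8, 0.20)]),
]
for e in (4.4, 4.6, 5.5, 5.8):
    print(f"{e:.1f} eV  {mixture_spectrum(e, tautomers):.4f}")
```

In an actual simulation, the Gaussian bands would be replaced by the vibronic band shapes of each tautomer; only the weighting by population carries over unchanged.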
Table 2. Excitation energies (ΔE in eV) and oscillator strength (f in a.u.) of the nucleobases calculated by different methods (TZVP basis).
Concerning non-CC methods, the popular DFT and CASPT2 methods should briefly be mentioned. Literature data obtained with these methods were analyzed in Ref.. According to this analysis, TDDFT/B3LYP does well for the lowest ππ* transitions (cytosine, adenine, and guanine) but fails for thymine, where there is another, lower nπ* transition. Also, for the second and higher transitions the disagreement grows rapidly, soon reaching several tenths of an eV. Even MRCI/DFT seems to perform irregularly. This conclusion on the DFT methods cannot be considered general, since it is based on only one functional. The CC test results now available in the present article and in Ref. could be used for a more systematic benchmarking of the wide variety of functionals. CASPT2 in general seems to underestimate the excitation energies, as can also be seen in Figure 1 and Table 2. Here again our results to a certain extent contradict those of Schreiber et al.: this was explained in Ref. by a cancellation of errors present with the TZVP basis.
Complexes of Nucleobases
After a careful calibration of the methods in Ref., Szalay et al.[35, 36] presented benchmark results for some complexes of the nucleobases, including microhydrated complexes, nucleotides (cytidine and guanosine), as well as stacked and Watson–Crick pairs.
Concerning microhydration with one to five water molecules, it was found that the presence of water influences mostly the energies of the nπ* transitions (an increase up to 0.7 eV), whereas the effect on ππ* transition energies is much smaller and usually a decrease. As a result, microhydration may change the order of the excited states, thus influencing the results of dynamic studies involving nπ* states in the mechanism.
The glycoside bond seems to have a much smaller effect, the largest change being about 0.1 eV for valence states and only slightly larger for Rydberg states. Still, even this small change can have important consequences: for example, while the 1(πR) state is the lowest excited state in guanine, it is the valence 1(ππ*) transition for guanosine.
For both stacked and Watson–Crick pairs, Szalay et al.[35, 36] found that most of the states can be classified as local excitations on one of the monomers, with very little change in the excitation energy compared to the isolated monomers. However, charge–transfer (CT) states were also identified in all pairs. In this case, the transition can be assigned to an excitation from the monomer with the lower ionization potential (guanine and adenine) to the other monomer. Comparing the stacked and Watson–Crick pairs of guanine–cytosine, in the former these excitations have substantial transition moments due to the π–π interaction of the building blocks, whereas the transition moment of the CT excitation in the Watson–Crick pair is rather low. No excitonic state has been found for these dimers. For these complexes, the results have also been compared with lower level (ADC(2), TDDFT/M06-HF) calculations by Aquino et al., showing that the latter are not quite adequate for an accuracy of 0.1 eV. Also, the discrepancies relative to the CC results are, unfortunately, not systematic enough. Both methods predict CT transitions, but the order of the states differs significantly from that of EOM-CCSD(T).
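Why the monomer with the lower ionization potential acts as the donor can be seen from Mulliken's classical estimate of a CT excitation energy. This is a textbook relation valid in the limit of large donor–acceptor separation, not a result of the cited studies, and at stacking or hydrogen-bonding distances it is only qualitative. In atomic units, with $\mathrm{IP}_D$ the ionization potential of the donor, $\mathrm{EA}_A$ the electron affinity of the acceptor, and $R$ the separation of the two charges:

```latex
\begin{equation}
  \omega_{\mathrm{CT}}(R) \approx \mathrm{IP}_{D} - \mathrm{EA}_{A} - \frac{1}{R}
\end{equation}
```

For a given pair and geometry, the lowest CT state is therefore expected for the assignment of donor and acceptor that minimizes $\mathrm{IP}_D - \mathrm{EA}_A$.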
Outlook and Perspectives
The accuracy of CC methods in calculating excited states has been demonstrated above. This is encouraging from the point of view of extending these types of calculations to other problems, such as the simulation of vibronic spectra, the exploration of environmental effects, or the study of electron transfer. Also, since the highest-level CC methods are expensive, one has to look for more cost-effective alternatives for the investigation of larger fragments of DNA.
Simulation of the spectra of base pairs
As mentioned above, vertical excitation data alone are not quite appropriate for comparing theory with experiment. The ideal theoretical approach is, therefore, the simulation of “observable” quantities, that is, the full UV spectrum including its vibronic structure. This requires much extra effort since knowledge of the excited state energy surface is needed. Fortunately, the LVC method of Köppel et al. gives an approximate solution which seems to work quite satisfactorily, as demonstrated, for example, on cytosine above. Even this approximation involves complex calculations since it requires the normal coordinates of the ground state, as well as derivatives of the excited state energies and couplings with respect to these coordinates. Of these, the calculation of the normal coordinates is the most challenging task. Still, they can perhaps be calculated at a lower level, such as MP2, without compromising accuracy. By contrast, the computational effort to obtain all derivatives is comparable with that of the vertical excitation energy if analytic gradient techniques are used. Therefore, the derivatives themselves can well be calculated at the higher level. Analytic gradients are available for EOM-CCSD, with a massively parallel implementation in ACESIII.
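The ingredients listed above map directly onto the parameters of the LVC Hamiltonian. In the commonly used form, sketched here with notation that is assumed rather than taken from the article ($\hbar = 1$, dimensionless ground-state normal coordinates $Q_i$ with harmonic frequencies $\omega_i$, vertical excitation energies $E_n$):

```latex
\begin{align}
  \hat H &= \Bigl[\hat T_N + V_0(\mathbf Q)\Bigr]\mathbf 1 + \mathbf W(\mathbf Q),
  \qquad V_0(\mathbf Q) = \sum_i \frac{\omega_i}{2}\,Q_i^2, \\
  W_{nn}(\mathbf Q) &= E_n + \sum_i \kappa_i^{(n)} Q_i,
  \qquad \kappa_i^{(n)} = \left.\frac{\partial E_n}{\partial Q_i}\right|_{\mathbf Q = 0}, \\
  W_{nm}(\mathbf Q) &= \sum_i \lambda_i^{(nm)} Q_i \qquad (n \neq m).
\end{align}
```

The normal coordinates and $\omega_i$ come from the ground-state calculation, the $\kappa_i^{(n)}$ from excited-state gradients at the ground-state geometry, and the $\lambda_i^{(nm)}$ from the interstate couplings, so the model needs no information about the excited-state surfaces beyond first order.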
Even considering these approximations, simulation of the spectra of the dimers of nucleobases represents a great challenge for the computations. Since vertical excitation energies can be calculated at high level like EOM-CCSD or EOM-CCSD(T), it seems to be possible to obtain the vibrational effects accurately using LVC simulation. This might be important for several reasons. First, by comparing with experiment, accuracy of the computational procedure can be assessed. Certainly, there is no reason to doubt that the basic excitation process will be described accurately in these cases, too, but the rather weak interaction between the nucleobases is a delicate question which warrants further studies. Second, subtle effects certainly present in the dimers could also be identified leading to a more detailed knowledge of the interaction and its effect on excited states. Third, the calculation can help the assignment of the experimental spectra. Finally, if experiments are not available, the calculations can provide the excitation envelope for dynamical studies.
Particularly interesting are the differences between the spectra of the Watson–Crick and stacked pairs which can give important hints with respect to the effect of hydrogen bonds versus π interactions.
Note in passing that these simulations should be augmented with environmental/solvent effects to maintain the comparability of the simulation with experiment. Possibilities for the treatment of solvation are discussed below.
Relaxation of excited states: In-strand vs. inter-strand CT
Relaxation of excited states plays an important role in stabilizing DNA against photochemical damage. There is disagreement among experimentalists as to whether in-strand or inter-strand processes, or both, govern the relaxation dynamics after excitation.[18, 47] In the case of charge transport along the DNA chain, stacking interactions are obviously important because they provide the necessary link between the π systems of the individual bases. In addition, the importance of the inter-strand connection (i.e., Watson–Crick pairs) has also been proved: in the experiment of Takada et al. an excitation signal passes through a DNA chain, indicating the lack of relaxation, if the inter-strand connection is broken.
To resolve this question, one should investigate the competition between in-strand and inter-strand CT at a level where inaccuracy does not introduce any bias into the procedure. The simplest model where both interactions are present is a stacked pair of Watson–Crick pairs, that is, a complex of four nucleobases. Such a complex system presents an extreme computational challenge at the CC level, but is still tractable: first attempts at such calculations have been reported recently. If indeed viable, such calculations could provide a wealth of new information. Besides the accurate energies of the excited states, one could also study the flow of charge during the excitation process by inspecting the natural orbitals of the difference density, as in Ref..
Electronic coupling for charge transport
The celebrated Marcus theory can be used to model CT along the DNA chain. The ingredients of this model are sites and their interaction. To have a reliable model, accurate knowledge of the site energies and the electronic couplings is necessary. In the case of DNA,[50, 51] the site energies are the energies of the ionized bases, whereas the coupling is closely related to the nonadiabatic matrix elements between the two states. CC theory may provide an accurate tool for the determination of these parameters: ionized states can be treated by the EOM-IP-CC methods, whereas nonadiabatic coupling terms have been derived for EOM-CC wave functions recently.[53, 54]
CC calculations could provide a benchmark parameterization for the transport models. We predict that such calculations can be performed for dimers of nucleobases in both stacked and Watson–Crick pairs, as well as for the stacked pairs of Watson–Crick adducts. One important conclusion of these studies could be the comparison of the CT probability between stacked and Watson–Crick pairs. To study the transfer rate along longer chains, complexes of three or even four stacked bases might be interesting. There are also speculations that irregularity in the DNA chain plays an important role in charge transport. Therefore, it is of interest to study how the transfer properties depend on the relative positions of the nucleobases along the helix. To maintain the connection to natural DNA, these geometries should be taken from experimentally determined structures.
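How the site energies and couplings enter the model can be illustrated with the standard nonadiabatic (high-temperature) Marcus rate expression, k = (2π/ħ) |V|² (4πλk_BT)^(-1/2) exp[-(ΔG + λ)² / (4λk_BT)]. The following is a minimal sketch: the function name is hypothetical, the numerical inputs are placeholders rather than computed DNA parameters, and the reorganization energy λ is an additional quantity not discussed in the text. Here ΔG would follow from the difference of the site energies and V from the electronic coupling.

```python
import math

HBAR_EV_S = 6.582119569e-16  # reduced Planck constant in eV*s
KB_EV_K = 8.617333262e-5     # Boltzmann constant in eV/K


def marcus_rate(coupling_ev, reorg_ev, delta_g_ev, temperature_k=300.0):
    """Nonadiabatic Marcus charge-transfer rate in s^-1.

    coupling_ev : electronic coupling |V| between the two sites (eV)
    reorg_ev    : reorganization energy lambda (eV), must be positive
    delta_g_ev  : driving force, e.g. the site-energy difference (eV)
    """
    kt = KB_EV_K * temperature_k
    prefactor = (2.0 * math.pi / HBAR_EV_S) * coupling_ev ** 2
    density = 1.0 / math.sqrt(4.0 * math.pi * reorg_ev * kt)
    activation = math.exp(-((delta_g_ev + reorg_ev) ** 2) / (4.0 * reorg_ev * kt))
    return prefactor * density * activation


# Placeholder parameters, chosen only to show the call.
rate = marcus_rate(coupling_ev=0.01, reorg_ev=0.5, delta_g_ev=-0.2)
print(f"k = {rate:.3e} s^-1")
```

The rate depends quadratically on the coupling and exponentially on the site-energy difference, which is why benchmark-quality values of both are needed for a reliable transport model.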
Effect of the environment
Theoretically obtained gas phase data are important for understanding the basic features of DNA, but from the point of view of biological application, inclusion of the environment is necessary. Comparison of explicit and implicit water models is necessary to understand the role of direct interactions, such as hydrogen bonds. Also, polarization by the medium may significantly influence the properties of the solute molecules. The detailed investigations of the excitation energies of microhydrated cytosine in Ref. can be a starting point for such studies. As a next step beyond spectral properties themselves, transport properties should also be investigated, including environmental effects.
As for implicit solvent models, one should mention the polarizable continuum model implementation at the EOM-CCSD level by Caricato et al. This and other possibilities for including environment effects in CC calculations have been summarized recently by Sneskov and Christiansen. They also discuss possible QM/MM setups. The latter is also a useful way of including those parts of the DNA which cannot be treated by CC methods. It can be predicted that future calculations will mostly include these effects.
Development of approximate CC methods
To extend the calculations to larger systems, the CC method with triple excitations must be replaced by approximate ones. As discussed above, the presently available approximate methods such as CC2-LR, CASPT2, TDDFT do give useful results and help to answer many questions about the excited states, but are not quite satisfactory if high accuracy is required. Considering computational costs, it is clear, however, that second-order approximation is the only realistic way to treat dimers or even larger complexes of nucleobases.
Besides CC2-LR and ADC(2) discussed above, there are other second-order methods in the literature which should also be included in a systematic study and compared to the high-level results reviewed above. In my opinion, such a comparison could yield very useful information on the role of the wide variety of contributions included or neglected in the different approximations. Such studies may identify the terms responsible for the different performance of CC2-LR for ππ* and nπ* states.
The most important requirements for an improved method are: (i) balanced description of ππ* and nπ* states; (ii) reliable oscillator strengths; (iii) availability of analytic derivatives; (iv) size-consistent results ensuring applicability to larger systems; (v) possibility of massively parallel implementation. In addition, the method should also be extended to ionized states, which would enable the description of transport properties. Such a new method would open up new possibilities in research on the electric properties of DNA.
In summary, it was shown in this article that CC methods are now capable of describing the building blocks of DNA. Continuing the preliminary investigations of the last couple of years, spectacular results are expected in the near future.
This work benefited greatly from discussions and cooperation with Prof. Bartlett and his coworkers at the University of Florida. The author particularly thanks Prof. Fogarasi (Eötvös University, Budapest) for valuable comments on the present manuscript.
Péter G. Szalay was born in Szentes, Hungary in 1962. He received his M.Sc. degree in 1986 at Eötvös Loránd University (ELTE) in Budapest under the supervision of Géza Fogarasi and his Ph.D. in 1989 at the University of Vienna under the supervision of Hans Lischka. The latter work consisted of multireference configuration interaction (MR-CI) calculations on the ground and excited states of conjugated molecules. At the same time, he joined the Columbus community and worked on improvements of the MR-CI code, including the first analytic derivative code for this type of wave function. Between 1991 and 1993, he was a postdoc with Rod Bartlett at the Quantum Theory Project at the University of Florida, where he worked on the development of Coupled-Cluster methods and became one of the developers of the ACESII program system. The most important achievement of this period was the multireference averaged quadratic coupled-cluster (MR-AQCC) method. Since 2003, he has been a full professor at ELTE, and he served as chairman of the Institute of Chemistry between 2005 and 2008. His research, besides method development, concentrates on excited states and the corresponding spectroscopy.