B-DNA Structure and Stability as Function of Nucleic Acid Composition: Dispersion-Corrected DFT Study of Dinucleoside Monophosphate Single and Double Strands

We have computationally investigated the structure and stability of all 16 combinations of two out of the four natural DNA bases A, T, G and C in a di-2′-deoxyribonucleoside-monophosphate model DNA strand as well as in 10 double-strand model complexes thereof, using dispersion-corrected density functional theory (DFT-D). Optimized geometries with B-DNA conformation were obtained through the inclusion of implicit water solvent and, in the DNA models, of sodium counterions, to neutralize the negative charge of the phosphate groups. The results obtained allowed us to compare the relative stability of isomeric single and double strands. Moreover, the energy of the Watson–Crick pairing of complementary single strands to form double-helical structures was calculated. The latter furnished the following increasing stability trend of the double-helix formation energy: d(TpA)2 <d(CpA)2 <d(ApT)2 <d(ApA)2 <d(GpT)2 <d(GpA)2 <d(ApG)2 <d(CpG)2 <d(GpG)2 <d(GpC)2, where the energy differences between the last four dimers, d(ApG)2, d(CpG)2, d(GpG)2 and d(GpC)2, is within 4.0 kcal mol−1, and the energy between the most and the least stable isomers is 13.4 kcal mol−1. This trend shows that the formation energy essentially increases with the number of hydrogen bonds per base pair, that is two between A and T and three between G and C. Superimposed on this main trend are more subtle effects that depend on the order in which bases occur within a strand from the 5’- to the 3’-end.


Introduction
The DNA double helix results from several types of stabilizing but also destabilizing interactions between and within its two complementary, intertwined polydeoxyribonucleotide chains. In fact, the major contributions to the stability of the DNA structure come from: [1] (1) hydrogen bonds in adenine-thymine (AT) and guanine-cytosine (GC) Watson-Crick pairs; (2) p-p stacking interactions among Watson-Crick paired nitrogen bases stacked along the double helix axis; (3) electrostatic repulsive interactions among the phosphate groups; (4) electrostatic attractive interactions between the phosphate groups and cations dissolved in water solution; (5) hydrophilic inter-actions of the sugar-phosphate skeleton with water; (6) hydrophobic interactions of the DNA cylindrical core, made up by the hydrogen-bonded and stacked nitrogen bases, with the water solvent. Recently, there has been increasing effort in developing and applying quantum chemical methods able to reproduce the structure of native B-DNA and to correctly describe the energy involved in the intrastrand and interstrand noncovalent interactions between the nucleotide monomers. This topic has been approached by both wave function methods and density functional theory. [2] Water solvent and sodium counterions also play an important role in the formation and relative stabilization of the double-helical DNA structure because they mitigate the aforementioned long-range electrostatic repulsions among the phosphate groups in the DNA backbone and amplify p-p stacking interactions through the hydrophobic effect. This issue has been recently considered in the choice of model systems in density functional theory (DFT) calculations of the structure of dinucleoside monophosphate single strands. [3] The results obtained show that, not unexpectedly, the presence of the Na + counterions at each phosphate group is even more important than the presence of the implicit solvent for providing a structure in which the relative base-base orientation and the backbone torsion angles are in better agreement with experimental B-DNA crystal structures. This result reconfirms the importance of charge neutralization in DNA model systems.
A complicating aspect of DNA in computational studies is the combination of the large size of model systems in combination with the high demand on accuracy for describing the We have computationally investigated the structure and stability of all 16 combinations of two out of the four natural DNA bases A, T, G and C in a di-2'-deoxyribonucleoside-monophosphate model DNA strand as well as in 10 double-strand model complexes thereof, using dispersion-corrected density functional theory (DFT-D). Optimized geometries with B-DNA conformation were obtained through the inclusion of implicit water solvent and, in the DNA models, of sodium counterions, to neutralize the negative charge of the phosphate groups. The results obtained allowed us to compare the relative stability of isomeric single and double strands. Moreover, the energy of the Watson-Crick pairing of complementary single strands to form double-helical structures was calculated. The latter fur-nished the following increasing stability trend of the doublehelix formation energy: d(TpA where the energy differences between the last four dimers, d(ApG) 2 , d(CpG) 2 , d(GpG) 2 and d(GpC) 2 , is within 4.0 kcal mol À1 , and the energy between the most and the least stable isomers is 13.4 kcal mol À1 . This trend shows that the formation energy essentially increases with the number of hydrogen bonds per base pair, that is two between A and T and three between G and C. Superimposed on this main trend are more subtle effects that depend on the order in which bases occur within a strand from the 5'-to the 3'-end. various delicate, yet crucial types of interactions, such as, hydrogen bonding and p-p stacking. The latter can be adequately described using DFT or post-Hartree-Fock ab initio methods that properly account for electron correlation. However, the size already of minimal model systems for double-stranded DNA is prohibitive for post-Hartree-Fock approaches. For example, coupled-cluster theory with single and double electron excitations and triple electron excitations treated perturbatively, that is, CCSD(T), at the estimated complete basis set (CBS) limit, can be applied to molecules containing no more than about 50 atoms. [2b,c] On the other hand, DFT methods are able to deal with, and accurately describe these large DNA models. Hydrogen bonding in Watson-Crick base pairs and other DNA structures is well described using the generalized gradient approximation (GGA). [4] Interactions, such as p-p stacking, in which London dispersion features prominently, require the use of semilocal hybrid functionals that recover medium-range correlation effects as proposed, for example, by Truhlar and coworkers, [5] but they can also be treated very efficiently using Grimme's empirically dispersion-corrected density functional theory (DFT-D). [6] Model systems in pioneering DFT studies so far cover singlestranded GC dinucleoside monophosphate models (including sodium counter ions and implicit water solvent), [3] the ten possible combinations of DNA bases A, T, G and C in stacks of two Watson-Crick pairs (but without sugar-phosphate bridge between the Watson-Crick pairs), [7] and double-stranded DNA dimers (but with the sugar-phosphate bridge in the MM part of a QM/MM approach). [8] In the present work, we extend the above quest to a quantum chemical exploration of the 16 possible single-stranded and 10 double-stranded B-DNA models that exist for the dinucleoside monophosphates of A, T, G and C, using dispersioncorrected DFT at BLYP-D/TZ2P as implemented in the Amsterdam Density Functional (ADF) program (see below). The latter approach has been shown to accurately reproduce CCSD(T) structural and energy data of stacked as well as hydrogenbonded AT and GC complexes. [4f] Our DFT-D study also covers the sodium counterions at the phosphate groups as well as the effect of aqueous solvation using the conductor-like screening model (COSMO). The purpose of our work is to obtain a more detailed understanding of the structure and stability of B-DNA and how this depends on DNA-base sequences. The resulting insight is also relevant, among other things, for modeling and rationalizing the interaction of drugs with DNA, following any type of covalent [9] and/or noncovalent drug-DNA binding with particular emphasis on DNA intercalators. [10] Computational Details All calculations were performed using the Amsterdam Density Functional (ADF) program developed by Baerends, Ziegler, and others, [11,12] and the QUantum-regions Interconnected by Local Descriptions (QUILD) program by Swart and Bickelhaupt. [13] The QUILD program is a wrapper around ADF (and other programs) and is used for its superior geometry optimizer which is based on adapted delocalized coordinates. [13a] The numerical integra-tion was performed using the procedure developed by te Velde et al. [12a,b] Molecular orbitals (MOs) were expanded in a large uncontracted set of Slater-type orbitals (STOs) containing diffuse functions: TZ2P (no Gaussian functions are involved). [12a,b] The basis set is of triple-z quality for all atoms and has been augmented with two sets of polarization functions, that is, 3d and 4f on C, N, O and 2p, 3d on H. The 1s core shells of carbon, nitrogen and oxygen were treated by the frozen-core approximation. An auxiliary set of s, p, d, f and g STOs was used to fit the molecular density and to represent the Coulomb and exchange potentials accurately in each self-consistent field cycle.
Geometries and energies were computed with dispersioncorrected density functional theory (DFT-D) using the BLYP-D functional in which the regular BLYP functional [14,15] is augmented with an empirical correction for long-range dispersion effects, described by a sum of damped interatomic potentials of the form C 6 R À6 added to the usual DFT energy. [6b, 16] The basis set superposition error (BSSE) on the bond energy is effectively absorbed into the empirical dispersion-correction potential. [16a] The starting B-DNA structures of the single and double strands were built using the NUCLEIC routine of the TINKER molecular design program package. [17] Solvent effects in water have been estimated using the conductor-like screening model (COSMO), as implemented in the ADF program [18] with the following settings: solvent radius and dielectric constant for water were 1.9 and 78.4, respectively, and atomic radii were taken from the MM3 van der Waals radii scaled by 0.8333 (for details, see Table S1 of ref. [19]). The surface charges at the GEPOL93 solvent-excluding surface were corrected for outlying charges. According to the work by Riley et al., [20] the dispersion correction does not need to be modified for the solvated systems.
The Cartesian coordinates and energies of stationary points are given in the Supporting Information.

Results and Discussion
In the following four sections, we focus on two important findings of our investigations. The first one concerns the reliability of DFT-D in combination with the conductor-like screening model (COSMO) for mimicking bulk solvation in aqueous solution, for the calculation of dinucleoside monophosphate single and double strands in physiological conditions. The calculated structures are in fact in good agreement with experimental structures of B-DNA oligodeoxynucleotides obtained by X-ray crystallography. [21] The second important finding emerges from the analyses of the relative stability of the various model B-DNA double strands in terms of their formation energy from the corresponding single-stranded moieties as well as the energy contributions into which the formation energy can be decomposed. These findings shed light, among others, on how the overall stability and structure of B-DNA depends on its composition in terms of AT versus GC Watson-Crick pairs and the role of intramolecular interactions in genetic evolution.
There are 16 different doublet sequences and this number increases to 64 for triplet sequences in single-stranded DNA.
Both combinations decrease to 10 and to 32 for doublet and triplet sequences, respectively, in doublehelical DNA. In this respect, it is worth recalling that each of the DNA triplet sequences are the smallest units of genetic code in protein synthesis. [22] In the present work, we have focused our attention on DNA dimers, and we plan to extend our investigation on the structure and stability of DNA triplets in a forthcoming paper, using the same computational approach reported herein.  Table S1 in the Supporting Information. The backbone torsion angles of the optimized structures are reported in Tables 1 and 2 and compared with available experimental crystal structures of B-DNA oligonucleotides. [21] The latter derive from a statistical analysis of 34 B-DNA oligodeoxynucleotide structures, characterized with resolutions between 0.7 and 3.3 . The hydrogen bond distances between Watson-Crick complementary bases are reported in Table 3. The atom labels used for defining the backbone torsion angles and the hydrogen bonding distances are shown in Figure 2.
There is satisfactory agreement between the average values of the torsion angles of both single and double strands with available experimental crystal structures of oligodeoxynucleotides in B-DNA conformation. [21] The torsion angles and hydrogen bond distances data reported in Tables 1 and 2 show that larger distortions from the B-DNA conformations, with a significantly lower value of the b torsion angle, have been observed for d(GpA), d(GpC) and d(GpG) in single strands and for d(GpA), d(CpA) and d(CpG) in double strands.
The hydrogen bond lengths obtained are also in good agreement with those calculated with the same DFT-D method and basis set for isolated DNA nitrogen bases. [4g] However, the presence of the sugar phosphate chain induces small distortions in the hydrogen-bonded base pairs. These distortions are somewhat more pronounced in the case of AT than GC base pairs. For example, there is a closer match between the distances of isolated AT base pairs and those in d(TpA), whereas N6-O4 and N1-N3 distances are slightly shorter and longer, by 0.1 , respectively, in d(ApT) (see Table 3 and ref.

Double-strand stability
In the following, we show how the relative stability of the double-stranded dimers emerges from an interplay between the strain energy in the single strands and the interaction between these single strands. Especially trends in the latter can be well understood in terms of molecular structural features. We analyze the stability of a DNA double helix in terms of the (relative) complexation energy associated with forming the double strand from two single strands, DE double (or DE double,rel ), see Equations 1: This analysis leads to a global order of stabilities among all of our DNA double strands which are collected in Table 3. The major categorization in terms of stability order can be made, not unexpectedly, in terms of the number of AT and GC Watson-Crick pairs. The stability DE double of the model double helix increases as the number of GC pairs increases from zero (À14 to À19 kcal mol À1 ) to one (À19 to À23 kcal mol À1 ) to two (À24 to À27 kcal mol À1 ). Differences in energy between isomeric single strands are significantly smaller, that is, less than 1 kcal mol À1 between TpG and GpT, between CpA and ApC, between GpA and ApG, and between ApT and TpA; 1.2 kcal mol À1 between CpT and TpC, and 1.6 kcal mol À1 between CpG and GpC (see Table S1).
However, also within a family of isomers, consisting of the same number of AT and GC pairs, there are significant fluctuations in stability. As can be seen in Table 3, the following orders of stability in terms of complexation energy DE double can be observed for the three families of model double helices (see Equations 2): dðApAÞ 2 > dðApTÞ 2 > dðTpAÞ 2 ð2aÞ dðApGÞ 2 > dðGpAÞ 2 > dðGpTÞ 2 > dðCpAÞ 2 ð2bÞ This is in line with our earlier finding that the affinity of a model template-primer complex for an incoming new DNA base not only depends on the complexation energy of the new Watson-Crick (or mismatched) base pair but also on the terminal base in the primer strand to which the incoming DNA base binds via p-p stacking interactions. [7a]

Analysis of double-strand interaction
To obtain a better understanding of the in silico observed trends, we have decomposed the formation energy DE double of a double strand from two complementary single strands, into two main energy contributions: 1) the strain DE strain that is the energy required to distort the isolated single strands from their individual equilibrium structure into the geometry they assume in the double strand; and 2) the interaction energy DE int that is the difference between the energy of the double helix and that of the two single strands with the same geometry they assume in the double helix (see Equation 3): The results of our analyses for the 10 double helix models are collected in Table 3. In the first place, it appears that, for a given family of double helical isomers, the trend in relative stability DE double always follows the amount of strain energy DE strain associated with deforming the single strands to the geometry they have to adopt to form Watson-Crick pairs in the overall system. For example, along d(ApA) 2 , d(ApT) 2 , and d(TpA) 2 , the strain energy DE strain increases from 2.9 to 3.6 to 10.3 kcal mol À1 and, accordingly, the complexation energy DE double,rel weakens from À19.1 to À18.7 to À13.8 kcal mol À1 (see Table 3). This interesting trend is related to the increasing structural distortion of the complementary single strands when they form the double-stranded complex. In general, all the values of torsion angles of the free single-stranded structures, reported in Table 1, change when they form the double-stranded structures (see Table 2). Remarkably, the value of b of the single-stranded doublets, which involves the atoms C4'-C5'-O5'P (see Figure 2), experiences the largest modification in the less stable double-stranded structures, namely d(TpA) 2 d(CpA) 2 and d(CpG) 2 (see Table 3). In more detail, the value of b changes by some 348 for d(TpG), 388 for d(TpA), 488 for d(CpG), and 528 for d(CpA) (compare Tables 1 and 2). Inspection of Figure 3-5 (top view) reveals that, in all four of these dinucleotide strands, the associated internal rotations around the C5'-O5' bond leads to a less favorable p-p stacking ar- Table 1. Backbone torsion angles [8] of the 16 dinucleoside monophosphate B-DNA single strands. [a] ss PO3'-C3'-C4' (e) O5'-PO3'-C3' (z) C2-N1/C4-N9-C1'-O4' (c) [b] (7) Av (SD) [c] 176.8 (16) À79.2 (6) À67.6 (8) À170.2 (24) 48.8 (4) À106.1 (17) Exptl (SD) [d] 184 (11) À95 (10) À62 (15) 176 (9) 48 (11) À102 (14) [a] Computed at BLYP-D/TZ2P and COSMO for simulating aqueous solution. See Figure 2 for definitions and Figure 1 for structures. [b] Average of the two c angles present in each dinucleoside monophosphate (see Figure 2).
[c] Average with standard deviation (SD) over the 16 dinucleoside monophosphate single strands.
[d] Experimental values with standard deviation (SD) from X-ray crystal structures of oligodeoxynucleotides in B-DNA conformation (ref. [21]). rangement between the two stacked DNA bases in the resulting double helices than in the separate single strands. Likewise, the more stable double-stranded isomers with the less strained single-strand fragments display more favorable p-p stacking arrangements between the stacked DNA bases within a single strand. This can be understood as a conflict, arising from the structural characteristics of the dinucleotides d(TpG), d(TpA), d(CpG) and d(CpA), between, on one hand, the requirement to adopt a proper arrangement for entering into interstrand Watson-Crick pairing and, on the other hand, an optimal intrastrand p-p stacking arrangement. The interaction energy DE int behaves in a less systematic manner than DE strain . Along the same series, d(ApA) 2 , d(ApT) 2 , and d(TpA) 2 , DE int first remains essentially constant at À22.1 and À22.1 kcal mol À1 and then strengthens to À24.0 kcal mol À1 .

Contribution of the sugar phosphate backbone
To quantify the effect of the backbone on the stability of the Watson-Crick pairs in the duplex structure, we also compare them with those in simple Watson-Crick pairs and in the simple stacks of Watson-Crick pairs, that is, stacks without a sugarphosphate-sugar backbone. Previously, we found that DE double in aqueous solution for simple AT and GC Watson-Crick pairs amounts to À9.8 and À13.6 kcal mol À1 , respectively, computed at the same level of theory as used in this study. [4g] If we compare this to the present model duplexes, we see that the latter have effectively slightly weaker Watson-Crick bonds per base pair. For example, in the strongest AT bound duplex, that is, d(ApA) 2 , the effective Watson-Crick bond strength per base pair amounts to DE double /2 = À9.6 kcal mol À1 . Apparently, the slight increase in deformation strain DE strain that occurs if a backbone is present somewhat reduces the effective stability of the Watson-Crick interaction. As this strain becomes larger, in d(TpA) 2 , the effective stability is further reduced to DE double / 2 = À6.9 kcal mol À1 .   [8] of the 10 dinucleoside monophosphate B-DNA double strands. [a] ds ss [b] PO3'-C3'-C4' (e) O5'-PO3'-C3' (z) Av (SD) [d] 159.9 (7) À76.9 (38) À67.4 (6) À157.7 (14) 47.0 (4) À103.0 (7) Exptl (SD) [e] 184 (11) À95 (10) À62 (15) 176 (9) 48 (11) À102 (14) [a] Computed at BLYP-D/TZ2P and COSMO for simulating aqueous solution. See Figure 2 for definitions and Figure 3-5 for structures. [b] Single strands that constitute the double strand.
[c] Average of the two c angles present in each dinucleoside monophosphate (see Figure 2).
[d] Average with standard deviation (SD) over the 10 dinucleoside monophosphate double strands. [e] Experimental values with standard deviation (SD) from Xray crystal structures of oligodeoxynucleotides in B-DNA conformation (ref. [21]).
Thus, we have computed the complexation energy DE double associated with forming simple (AT) 2 stacks and (GC) 2 stacks from the corresponding "single strands" of AT + AT stacks and GC + GC stacks, respectively, each of them without the sugarphosphate-sugar backbone. The equilibrium structures of (AT) 2 and (GC) 2 are shown in Figure 6, wheras the corresponding hydrogen-bond distances and complexation energies DE double are reported in Table 3. These backboneless systems were obtained by removing the sugar phosphate units from d(ApT) 2 and d(GpC) 2 , substituting C1' with a H atom, and then optimizing the structure of the stacked base pairs (AT) 2 and (GC) 2 . Likewise, the constituting single strand analogues were obtained by removing, in addition, one of the two AT stacks from (AT) 2 and one of the two GC stacks from (GC) 2 , followed by optimization of the remaining stack of DNA bases. The computed complexation energies DE double appear to be À19.1 and À25.1 for (AT) 2 and (GC) 2 , respectively. This corresponds to effective Watson-Crick bond strengths per base pair of DE double /2 = À9.6 and À12.6 kcal mol À1 , respectively. This is almost, that is, within a kcal mol À1 , the same effective complexation energy as for simple base pairs.
We conclude that the geometrical constraints imposed by the backbone reduce the stability of B-DNA by 0-3 kcal mol À1 and 0-2 kcal mol À1 per AT and GC base pair, respectively. This weakening shows up in the aforementioned strain energy DE strain , a somewhat weakened interaction energy DE int , but also in slightly (i.e., by a few hundredth of an ) stretched or compressed hydrogen-bond distances (see Table 3). As soon as the backbone is removed from the double-helical model systems, the effective Watson-Crick complexation energy is enhanced and approaches that of the isolated base pairs. Likewise, hydrogen-bond distances in (AT) 2 and (GC) 2 relax and essentially adopt the values in the corresponding isolated base pairs (see Table 3). [4f] Thus, the values for N6-O4 and N1-N3 distances with an AT base pair of (AT) 2 are, on the average, 2.80 and 2.90 , respectively, whereas the values for O6-N4, N1-N3 and N2-O2 distances within a GC base pair in (GC) 2 amount to 2.84, 2.88 and 2.84 , respectively. Table 3. Hydrogen-bond structure (in ) and stability (in kcal mol À1 ) of the 10 dinucleoside monophosphate B-DNA double strands. [a] ds ss 1/ss 2 [b] Distance