The photophysics of protonated cytidine and hemiprotonated cytidine base pair: A computational study

We here study the effect that a lowering of the pH has on the excited state processes of cytidine and a cytidine/cytidine pair in solution, by integrating time‐dependent density functional theory and CASSCF/CASPT2 calculations, and including solvent by a mixed discrete/continuum model. Our calculations reproduce the effect of protonation at N3 on the steady‐state infrared and absorption spectra of a protonated cytidine (CH+), and predict that an easily accessible non‐radiative deactivation route exists for the spectroscopic state, explaining its sub‐ps lifetime. Indeed, an extremely small energy barrier separates the minimum of the lowest energy bright state from a crossing region with the ground electronic state, reached by out‐of‐plane motion of the hydrogen substituents of the CC double bond, the so‐called ethylenic conical intersection typical of cytidine and other pyrimidine bases. This deactivation route is operative for the two bases forming an hemiprotonated cytidine base pair, [CH·C]+, the building blocks of I‐motif secondary structures, whereas interbase processes play a minor role. N3 protonation disfavors instead the nπ* transitions, associated with the long‐living components of cytidine photoactivated dynamics.


INTRODUCTION
The pK a value for (deoxy)cytidine (hereafter C) in water, ∼4.2, 1 can be significantly altered in polynucleotides, either due to the effect of the charged phospho(deoxy)ribose backbone and, especially, to the possible formation of non-Watson Crick hydrogen bonding pairs with guanine (producing Hoogsteen base pairs) 2 and other cytosines. 3n particular, at weakly acid pH, C-rich DNA and RNA regions have been shown to adopt a particular secondary structure, commonly labeled as I-motif (see Figure 1), 3,4 with four "intercalated" antiparallel strands.The building blocks of I-motif are two base-paired cytidines, "sharing" one proton between the N3 atoms (see Figure 2B).This structure, hereafter labeled as [CH•C] + , is usually referred to as hemiprotonated CC base pair.Recently, I-motif has been the object of intense research activity, [4][5][6][7][8] both due to its possible biological and medical relevance 9,10 and to their potential use in bio-sensing applications. 11,12nfortunately, despite the important advances made in the last years, many basic features of I-motifs are not yet fully assessed.For example, their photoactivated behavior is still poorly understood, despite the fundamental role that optical spectroscopies have in their characterization. 4,13,14oreover, recent time-resolved (TR) studies have highlighted the very rich photophysics and photochemistry of I-motif structures. 15,16In this respect, the oxidative processes of cytosine-rich DNA region are particularly relevant, [17][18][19] since cytosine deamination, and the subsequent GC → AT transversion, can be carcinogenetic. 20owever, as shown by the studies on the photophysics and photochemistry of other DNA structures, 19,[21][22][23][24][25] the photoactivated dynamics of polynucleotides depends on the interplay of many different excited state processes, often involving more than one base.On this ground, to disentangle the photophysical processes occurring in Imotifs, it is important to fully understand the excited state dynamics of their "minimal" building blocks, that is, protonated cytidine bases (hereafter CH + ) and, especially, hemiprotonated [CH•C] + base pairs.On the other hand, it is not easy to study the "intrinsic" behavior of [CH•C] + structures, that is, when not included in I-motif, making the contribution of theoretical methods particularly useful.In this study, we shall describe the main photophysical processes occurring in CH + and hemiprotonated [CH•C] + base pairs in water, resorting to quantum mechanical calculations, exploiting a strategy that has been already profitably adopted on other DNA nucleobases and DNA structures. 24e show that the protonation of N3 at acidic pH does not impact the behavior of the lowest energy bright state of cytidine, for which an almost barrierless path leads to a crossing region with the ground state, giving an account of its ultrafast, sub-ps, decay.On the other hand, it strongly destabilizes the lowest lying nπ* states, explaining why the long-living component, typical of cytidine excited state decay, 23,26 disappears at low pH.The photophysics of the hemiprotonated cytidine base pair can be interpreted on the basis of its constituents: the excited state processes F I G U R E 1 Schematic drawing of four C-rich strands arranged in an I-motif secondary structure.Each hydrogen-bonded pair is constituted by an hemiprotonated (deoxy)cytidine dimer (see Figure 2b). of CH + and C are not affected by the formation of the hydrogen-bonded pair, and interbases excited state processes are predicted to be not relevant.

MATERIALS AND METHODS
Most of our analysis has been performed using timedependent density functional theory (TD-DFT) calculations, using the M052X functional 27 as the reference method.M052X is not biased by the deficiencies of many functionals in describing electronic transitions with significant charge transfer (CT) character, 28,29 and we have profitably used this functional in describing the excited state behavior of cytosine derivatives 30,31 and, more in general, the photophysics and photochemistry of DNA. 23,24n the other hand, as discussed below, the transition energies predicted by M052X are significantly overestimated with respect to the experimental absorption band.In any case, for [CH•C] + , we double-checked the prediction of M052X using two other commonly used density functionals, that is, CAM-B3LYP, 32 and ωB97XD. 33eometry optimizations have been performed using 6-31G(d) basis set, complemented by single-point calculations with the larger 6-31+G(d,p) and 6-311+G(2d,2p) basis sets.
Bulk solvent effects have been included by using the polarizable continuum model (PCM). 34The effect of solute-solvent hydrogen bonds, which are expected to be more important for the isolated base, has been considered by using a mixed implicit-explicit model, where four H 2 O molecules of the first solvation shell (see Figure 2) were included in the species studied by PCM.This approach can satisfactorily reproduce solvent shifts on the absorption spectrum of cytosine derivatives. 35bsorption spectra have been computed at the solvent non-equilibrium level, whereas excited geometry optimizations and frequency calculations at the solvent equilibrium level. 34It is clear that this simple approach cannot treat situations where the excited state decay and the equilibration of solvent degrees of freedom occur on a similar timescale. 36In such cases, approaches exploiting an explicit description of the solvent are probably desirable. 37he effect of the sugar has been mimicked by a methyl group, which has been previously shown to be able to satisfactorily reproduce the excited state behavior of cytidine. 38In the following, we shall refer to CH + •4 D 2 O and to C•4 D 2 O when discussing our computational results, and to cytidine or protonated cytidine, when we want to highlight some general features of the bases, for example, when reporting the literature results.
Infrared (IR) spectra have been computed, after substituting with deuterium atoms all the hydrogen atoms bonded to nitrogen and oxygen atoms, in the harmonic approximation.Using a procedure commonly used in the literature (see, e.g., 39,40 ), each frequency has been scaled by 0.96, since the inclusion of anharmonic effect leads to an average red shift of the spectra, and then simulated by broadening each stick transition with a Lorentzian with half-width half maximum = 10 cm −1 .
Electronic transitions have been assigned with the help of natural transition orbitals (NTO), which provide a more intuitive picture in terms of individual hole-particle excitation. 41D-DFT calculations have been performed by the Gaussian16 package. 42he potential energy surfaces (PES) of the bright state of CH + in the gas phase have also been modeled using complete active space self consistent field (CASSCF) calculations combined with the 6-31G(d) basis set and corrected with second-order perturbation theory (CASPT2). 43,44The selected active space consists of 12 electrons distributed in 9 orbitals (5π, 1 lone pair [LP], 3π*), and 5 roots were included as part of the state average procedure.OpenMolcas software has been used for these calculations. 45

Protonated cytidine
To validate our computational model, we started our analysis by computing the IR spectrum for CH + •4 D 2 O.As shown in Figure 3, our calculations predict that in the fingerprint 1500-1700 cm −1 region the most intense  15 peak falls at ∼1670 cm −1 , and corresponds to a collective ring stretching mode, with a strong contribution from the C5=C6 double bond.Just above 1700 cm −1 we locate a peak, ∼0.50 less intense than the previous one, mainly associated with the C2=O2 stretching mode.Weaker features are then predicted at ∼1600 and ∼1520 cm −1 .The computed spectrum is in very good agreement with the experimental femtosecond TR IR spectrum of deoxycytidine monophosphate (CMP) in D 2 O in acidic conditions, 15 supporting the reliability of the adopted computational approach.
The main features of the lowest energy excited states of CH + •4 H 2 O in the Franck-Condon (FC) region are shown in Table 1.On the one hand, we observe several similarities with the picture obtained for cytidine in previous studies. 23,24,38,46The lowest energy excited state is a ππ* HOMO→LUMO transition, hereafter π H π* L , with bonding/antibonding character with respect to the C5=C6 double bond.The vertical absorption energy (VAE) of this π H π* L is weakly (∼0.15 eV) red shifted with respect to that predicted at neutral pH.As for cytidine, at higher energy, we then find two additional intense ππ* transitions, HOMO → LUMO+1 and HOMO−1 → LUMO.On the other hand, protonation at N3 has a significant impact on the lowest energy states, especially those with nπ* character.7][48] In water, a dark state involving both N3 and O2 LP falls between the two lowest energy bright states (see, e.g., the results for C•4 D 2 O in Table 1).Not only the electronic transition involving the LP of N3 disappears, as could be expected for CH• + , but also n O π* L is strongly destabilized (it is ∼2.2 eV less stable than π H π* L, S 5 in Table 1).In CH + •4 D 2 O, within 2 eV from π H π* L we then find only one nπ* state, which does not involve the excitation from the O 2 LP to the LUMO but to LUMO+1 (n O π* +1 in Table 1), and in cytosine is less vibronically coupled with the bright states. 46Protonation of N3 also increases the energy gap between π H π* L and the two higher bright excited states.As a consequence, our calculations predict that N3 protonation would be mirrored by a red shift of the absorption maximum, and, at the same time, by a decrease in the bandwidth, especially on the blue side.Without considering the vibrational contributions, which are a major source of broadening, 47,49 in C•4 D 2 O we have another bright state less than 1 eV higher in energy than π H π* L and, a close-lying and coupled dark transition, 46,48 as witnessed also by the non-zero oscillator strength (see Table 1).In CH + •4 D 2 O, the second bright state is 1.4 eV above π H π* L and the dark state is expected to be less coupled.This picture is fully confirmed by experiments, 50 which shows that at acidic pH the lowest absorption maximum of CMP red shifts by 0.13 eV and the bandwidth decreases.The description of the lowest energy excited states in the FC region provided by TD-PCM/ M052X/6-31+G(d,p)//PCM/M052X/6-31+G(d,p) calculations or by using CAM-B3LYP and ωB97XD functionals (see Table S1 in the Supporting Information) is similar to that described above.
From the quantitative point of view, the transition energies predicted by PCM/M052X are significantly blue shifted (∼0.75) with respect to maximum of the experimental absorption band of cytosine in water (4.45 eV). 50his error is partially due to the absence of thermal and vibrational effects in our treatment, which would lead to a systematic red shift of the band maxima (0.1 ∼ 0.2 eV) with respect to the vertical transition energies. 30,49,47On the other hand, the largest source of error is the adopted computational approach (e.g., density functional, basis set).The overestimation of the transition energies is a systematic feature of M052X, 29 and a similar blue shift is found also for cytosine. 30,51In fact, we correctly reproduce the solvent shift associated with N3 protonation.
Geometry optimization of π H π* L leads to a minimum (see Figure 4), similar to that found for C, where the pyrimidine plane is slightly bent, and the C′5=C′6 bond

Non-radiative decay routes
We performed an extensive exploration of the π H π* L PES at the PCM/TD-M052X level for CH + •4 D 2 O, also guided by the outcome of the analysis performed both at the CASPT2 and TD-DFT levels on cytosine. 38We located, also for CH + •4 D 2 O, (both at the PCM/M052X/6-31G(d) and PCM/M052X/6-31+G(d,p) levels) a crossing region with S 0 characterized by the out-of-plane motion of the C′5-H and C′6-H groups with respect to the molecular plane.This is the "typical" ethylenic CI, which has been shown to be the most effective non-radiative decay channel for C 38,52,53 and other pyrimidines. 23,24It can be reached (as depicted in Figure 5) from the minimum of π H π* L , by a relatively small out-of-plane motion of the C′5H5 group (a representative structure of this region is shown in Figure 4).A minimum energy path along the out-of-plane motion of H5 indicates the minimum of πHπ*L is separated from the crossing region with S 0 by a very small energy barrier (<0.05 eV).Moreover, the crossing region is significantly more stable (>0.5 eV) than the minimum of π H π* L .On this ground, we can expect an ultrashort excited state lifetime for CH + .This conclusion fully agrees with the experimental indications. 15,54Transient absorption experiments indicate that the lifetime of protonated cytidine, ∼630 fs, is shorter than that measured in the same experiments for the neutral compound, ∼1 ps. 54TR IR experiments confirm the very fast ground state recovery of CH + , showing that within 2 ps only the "vibrationally hot" S 0 is observed. 15e have characterized another possible non-radiative decay path, involving the out-of-plane motion and the rotation of the amino group with respect to the pyrimidine plane.In this way, we located another S 1 minimum (see Supporting Information), which is associated with the electronic transition from the amino LP toward the LUMO.The out-of-plane motion of the amino group then leads to another crossing region with S 0 , which, however, is much less stable (>0.5 eV) than this latter minimum, suggesting that this decay path should not be involved in the ultrafast dynamics.
Considering that TD-DFT has well-known limitations in locating conical intersections (CI) with the ground electronic state, 55 we double-checked the indications provided by TD-M052X by performing some test calculations in the gas phase at the CASPT2 level.Indeed, we succeed to optimize a CI (see Figure 6) resembling the one provided by TD-DFT at the CASPT2/CASSCF level of theory.Subsequent minimum energy path from this CI leads, without any energy barrier, to the π H π* L minimum, which is ∼0.8 eV less stable than the CI, confirming the topology described by TD-DFT.Although alternative decay channels (minimum and CI), characterized by more planar structures, were also located at this level of theory, they are less stable than the corresponding stationary point involving the ethylenic deformation of the pyrimidine ring.

Hemiprotonated cytidine dimer
In Table 2, we report the main features of the lowest energy excited state of [CH•C] + , whereas the NTO associated with the three lowest energy ones are shown in Figure 7.The lowest energy excited state is a ππ* (see Figure 7), localized on the CH + moiety, extremely similar to that described above and thus labeled, in the following, as CH + -π H π* L .S 2 is its counterpart but localized on the C moiety, hereafter C-π H π* L .S 3 , hereafter CCH + -CT can be described as a CT transition from the C to the CH + base.S 4 and S 5 are localized on C. The former is very similar to the second bright excited state previously described for cytidine (Cπ H−1 π* L ). 38 The latter is nπ* states involving the LP of N3 and O2.Finally, S 6 corresponds to n O π* +1 localized on CH + , described above.This picture is pretty stable, either enlarging the basis set or changing the functional (see Table 2).When the size of the basis set increases, CCH + -CT is more coupled with the bright excited states and acquires some intensity.CAM-B3LYP and ωB97XD predict an even larger coupling between C-π H−1 π* L and CCH + -CT, and in the FC region, S 3 and S 4 are close in energy and have similar oscillator strength.This picture does not change when optimizing the geometry of [CH•C] + using the larger 6-311+G(2d,2p) basis set (see the last column of Table 2).
Geometry optimization of S 1 (CH + -π H π* L ) leads to a minimum with structure and emission properties very similar to that described above for isolated CH + .Analogously, when optimizing S 2 , we obtain a minimum extremely close to that found for isolated C, 38 where the C ring adopts a slightly bent geometry.Finally, geometry optimization of S 3 predicts the decay to a minimum, resulting from the coupling between C-π H π* L and CCH + -CT, that is, with only partial C → CH + CT character.However, this weakly emissive (f = 0.09) minimum is still on the S 3 surface, and very close in energy to CH + -π H π* L and C-π H π* L , suggesting that further decay to these two states is likely.
In CCH + -CT, the CT from the "neutral" C to its protonated partner is expected to stabilize its positive charge.
Therefore, the proton-coupled electron transfer process should not be an important process.Indeed, all the attempts to compute the energy barrier associated with the transfer of the H3 atom from CH + to C on the CCH + -CT PES were unsuccessful since the system always decays to the lower-lying bright state.

Non-radiative decay routes
We focused our attention on the CH + moiety, and we explored the ethylenic crossing region between CH + -π H π* L and S 0 .Our calculations provide a picture extremely similar to that depicted in Figure 5 (see also Supporting   Information), and that also in this case this crossing region is very easily accessible from the minimum of the CH + -π H π* L state.The estimated energy barrier associated with the out-of-plane motion of the C′5-H5 moiety is extremely small (<0.02 eV).For what concerns C-π H π* L , we succeeded in locating the ethylenic crossing region with S 0 , obtaining a picture similar to that discussed in previous studies on isolated C in solution. 38,56lso, in this case, access to this region is predicted to be almost barrierless.According to our calculations, therefore, also in the hemiprotonated cytidine dimers, the access to a "monomer-like" non-radiative deactivation route is extremely easy.This is not surprising, since the nonradiative decay path involves C5=C6 moiety, which are not involved in the hydrogen bond interaction.

CONCLUDING REMARKS
In this study, we have characterized the excited state behavior of protonated cytidine and hemiprotonated cytidine-cytidine hydrogen-bonded pair in water, integrating PCM-TDDFT and CASPT2 calculations.For CH + , our computational results are in good agreement with the available experimental steady-state spectra.The computed IR spectrum in water is very similar to the experimental one and the effect of N3 protonation on the absorption spectrum is correctly reproduced, supporting the reliability of our computational approach.
Based on the mapping of the excited states PES, we obtain a picture of the CH + photophysics fully consistent with TR spectra.Geometry optimization of the lowest energy bright state (a ππ* HOMO → LUMO transition) leads to a shallow minimum, characterized by a very large Stokes shift, which is separated by a vanishingly small energy barrier from a crossing region with the ground state.CASSCF/CASPT2 calculations in the gas phase and PCM/TD-DFT calculations in water agree in predicting that the crossing region can be reached by a relatively small out-of-plane motion of the C5-H5 and C6-H6 bonds, the so-called ethylenic CI.The presence of this effective non-radiative decay route explains the ultrafast, sub-ps, excited state decay, faster than that of cytidine. 15,50According to our calculation, N3 protonation significantly destabilizes the nπ* transitions.On this ground, we can understand disappearance at low pH of the long living (30-40 ps) signal typical of cytidine, 26,[57][58][59][60][61] which has been associated with a nπ* involving the LP of N3 (n N π*) or of the carbonyl oxygen (n O π*). 23,38 Its absence after protonation of N3, however, does not automatically imply that the long-living signal has to be assigned to n N π*.In fact, we have shown that N3 protonation destabilizes not only n N π* but also n O π*.
Interestingly, we show that also the dark states localized on C bases are strongly destabilized by hydrogen bonding to CH + .
According to our calculation, the results of the studies on isolated CH + are an ideal basis to interpret the photophysics of hemiprotonated [CH•C] + pairs.The most important photophysical processes in [CH•C] + are similar to those found in CH + and C, whereas the role of the interbase deactivation route is expected to be quite limited.Indeed, a fast non-radiative decay for both species is predicted, through the ethylenic CI, which is more stable than the minimum of the spectroscopic state and can be reached after overcoming a very small energy barrier.
Though NMR experiments show that the proton can "hop" between the two C bases, the associated energy barrier to this motion is not small (3-6 kcal/mol), 62 and the hopping rate (10 8 s −1 ) makes this process likely not relevant for ultrafast photophysics.Moreover, the vibrational progressions appearing in photofragmentation spectrum of [CH•C] + are similar to the one observed in the CH + spectra, suggesting that the intramolecular vibrations are not heavily perturbed by dimerization. 63There are several indications, therefore, that, for what concerns the sub-ps processes, the behavior of [CH•C] + can be understood on the basis of that of CH + and C.
Our calculations indicate that the deactivation routes involving CT and proton-coupled electron transfer processes, which are significantly involved in GC dimers, isolated and within DNA, [64][65][66][67] should not play a significant role in "isolated" (i.e., not involved in I-motifs) [CH•C] + pair.
We do not expect that the "monomer-like" deactivation routes are hindered in I-motifs, also in the presence of the phospho(deoxy)ribose backbone.Indeed, the structural distortions leading to the "ethylenic" crossing regions (both for CH + and C) are quite small and mainly involve the C5-H group, which is exposed to the solvent.As a consequence, it is likely that the peculiar photophysical behavior, with long-living components, revealed by TR experiments on nucleic acid sequences folded in I-motif, 15,50 is due to excited states involving stacked bases.

ACKNOWLEDGMENTS
RI thanks financial support from CNR (progetti@cnr/ UCATG4 and Nutrage) and from CN3, National Center for GeneTherapy and Drugs based on RNA technology, funded by the European Union-NextGenerationEU-PNRR.L.M.F.thanks the CCC-UAM for generous allocation of computing time and the Madrid Government (Comunidad de Madrid-Spain) under the Multiannual Agreement with Universidad Autónoma de Madrid in the line Support to Young Researchers, in the context of the V PRICIT (Regional Programme of Research and Technological Innovation) (SI3/PJI/2021-00331).

F I G U R E 2
Schematic drawing of (A) CH + •4 H 2 O, we used to model protonated cytidine in water.(B) [CH•C] + , we used to model hemiprotonated cytidine dimer.Atom numbering is also shown, labeling with a prime the atoms of CH + .

F I G U R E 3
Computed ground state infrared (IR) spectrum for CH + •4 D 2 O. ε in cm −1 mol −1 ; Frequency in cm −1 .Spectrum computed at the PCM/M052X/6-31G(d) level.Each stretching frequency has been scaled by 0.96 and broadened with a Lorentzian with half-width half maximum = 10 cm −1 .The experimental spectrum is shown in fig.S8 of the SI of Keane et al.

F I G U R E 4
Schematic drawing of (A) the minimum of the π H π* L optimized CH + •4 D 2 O, (B) a representative structure of the "ethylenic" crossing region between π H π* L and S 0 .F I G U R E 5 Schematic drawing of the potential energy surface associated to the out-of-plane motion of the C′5H group leading to the ethylenic crossing region.F I G U R E 6 Schematic drawing of the "ethylenic" crossing region between π H π* L and S 0 located at the CASSCF level.

T A B L E 2
Vertical excitation energies VEE (eV), oscillator strengths (in parentheses) of the first 6 excited states for [CH•C] + , calculated with different functionals and basis sets in water.PCM/M052X/6-31G(d) geometry optimizations.

F I G U R E 7
Natural transition orbitals associated with the three lowest energy excited electronic states of [CH•C] + .
Vertical absorption energies VAE (eV), oscillator strengths (in parentheses) of the first 5 excited states for CH + •4 D 2 O and C•4 D2O in water, calculated with M052X with the different basis sets.
T A B L E 1