A Water‐Soluble Tetraazaperopyrene Dye as Strong G‐Quadruplex DNA Binder

Abstract The interactions of the water‐soluble tetraazaperopyrene dye 1 with ct‐DNA, duplex‐[(dAdT)12⋅(dAdT)12], duplex‐[(dGdC)12⋅(dGdC)12] as well as with two G‐quadruplex‐forming sequences, namely the human telomeric 22AG and the promoter sequence c‐myc, were investigated by means of UV/visible and fluorescence spectroscopy, isothermal titration calorimetry (ITC) and molecular docking studies. Dye 1 exhibits a higher affinity for G‐quadruplex structures than for duplex DNA structures. Furthermore, the ligand shows promising G‐quadruplex discrimination, with an affinity towards c‐myc of 2×10⁷ M⁻¹ (i.e., K_d = 50 nM), which is higher than for 22AG (4×10⁶ M⁻¹). The ITC data reveal that compound 1 interacts with c‐myc in a stoichiometric ratio of 1:1 but also indicate the presence of two identical lower affinity secondary binding sites per quadruplex. In 22AG, there are two high affinity binding sites per quadruplex, that is, one on each side, with a further four weaker binding sites. For both quadruplex structures, the high affinity interactions between compound 1 and the quadruplex‐forming nucleic acid structures are weakly endothermic. Molecular docking studies suggest an end‐stacking binding mode for compound 1 interacting with quadruplex structures, and a higher affinity for the parallel conformation of c‐myc than for the mixed‐hybrid conformation of 22AG. In addition, docking studies also suggest that the reduced affinity for duplex DNA structures is due to the non‐viability of an intercalative binding mode.

The binding equilibrium is most conveniently defined in terms of concentrations of binding sites, rather than in terms of concentrations of DNA basepairs.
For a binding site size of n basepairs, i.e. one ligand binds to n basepairs, the total concentration of binding sites in solution is given by [binding sites]_total = [basepairs]_total / n. The quadratic equation is solved for [L]_bound in the usual manner for quadratic equations. Inspection of the two solutions shows that only one of them is physically meaningful.
The potential binding models were explored by fitting a model involving ligand aggregation, as quantified by the independently determined K_agg and ΔH_agg, as well as the two different DNA-ligand binding events to the calorimetric data. For the two different binding events, K_A, ΔH_A, n_A, K_B, ΔH_B and n_B were all optimised without restrictions using the simulated annealing protocols incorporated in I2CITC. Following fitting, the simulated annealing trajectories were analysed to obtain plots of the normalised sum over square deviations divided by degrees of freedom (Σdev²/dof) as a function of the stoichiometries n_A and n_B, with all other parameters as shown in Scheme 1 having their optimised values for each particular combination of n_A and n_B (Figures S3A-C). Figures S3A-C thus show the lowest value of Σdev²/dof for each combination of n_A and n_B. Combinations of parameter values for which normalised Σdev²/dof is less than 2 should be considered within error margins. For further details, see reference [1].
There is covariance of the stoichiometries n_A and n_B. Nevertheless, the best fit is observed for a stoichiometry n_A of two and n_B of four, suggesting a molecule of 1 binding on each side of the quadruplex in the tightest binding mode, with a weaker binding mode involving multiple molecules of 1. No parameters were restricted in the fitting.
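The binding-site formulation at the start of this section can be illustrated with a minimal sketch. It assumes a single class of independent, identical binding sites and the mass-action expression K = [L]_bound / ([L]_free · [sites]_free); the function name and the example concentrations are illustrative and are not taken from the original analysis, which also accounts for ligand aggregation.

```python
import math

def bound_ligand(K, ligand_total, basepairs_total, n):
    """Bound-ligand concentration (M) for one class of independent sites.

    K is the binding constant in M^-1, concentrations are in M, and n is
    the binding site size in basepairs (assumed model, see lead-in).
    """
    sites_total = basepairs_total / n
    # K = x / ((L - x)(S - x))  rearranges to  x^2 - b*x + L*S = 0
    b = ligand_total + sites_total + 1.0 / K
    disc = b * b - 4.0 * ligand_total * sites_total
    # Only the '-' root lies between 0 and min(L, S); it is written here in
    # the algebraically equivalent form 2*L*S / (b + sqrt(disc)), which
    # avoids subtracting two nearly equal numbers.
    return 2.0 * ligand_total * sites_total / (b + math.sqrt(disc))

# Illustrative values only: 10 uM ligand, 60 uM basepairs, n = 3.
x = bound_ligand(K=5e6, ligand_total=1e-5, basepairs_total=6e-5, n=3)
```

The other root of the quadratic is always larger than both the total ligand and the total site concentration, which is why it can be discarded.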
C c-myc
The value of n_A is well defined as approximately 1. The parameter n_B is calorimetrically ill-defined, with values up to at least 5 within error margins. n_B was therefore restricted to a value of two, which, together with the well-defined stoichiometry n_A of one, gives a total stoichiometry TAPP : c-myc of 3:1, in agreement with the results from the spectroscopic titrations.

Figure S4: Fits and evaluation of error margins and parameter covariance
The error margins on the optimised parameters for calorimetric data for 1 binding to the various nucleic acid structures were evaluated in a manner analogous to the exploration of the binding models above. Following fitting, the simulated annealing trajectories were analysed to obtain plots of the normalised sum over square deviations divided by degrees of freedom (Σdev²/dof) as a function of each parameter, with all other optimisable parameters having their optimised values for each particular value of the parameter of interest (Figures S4A-D). Figures S4A-D thus show the lowest value of Σdev²/dof for each value of a particular parameter. Parameter values for which normalised Σdev²/dof is less than 2 should be considered within error margins. Figure S4.1 illustrates how error margins are determined for the binding site size n_A for the strongest binding mode of 1 with ct-DNA. For further details, see reference [1].
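The profiling procedure described above can be sketched as follows. This is not the I2CITC implementation: the trajectory format (pairs of a parameter dictionary and a Σdev²/dof value), the binning of parameter values, and the function names are assumptions made for illustration. The sketch also assumes that "normalised" means division by the lowest Σdev²/dof found, so that the best fit has a value of 1.

```python
def profile(trajectory, param, bin_width):
    """Lowest Σdev²/dof found for each binned value of one parameter.

    trajectory: iterable of (params_dict, sq_dev_per_dof) pairs, i.e. every
    parameter set visited during the simulated annealing run (assumed format).
    """
    best = {}
    for params, score in trajectory:
        index = round(params[param] / bin_width)  # integer bin index
        if index not in best or score < best[index]:
            best[index] = score
    # Convert bin indices back to parameter values, in ascending order.
    return {index * bin_width: best[index] for index in sorted(best)}

def error_margin(prof, threshold=2.0):
    """Range of parameter values whose normalised Σdev²/dof is below threshold."""
    minimum = min(prof.values())
    inside = [value for value, score in prof.items() if score / minimum < threshold]
    return min(inside), max(inside)

# Hypothetical trajectory for the binding site size n_A.
trajectory = [
    ({"n_A": 1.0}, 4.0), ({"n_A": 2.0}, 1.0), ({"n_A": 2.1}, 1.2),
    ({"n_A": 2.5}, 1.8), ({"n_A": 3.0}, 2.5), ({"n_A": 2.0}, 1.5),
]
margins = error_margin(profile(trajectory, "n_A", bin_width=0.5))
```

Reporting only the lowest and highest accepted value is a simplification; a profile with more than one minimum would need to be inspected as a plot, as in Figures S4A-D.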

S1 Docking studies
Structures were obtained from the nucleic acids database [2][3] using the sequence definitions as in Table S1.

Selection of 22AG structures
Using the sequence definition above for 22AG, 16 structures were found, of which 6 structures contained more bases than the minimum sequence. These structures were ignored. Of the remaining 10 structures, 2 were nucleic acid structures without bound ligands (PDB 1KF1 and 143D) and 8 were nucleic acid structures with bound ligands (PDB 2MCO, 2MCC, 3UYH, 4FXM, 4G0F, 3R6R, 3SC8 and 3T5E). To include, to an extent, the effect of the flexibility of the 22AG structure, docking studies were carried out using all 10 reported relevant structures as targets. In addition to the reported parallel and antiparallel structures for unmodified 22AG, we also used the mixed-hybrid structure 2E4I for a modified 22AG sequence as a target in our docking studies to reflect the fact that 22AG is expected to adopt this mixed-hybrid conformation under our experimental conditions.

Selection of c-myc structures
For c-myc, 2 structures were found (PDB 2L7V and 1XAV) and both were used as targets.

Pre-docking treatment of quadruplex structures
Where the structure includes multiple models, only the first model was used as a target for docking. Where the structure contained a ligand and/or water molecules, these were removed using UCSF Chimera. Ligands and/or water molecules were removed by selecting the ligand and/or water molecules, inverting the selection, and saving the resulting selection as a PDB file. Where a structure contained 3 potassium ions, with one potassium ion on the outside of the stack of three tetrads, this cation was removed.
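For readers without access to UCSF Chimera, the clean-up steps above can be approximated with a short plain-Python sketch. This is a substitute for, not a reproduction of, the Chimera procedure used here. It assumes standard fixed-column PDB records, keeps only `ATOM` records plus `HETATM` records whose residue name is `K` (potassium), and stops at the first `ENDMDL`. The function name is hypothetical. Two limitations are worth stating: modified nucleotides stored as `HETATM` would be dropped, and the removal of a potassium ion located outside the tetrad stack is not handled and would still need to be done by inspection.

```python
def clean_pdb(text, keep_hetero=("K",)):
    """Return PDB text with the first model only and ligands/water removed."""
    kept = []
    for line in text.splitlines():
        record = line[:6].strip()
        if record == "ENDMDL":
            break  # ignore every model after the first one
        if record == "ATOM":
            kept.append(line)  # nucleic acid atoms
        elif record == "HETATM" and line[17:20].strip() in keep_hetero:
            kept.append(line)  # ions to retain, e.g. channel potassium
        # all other records (water, ligands, headers, TER) are dropped
    kept.append("END")
    return "\n".join(kept) + "\n"
```

A typical use would be `open("target_clean.pdb", "w").write(clean_pdb(open("target.pdb").read()))`, where both file names are placeholders.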

TAPP 1 PDBQT file
The generation of the PDBQT file for TAPP 1 using AutoDockTools resulted in a predominantly rigid structure, with only the bonds indicated in red in Scheme S1 being rotatable.

S2 Comparison of interaction parameters from spectroscopic and calorimetric studies

duplex-[(dGdC)12⋅(dGdC)12]
Following data fitting, the simulated annealing trajectories were analysed to obtain plots of the normalised sum over square deviations divided by degrees of freedom (Σdev²/dof) as a function of the equilibrium constants K_A and K_B, and as a function of the equilibrium constant K_A and the interaction enthalpy ΔH_A (Figure S5). Figure S5 shows that K_A and K_B are correlated in the fit to the calorimetric data and that the combination of optimised values (K_A = 1.2×10⁵ and K_B = 0.98×10⁵) lies in the lower range of the possible values for K_A and K_B. The optimised value for K_A is in fact low in combination with a very large optimised value for ΔH_A. This is a not infrequent outcome for data showing a Σdev²/dof as a function of K_A and ΔH_A as in Figure S5 (right), where very low values of K_A tend to correlate with compensating high values of ΔH_A.
We explored the link between the equilibria identified from the calorimetric data and the single apparent equilibrium quantified using the spectroscopic data. To this effect, we simulated the concentrations of the various species present in solution during the spectroscopic titrations according to combinations of values of K_A, n_A, K_B and n_B identified as reasonable in the analysis of the calorimetric data. These combinations have been identified by the red dots in Figure S5. We then fitted the multiple independent binding sites model to the concentration of the free ligand, because we analysed a disappearing spectroscopic signal in our data analysis and the decrease of this signal is dominated by the removal of the free ligand from solution during the titration. The results of this procedure are summarised in Table S2.
Table S2 shows that the apparent stoichiometry of the interaction that would be obtained from the simulated spectroscopic data is typically between 2.5 and 3.0, in good agreement with the observations from the spectroscopic titrations. In addition, it is clear that for higher values of K_A the apparent affinity constant increases to the same order of magnitude as the value observed from the spectroscopic titrations. If we also take the spectroscopic data into account, it is therefore very likely that the values for the affinity constants of 1 for GC are rather higher than the values obtained from the analysis of the calorimetric data alone. We favour values for K_A of around 5×10⁶ M⁻¹ and values for K_B of around 5×10⁵ M⁻¹. These values are further associated with a ΔH_A of +0.7 kcal mol⁻¹ and a ΔH_B of -0.5 kcal mol⁻¹. Both these values fit well with the enthalpy changes observed for the interactions of 1 with the other nucleic acid structures.
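The simulation step described above can be sketched as follows, assuming two classes of independent binding sites and, as a simplification, no ligand aggregation (the original analysis includes aggregation through K_agg). The function name and the site concentrations in the example are hypothetical; the binding constants are the favoured values quoted above. The subsequent fit of the multiple independent binding sites model to the simulated free-ligand concentrations is not shown.

```python
def free_ligand(ligand_total, site_classes):
    """Free-ligand concentration (M) for several classes of independent sites.

    site_classes: list of (K, sites_total) pairs, with K in M^-1 and the
    total concentration of sites of that class in M.
    """
    def excess(free):
        # Mass balance: free ligand + ligand bound to each class - total ligand.
        bound = sum(s * K * free / (1.0 + K * free) for K, s in site_classes)
        return free + bound - ligand_total

    # excess() increases monotonically with `free`, is <= 0 at free = 0 and
    # >= 0 at free = ligand_total, so bisection on this interval converges.
    lo, hi = 0.0, ligand_total
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Hypothetical titration point: 10 uM ligand, 2 uM class-A and 4 uM class-B sites.
free = free_ligand(1e-5, [(5e6, 2e-6), (5e5, 4e-6)])
```

Repeating the calculation for every point of a titration gives the simulated free-ligand curve. The site concentrations passed in must be derived from the nucleic acid concentration and the relevant n_A and n_B before the call.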

22AG
As for duplex-[(dGdC)12⋅(dGdC)12], combinations of reasonable values of K_A and K_B were identified (Figure S6). For three reasonable combinations of K_A and K_B (red dots in Figure S6), we simulated the concentrations of the various species present in solution during the spectroscopic titrations. We then fitted the multiple independent binding sites model to the concentration of the free ligand, because we analysed a disappearing spectroscopic signal in our data analysis and the decrease of this signal is dominated by the removal of the free ligand from solution during the titration. The results of this procedure are summarised in Table S3.