Squaramides and Ureas: A Flexible Approach to Polymerase‐Compatible Nucleic Acid Assembly

Abstract Joining oligonucleotides together (ligation) is a powerful means of retrieving information from the nanoscale. To recover this information, the linkages created must be compatible with polymerases. However, enzymatic ligation is restrictive and current chemical ligation methods lack flexibility. Herein, a versatile ligation platform based on the formation of urea and squaramide artificial backbones from minimally modified 3′‐ and 5′‐amino oligonucleotides is described. One‐pot ligation gives a urea linkage with excellent read‐through speed, or a squaramide linkage that is read‐through under selective conditions. The squaramide linkage can be broken and reformed on demand, while stable pre‐activated precursor oligonucleotides expand the scope of the ligation reaction to reagent‐free, mild conditions. The utility of our system is demonstrated by replacing the enzymatically biased RNA‐to‐DNA reverse transcription step of RT‐qPCR with a rapid nucleic‐acid‐template‐dependent DNA chemical ligation system, that allows direct RNA detection.

Our inspiration came from the excellent read-through fidelity of an artificial amide backbone, [10] and the various nucleic acid backbone analogues that have been reported for RNA therapeutic applications. Structurally related carbamate, [24][25][26] thiourea, [27] urea, [26] and squaramide [28] linkages have been introduced into oligonucleotides by iterative coupling or as phosphoramidite dinucleotides, but neither their formation by templated ligation nor their read-through fidelity has been explored. Interestingly, all of these backbones could be derived from unmodified or minimally modified oligonucleotides where the terminal 5'- or 3'-hydroxy groups are replaced by amino groups. Furthermore, the reagents (for example, 1,1'-carbonyldiimidazole (CDI) and squarate ester) used to generate these linkages become progressively less reactive upon each nucleophilic addition-elimination step, [29] potentially allowing oligonucleotides to be pre-activated prior to ligation with another strand.

Results and Discussion
To investigate the feasibility of our ligation strategy, oligonucleotides bearing a single terminal 3'-amino dT and 5'-amino dT were prepared using conventional solid-phase oligonucleotide synthetic methods and commercial resins and monomers, which provided the modified oligonucleotides in good yields and purity. [30,31] Next, "one-pot" ligation of appropriate pairs of oligonucleotides was performed using a complementary templating DNA strand (splint) at pH 8.5. Under these conditions, the amine-ammonium ion equilibrium is shifted favourably to the amino form, whilst the templating strand significantly enhances the second nucleophilic addition-elimination step by reducing its concentration dependence. Unfortunately, no evidence for the formation of carbamate or thiourea linkages was observed using CDI or its thio-derivative (Supporting Information, Figure S1). On the other hand, squaramide- and urea-linked oligonucleotide ligation proceeded well using squarate ester and CDI, respectively (71 % and 46 %, Figure 2B). The difference in CDI reactivity for the carbamate and urea is likely due to the lower nucleophilicity of the hydroxy group required for carbamate formation (compare with the amino groups for the urea linkage). For the squaramide, ligation efficiency was dependent on pH (best to worst: 8.5 > 7.5 > unbuffered, Supporting Information, Figure S2) and squarate methyl ester concentration (optimal = 5 mM). For the urea, CDI was used as a solid in large excess, making optimisation of the urea ligation conditions challenging as the imidazole by-product naturally buffers the system to neutral pH.
Mass spectrometry of the crude squaramide formation reaction mixture showed that residual amino oligonucleotides were quantitatively converted to mono-squaramide monoesters. Therefore, pre-activation of either the 3'- or 5'-amino oligonucleotides was investigated before templated ligation to the second strand. Using identical conditions to the "one-pot" ligation reaction and simple desalting with no further purification, pre-activation was straightforward, no amino oligonucleotide dimerisation was observed and near-quantitative activation was achieved even at very high (100 µM) oligonucleotide concentrations (Figure 2C). When added to the appropriate amino oligonucleotides, good conversion to the squaramide was observed within 15 min if a templating strand was present (Figure 2E). The 5'-amino pre-activated oligonucleotides gave slightly lower ligation efficiency (80 % vs. 88 % for the pre-activated 3'-amino oligonucleotide, Figure 2D; a full summary of squaramide ligation efficiency can be found in Table S10 in the Supporting Information), most likely due to the 3'-amino group being a more sterically hindered nucleophile.

Figure 2. B) One-pot ligation, where all necessary oligonucleotides including splint are mixed together and squarate ester or CDI is added to generate squaramide and urea artificial backbones, respectively. The 3'-amino oligonucleotide was labelled with a 5'-FAM, which was used for denaturing PAGE visualisation and quantification of coupling by ImageJ analysis. Note $\text{coupling} = (f_{\text{product}} / f_{\text{total}}) \times 100$, where f = fraction. C) High-concentration (100 µM) pre-activation of 3'- and 5'-amino oligonucleotides using squarate ester to generate mono-squaramides. Mass spectrometry demonstrates complete activation and no dimerisation. m/z (5'-FAM-3'-amino F1) = 3819 (expected), 3819 (found); m/z (5'-FAM-3'-mono-squaramide F1) = 3929 (expected), 3929 (found); m/z (5'-amino Am3) = 9305 (expected), 9305 (found); m/z (5'-mono-squaramide Am3) = 9415 (expected), 9415 (found). D) Ligation when pre-activated mono-squaramide oligonucleotides are mixed with the appropriate amino oligonucleotide and splint. The gel was visualised and quantified as described for (B). E) The importance of a templating strand (splint) to bring the reactants together for squaramide chemical ligation when a pre-activated mono-squaramide oligonucleotide is used. The gel is visualised by UV shadowing. Note $\text{coupling} = (1 - f_{\text{unreacted SM}} / f_{\text{total SM}}) \times 100$, where f = fraction and SM = starting material. The oligonucleotides used for (B) to (D) are listed in Table S1 and for (E) in Table S4 in the Supporting Information. A full summary of squaramide ligation efficiency can be found in Table S10 in the Supporting Information.
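The two coupling definitions given in the Figure 2 caption (product fraction for the FAM-visualised gels, and one minus the unreacted starting-material fraction for the UV-shadowed gel) can be sketched as follows. This is only an illustrative sketch: the function names and the band intensities are hypothetical and are not taken from the paper or from ImageJ.

```python
def coupling_from_product(product_intensity, total_intensity):
    """Coupling (%) as product band intensity over all labelled bands in the lane."""
    if total_intensity <= 0:
        raise ValueError("total intensity must be positive")
    return product_intensity / total_intensity * 100.0


def coupling_from_unreacted(unreacted_sm_intensity, total_sm_intensity):
    """Coupling (%) as 1 minus the fraction of starting material (SM) left unreacted."""
    if total_sm_intensity <= 0:
        raise ValueError("total starting-material intensity must be positive")
    return (1.0 - unreacted_sm_intensity / total_sm_intensity) * 100.0


# Hypothetical integrated band intensities (arbitrary units), for illustration only.
print(coupling_from_product(product_intensity=71.0, total_intensity=100.0))
print(coupling_from_unreacted(unreacted_sm_intensity=12.0, total_sm_intensity=100.0))
```

Both functions return a percentage, so values obtained from the two gel read-outs can be compared directly, provided the same lanes and background correction are used.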
Pleasingly, these pre-activated oligonucleotides were stable to hindered amines such as tris(hydroxymethyl)aminomethane (Tris) that is used in buffers, but could be inactivated in the presence of primary amines, such as ethanolamine (Supporting Information, Figure S3). For assembly of larger DNA constructs, it would be necessary to have oligonucleotides with both terminal 3'- and 5'-amino groups. Pre-activation in this case should avoid cyclic oligonucleotide formation but maintain activation of both terminal amino groups, which was indeed observed (Supporting Information, Figure S4). Surprisingly, RNA oligonucleotides required higher squarate ester concentrations for complete pre-activation (approximately 10 mM squarate ester, Supporting Information, Figure S4). Assembly of large nucleic acids would require annealing of multiple oligonucleotides, thus requiring the mono-squaramide monoesters to have reasonable temperature stability. When heated for 5 min at 95 °C in standard Taq polymerase buffer (10 mM Tris-HCl, 50 mM KCl, and 1.5 mM MgCl2, pH 8.3), 9 % and 18 % hydrolysis of the ester was observed for 5'- and 3'-activated oligonucleotides, respectively (Supporting Information, Figure S5). However, decreasing the temperature (40 min, 55 °C, Supporting Information, Figure S5) reduces the hydrolysis rates to negligible levels.
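To put the 95 °C hydrolysis figures in perspective, one can estimate an apparent half-life of the mono-squaramide monoester. The sketch below assumes simple pseudo-first-order decay, which is an assumption made here for illustration and is not a kinetic model stated in the text; only the 9 % and 18 % values after 5 min come from the paragraph above.

```python
import math


def first_order_rate_constant(fraction_hydrolysed, minutes):
    """Apparent rate constant k (per min), assuming [ester] = [ester]0 * exp(-k * t)."""
    if not 0.0 < fraction_hydrolysed < 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return -math.log(1.0 - fraction_hydrolysed) / minutes


def half_life(rate_constant):
    """Half-life (min) of a first-order process with the given rate constant."""
    return math.log(2.0) / rate_constant


# Fractions hydrolysed after 5 min at 95 C, as reported in the text.
for label, fraction in (("5'-activated", 0.09), ("3'-activated", 0.18)):
    k = first_order_rate_constant(fraction, minutes=5.0)
    print(f"{label}: k = {k:.4f} per min, apparent half-life = {half_life(k):.1f} min")
```

Under this assumption both half-lives are several times longer than a typical short denaturation step, which is in line with the text's conclusion that brief heating is tolerated.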
Having established a robust ligation platform, the next focus was information retrieval. qPCR was performed using Phusion (exo+, exo = exonuclease activity) and Taq (exo-) polymerases, the premise being that read-through of the modified backbone to generate the unmodified complementary strand is rate-determining for the amount of PCR product generated. Hence detection of PCR product at lower cycles is indicative of higher read-through efficiency.
Both polymerases tolerated the urea linkage well, with PCR product fluorescence reaching a detectable level after a number of cycles (cycle threshold, $C_t$) comparable to that of the canonical phosphodiester linkage (Figure 3A). Conversely, these polymerases were less tolerant of the squaramide linkage (Figure 3A, $\Delta C_t$ (control minus squaramide) $\approx$ 10 (Phusion) and $\approx$ 14 (Taq) cycles, which is equivalent to approximately 1000 and 10 000-fold less PCR product, respectively, assuming 100 % PCR efficiency). Optimisation of the PCR buffer to remove monovalent cations improved the situation but the amount of product was still significantly lower than the control ($\Delta C_t$ (control minus squaramide) $\approx$ 10 (Taq), Supporting Information, Figure S6). Changing the polymerase enzyme, however, was the most effective solution. Remarkably, Vent (exo-) gave comparable $C_t$ values for control and squaramide backbones with no alteration to the buffer supplied by the manufacturer (Figure 3A). To corroborate these results, linear copying of the modified templates and analysis by gel electrophoresis was performed using Vent (exo-) polymerase, with DNA templates containing previously reported [10] amide and triazole backbones providing points of reference (Figure 3B). Read-through of the amide and urea backbones was comparable, which is consistent with their similar structural demands. Squaramide and triazole backbone read-through was also comparable for Vent (exo-) polymerase, and demonstrates the importance of the polymerase; the triazole linkage has been extensively tested and shown to function with a range of polymerases (for example, Taq and Phusion), [10] whereas the squaramide linkage is selective, with read-through only being efficient for Vent (exo-) polymerase.
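The conversion from a cycle-threshold difference to a fold difference in product follows from each cycle multiplying the template by (1 + E), where E is the PCR efficiency. A minimal sketch of that arithmetic is given below; the function name is ours, and E = 1 corresponds to the 100 % efficiency assumed in the text.

```python
def fold_difference(delta_ct, efficiency=1.0):
    """Fold difference in template implied by a Ct shift, for a given PCR efficiency.

    efficiency=1.0 means perfect doubling per cycle (100 % efficiency).
    """
    return (1.0 + efficiency) ** delta_ct


# Ct shifts quoted in the text for the squaramide linkage relative to the control.
for polymerase, delta_ct in (("Phusion", 10), ("Taq", 14)):
    print(f"{polymerase}: about {fold_difference(delta_ct):,.0f}-fold less product")
```

With perfect doubling, shifts of 10 and 14 cycles give 2^10 = 1024 and 2^14 = 16384, which matches the order-of-magnitude figures (approximately 1000 and 10 000-fold) quoted above.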
Linear extension also allowed mass spectrometry characterisation of the read-through products (Figure 3C). For the unmodified control, the expected mass of the full-length read-through product was observed in addition to a peak for the product plus an additional dA nucleotide. This is consistent with the known tendency of polymerases lacking exonuclease (proof-reading) activity to add an untemplated terminal dA nucleotide. For the squaramide template, the same products as the control were observed in addition to a product missing one dA nucleotide relative to the full-length product, the exact location of which was probed by Sanger sequencing (see below). For the urea linkage, only correct full-length and "full-length plus dA" products were observed. Interestingly, the choice of polymerase is again important; in contrast to accurate read-through with Vent (exo-) polymerase, Klenow (exo+) polymerase generates an equal mixture of products lacking one or two dA nucleotides for the urea backbone, with negligible full-length product (Supporting Information, Figure S7).

Figure 3. PCR and linear copying of artificial backbone-containing oligonucleotides. A) qPCR curves for phosphodiester (control), urea, and squaramide backbones (colour-coded) using hot-start Taq, hot-start flex Phusion or Vent (exo-) polymerase. Urea backbones are tolerated well by all tested polymerases, while the squaramide backbone is selectively read through using Vent (exo-) polymerase. B) Linear copying of templates containing artificial backbones using Vent (exo-) polymerase for 1 h at 60 °C, analysed by denaturing PAGE. Previously reported triazole and amide backbones are shown for reference. C) Mass spectrometry analysis of the products from read-through of squaramide- and urea-containing oligonucleotides by Vent (exo-) polymerase (2 h at 60 °C). m/z (read-through product) = 25696 (expected), 25696 (found). Note that due to a lack of proof-reading activity, the polymerase can add an untemplated dA base (see phosphodiester control). Shoulder peaks are the result of salt adducts from the reaction buffer. D) Representative Sanger sequencing results for Vent (exo-) PCR amplicons cloned into a vector. Urea and squaramide linkages are represented by circle and square symbols, respectively. Full alignments can be found in Figure S8 in the Supporting Information. The oligonucleotides used are listed in Tables S4-S7 in the Supporting Information.
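When reading such spectra, it helps to know where the "plus dA" and "minus dA" species should appear relative to the full-length product. The sketch below takes the expected full-length mass of 25696 from the Figure 3 caption and uses the standard average mass of an internal dA residue (dAMP minus water, about 313.21 Da); that residue mass, the 5 Da tolerance and the function names are assumptions introduced here and are not values from the paper.

```python
# Average mass (Da) of one internal dA residue (dAMP minus water); a standard
# reference value, not a number reported in the paper.
DA_RESIDUE_MASS = 313.21

# Expected mass of the full-length read-through product (Figure 3 caption).
FULL_LENGTH_MASS = 25696.0


def expected_species(full_length=FULL_LENGTH_MASS, da=DA_RESIDUE_MASS):
    """Expected masses for the full-length product and its +/- one dA variants."""
    return {
        "full-length - dA": full_length - da,
        "full-length": full_length,
        "full-length + dA": full_length + da,
    }


def assign_peak(observed_mass, tolerance=5.0):
    """Return the closest species label within the tolerance (Da), else None."""
    best_label, best_error = None, tolerance
    for label, mass in expected_species().items():
        error = abs(observed_mass - mass)
        if error <= best_error:
            best_label, best_error = label, error
    return best_label


for label, mass in expected_species().items():
    print(f"{label}: {mass:.1f} Da")
```

A peak that matches none of the three species within the tolerance is left unassigned, which is the safer default given that salt adducts also produce shoulder peaks.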
To verify the exact location of the dA insertions and deletions, PCR amplicons were generated using Vent (exo-) polymerase and synthetic squaramide- or urea-containing templates, cloned into a vector, and transformed into Escherichia coli. Several colonies were randomly picked (n = 10 and 8 for squaramide and urea, respectively) and the recovered vectors sent for Sanger sequencing (Figure 3D, full alignments in Figure S8 in the Supporting Information). All colonies, bar one, displayed correct base specificity adjacent to the artificial linkage, confirming the extra dA nucleotides are the result of the polymerase's tendency to add an untemplated terminal dA nucleotide. The one exception was a dT deletion directly adjacent to the site of the urea linkage, a rare occurrence (1/8).
The urea linkage also displayed four other single base mismatches but these were non-conserved and occurred greater than 10 bases away from the modification site. For the squaramide, only one C-to-T mismatch mutation was observed 7 bases away from the modification, suggesting the dA deletion seen by mass spectrometry could be a terminal 3' base deletion.
Squaramide-containing oligonucleotides have been reported to be susceptible to hydrolysis during ammonia-mediated nucleobase deprotection. [28] This raises the possibility of excising the squaramide linkage and regenerating the oligonucleotide starting materials under appropriate conditions. To evaluate this, a range of nucleophiles (methylamine, ethanolamine, ethylenediamine, cysteamine, and dithiothreitol) were screened for their ability to break the squaramide linkage in 1 h at 55 °C (Supporting Information, Figure S9). Thiols were ineffective; however, primary amines showed appreciable squaramide backbone cleavage. Ethylenediamine was the most effective reagent (Figure 4). This is likely due to the initial products formed upon cleavage; nucleophilic addition-elimination of the amine on the squaramide-ligated oligonucleotide generates one strand with a free amino group and another strand with the remaining squaramide-ethylenediamine adduct (Figure 4A). Intramolecular attack of the free amino group of ethylenediamine is likely to facilitate the second nucleophilic addition-elimination reaction to release the squaramide from the oligonucleotide. The identity of the oligonucleotide adducts was confirmed by UPLC-MS (Figure 4C). Following the reaction over time indicated that the rate-determining step to recover the starting material was removal of the squaramide-ethylenediamine adduct from the 3'-amino terminus of the oligonucleotide. Optimised cleavage conditions involved the addition of 50 % aqueous ethylenediamine to an equal volume of aqueous oligonucleotide sample.

Figure 4. A) The various products generated upon treatment of squaramide-containing oligonucleotides with ethylenediamine (EDA). Conversion of the 3'-squaramide-ethylenediamine adduct to the free 3'-amine is the rate-determining step (r.d.s.) to regenerate the 5'-amino and 3'-amino oligonucleotides for re-ligation. B) An example of squaramide cleavage (1:3 v/v EDA to oligonucleotide in water, 3 h at 55 °C) and re-ligation (squarate ester in 15 mM sodium borate buffer, pH 8.5 containing 0.2 M NaCl). Each reaction was analysed by UPLC-MS with the UPLC traces shown here. Relative amounts of the squaramide-containing oligonucleotide were determined by integration of the traces, which were all normalised to the splint peak intensity (grey). C) Time-dependent cleavage of squaramide-containing oligonucleotides using 1:3 v/v of ethylenediamine to oligonucleotide in water at 55 °C. Both UPLC traces and mass spectrometry deconvolution of these peaks are shown to highlight that the removal of the 3'-squaramide-EDA adduct is rate limiting in obtaining the free amino oligonucleotides. UPLC peaks were integrated to determine relative conversion of the squaramide-adduct oligonucleotide to the free amino oligonucleotides. Note that all traces are normalised to the splint peak intensity (grey). For mass spectrometry, the ratio of deconvoluted masses is shown to estimate squaramide-adduct amounts. This assumes ionisation of the free amino oligonucleotide and squaramide-ethylenediamine adduct is comparable and dominated by the negative charge of the oligonucleotide. m/z (3'-squaramide-EDA adduct) = 3957 (expected), 3957 (found); m/z (5'-squaramide-EDA adduct) = 6405 (expected), 6406 (found). The oligonucleotides used for (B) are listed in Supplementary Table 2 and

Research Articles
Heating for 3 h at 55 °C gave approximately 95 % cleavage of the linkage (Figure 4B). Recovery of the starting material was confirmed by desalting the sample by gel filtration (which also removes ethylenediamine), freeze-drying, and re-ligating the oligonucleotides using one-pot ligation conditions (Figure 4B). Pleasingly, a coupling yield comparable to that observed during the ligation optimisation (Figure 2B) was obtained.
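The cleavage conditions are quoted in two ways: as 50 % aqueous ethylenediamine added to an equal volume of sample in the main text, and as 1:3 v/v EDA to oligonucleotide in water in the Figure 4 caption. The short check below shows that the two descriptions agree, assuming that "50 %" means volume per volume and that volumes are additive; both are assumptions made here, and the helper names are ours.

```python
from fractions import Fraction


def final_fraction(stock_fraction, stock_volume, sample_volume):
    """Volume fraction of reagent after mixing a stock with a reagent-free sample."""
    return stock_fraction * stock_volume / (stock_volume + sample_volume)


def parts_ratio(fraction):
    """Express a volume fraction as (reagent parts, remaining parts)."""
    return fraction.numerator, fraction.denominator - fraction.numerator


# 50 % (v/v, assumed) aqueous EDA mixed with an equal volume of aqueous sample.
eda = final_fraction(Fraction(1, 2), stock_volume=1, sample_volume=1)
print(f"final EDA fraction: {eda} (EDA : water = {parts_ratio(eda)[0]}:{parts_ratio(eda)[1]})")
```

Using exact fractions avoids rounding noise and makes the 1:3 ratio fall out directly from the 25 % final volume fraction.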
Finally, to demonstrate the utility of our enzyme-free ligation platform, we focussed on RNA detection. RT-qPCR is the current gold standard for RNA detection but the methodology has several limitations: 1) the priming strategy used for RNA-to-cDNA conversion (for example, random hexamers vs. poly(dT) primers) is known to influence the results obtained; [21][22][23] 2) genomic DNA contamination remains an issue even with DNase-treated RNA samples; [32] 3) the identity of the PCR product needs to be validated during or post-PCR (typically using an oligonucleotide that hybridises to the PCR product). We envisaged a ligation-dependent qPCR design (Figure 5A) that could address these points. Our design requires two DNA oligonucleotides to hybridise next to each other in buffered, reagent-free conditions to form the template that will be used for qPCR. This validates the identity of the target RNA prior to PCR directly from the unmodified source sample. More importantly, no polymerase-priming strategy is required as the RNA-to-DNA conversion relies solely on hybridisation; enzymatic bias in template read-through and/or secondary structure interference are therefore minimised. [33] Since the DNA oligonucleotides will be directly used for qPCR, the introduction of "tails" (sequences) that are not found naturally in the genome enables the design of primers that are highly specific to the DNA template. Similar in concept to Padlock probes, [34,35] the key differentiator is that squaramide chemical ligation enables efficient ligation irrespective of nucleic acid substrate, allowing RNA detection that is otherwise challenging with ligase enzymes. [36]
In designing our pair of RNA-detecting DNA oligonucleotides (Figure 5A), we split each oligonucleotide into a 20-mer region that hybridises to the RNA target and a 10-mer 5'-tail for PCR primer binding. The two 20-mer regions were designed to minimise self-complementarity, which would otherwise lead to template-independent ligation, and to be highly specific to the target as determined by BLAST [37] alignment against a reference database. The 10-mer sequences were designed to be non-complementary to the target, and to provide a relatively unstructured tail to which PCR primers can efficiently bind for amplification, the latter being determined by mfold. [38] As a proof of concept, we initially designed DNA oligonucleotides that target a 100-mer RNA (Figure 5A and Supporting Information, Table S9). The 3'-amino oligonucleotide was pre-activated with squarate ester and kept as a stock solution, while the 5'-amino DNA oligonucleotide was left un-activated. The two oligonucleotides were then mixed with the RNA template (3 µM to 300 pM, five orders of magnitude) and the ligation allowed to proceed for 15 min before analysis of the product by qPCR. Promisingly, specific RNA-target amplification was observed, with non-specific primer-only amplification limiting the sensitivity of the system (Figure 5B).
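The design criterion that the two target-binding regions should not pair with each other can be screened with a very simple check before turning to dedicated tools. The sketch below is not the BLAST or mfold analysis used by the authors; it is a swapped-in, simplified stand-in that reports the longest contiguous stretch over which two DNA sequences could base-pair in an antiparallel fashion. The example sequences are hypothetical and are not the probes from Table S9.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_complementary_stretch(seq_a, seq_b):
    """Length of the longest contiguous run of seq_a that can pair antiparallel with seq_b.

    Implemented as the longest common substring between seq_a and the
    reverse complement of seq_b (dynamic programming, two rolling rows).
    """
    a = seq_a.upper()
    target = reverse_complement(seq_b)
    best = 0
    previous = [0] * (len(target) + 1)
    for i in range(1, len(a) + 1):
        current = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            if a[i - 1] == target[j - 1]:
                current[j] = previous[j - 1] + 1
                best = max(best, current[j])
        previous = current
    return best


# Hypothetical 20-mer probe regions, for illustration only.
probe_3_amino = "GATTACAGGCTTCAGTCCAT"
probe_5_amino = "CCTGAAGTTGCACGATAGCA"
print(longest_complementary_stretch(probe_3_amino, probe_5_amino))
```

A long stretch would flag a probe pair that might hybridise without the RNA target and therefore ligate in a template-independent manner; what counts as "too long" depends on the ligation temperature and salt conditions and is not specified here.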
Mixing the scrambled control RNA with the specific target RNA had no impact on target RNA detection (Supporting Information, Figure S10 A). With these results in hand, we sought to push the limits of the system. To reduce primer-only amplification, heat-activated dNTPs were used due to the lack of a commercial heat-activated Vent (exo-) polymerase (Supporting Information, Figure S10 B). Furthermore, a more complex system was investigated composed of a 2664-mer target RNA prepared through in vitro transcription that was spiked into total RNA extracted from MCF-7 cells. The target RNA was titrated into the total RNA, enabling the determination of the dynamic range and the detection limits of the system (0.3 µM to 30 pM, five orders of magnitude, Figure 5C). Pleasingly, the qPCR efficiency (110 %, $R^2$ = 0.995) was within the generally accepted criteria of 95-110 % and gave comparable results to RT-qPCR of the same samples (Figure 5C). RNA detection was ultimately limited by untemplated squaramide ligation (namely, ligation in the absence of the RNA target, see Figure S10 B in the Supporting Information for negligible primer-only amplification). This could potentially be addressed by decreasing the amino oligonucleotide concentration in the ligation reaction; however, sufficient oligonucleotide must be present to ensure the equilibrium favours hybridisation to low concentration RNA targets. Chemical modifications that enhance RNA target affinity (for example, LNA) could offer the solution to this problem.

Figure 5. RNA detection through templated squaramide ligation and qPCR. A) The sequence of the oligonucleotides used for (B) and the overall design principle of the chemical-ligation detection system. Two DNA oligonucleotides, one a pre-activated 3'-mono-squaramide and the other a free 5'-amino group, must hybridise adjacent to each other on the target RNA in order to initiate ligation. These DNA oligonucleotides contain 10-base "tails" that distinguish the resulting template from genomic DNA. Primers that hybridise to the resultant template then amplify the sequence for quantitative PCR detection using the dye EvaGreen. The key benefits are direct validation of the target sequence before PCR (two oligonucleotides of approximately 20 bases must hybridise adjacent to each other for ligation and therefore RNA detection) and the lack of enzyme (polymerase or ligase)/primer bias in RNA-to-DNA conversion. B) The squaramide ligation system is responsive only to the target RNA sequence and not the scrambled control. The limitation is primer-only amplification. C) The system performs comparably to RT-qPCR over the same dynamic range (0.3 µM to 30 pM, five orders of magnitude) for a 2664-mer target RNA spiked into 100 ng total RNA isolated from MCF-7 cells. Through the use of heat-activated dNTPs, the limitation of detection is untemplated squaramide ligation (namely, no RNA template, see Figure S10 B in the Supporting Information for negligible primer-only amplification). The sequences used for (B) and (C) are in Table S9 in the Supporting Information.
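The qPCR efficiency quoted for the titration is conventionally derived from the slope of a standard curve of cycle threshold against log10 of template concentration, using E = 10^(-1/slope) - 1. The sketch below illustrates that calculation with a plain least-squares fit. The data points are synthetic, generated to correspond to perfect doubling, and are not measurements from the paper; the function names are ours.

```python
import math


def fit_slope(log10_concentration, ct_values):
    """Least-squares slope of Ct against log10(template concentration)."""
    n = len(log10_concentration)
    mean_x = sum(log10_concentration) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_concentration)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_concentration, ct_values))
    return sxy / sxx


def efficiency_from_slope(slope):
    """PCR efficiency as a fraction (1.0 = 100 %) from a standard-curve slope."""
    return 10.0 ** (-1.0 / slope) - 1.0


# Synthetic ten-fold dilution series: each ten-fold dilution delays Ct by log2(10) cycles.
log_conc = [0.0, -1.0, -2.0, -3.0, -4.0]
ct = [12.0 + math.log2(10.0) * -x for x in log_conc]

slope = fit_slope(log_conc, ct)
print(f"slope = {slope:.3f}, efficiency = {efficiency_from_slope(slope) * 100:.0f} %")
```

On this relationship, a slope of about -3.32 corresponds to 100 % efficiency and a slope of about -3.10 to 110 %, the value reported for the squaramide ligation system.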

Conclusion
Our squaramide and urea ligation platforms expand the scope of polymerase-compatible nucleic acid assembly and crucially provide greater flexibility in this process than previously reported methods. Through the introduction of commercially available 5'- and 3'-amino groups, one-pot ligation can generate two different linkages: 1) a urea linkage that multiple polymerases can read through with excellent speed, or 2) a squaramide linkage whose formation can be reversed and offers accurate read-through under selective conditions. For the latter, the linkage can also be formed from stable pre-activated intermediates in mild buffered conditions in the absence of small-molecule reagents. The diversity of this platform lends itself to many applications including DNA-encoded libraries, [2] aptamer proximity-ligation [39] and DNA nano-construct assembly. [40] Here we have demonstrated its use in RNA detection, where it performs comparably to RT-qPCR while offering advantages, such as removing RNA polymerase bias, [21][22][23] faster speed of execution, and greater inherent target specificity.