Sequencing 5‐Hydroxymethyluracil at Single‐Base Resolution

Abstract: 5-Hydroxymethyluracil (5hmU) is formed through enzymatic and non-enzymatic oxidation of thymine in various biological systems. Although 5hmU has been reported to affect biological processes such as protein-DNA interactions, the consequences of 5hmU formation in genomes have not yet been fully explored. Herein, we report a method to sequence 5hmU at single-base resolution. We use chemical oxidation to transform 5hmU into 5-formyluracil (5fU), followed by polymerase extension to induce T-to-C base changes, exploiting the inherent ability of 5fU to form 5fU:G base pairs. In combination with Illumina next-generation sequencing, we developed polymerase chain reaction (PCR) conditions to amplify the T-to-C base changes, and we demonstrate the method on three different synthetic oligonucleotide models as well as on part of the genome of a 5hmU-rich eukaryotic pathogen. The method has the potential to map 5hmU in genomic DNA and should thus help to advance the understanding of this modified base.

DNA-base modifications can profoundly influence biology, and a number of modified bases have been identified in the genomes of a variety of organisms.[1] 5-Hydroxymethyluracil (5hmU) is produced through oxidation of thymine both enzymatically and non-enzymatically[2] and can influence the binding of proteins to DNA.[2b,3] It has also been suggested that 5hmU might lead to genomic instability, as it can be removed by DNA repair enzymes to create potentially mutagenic lesions[4] and can affect the stability of DNA duplexes.[5] When incorporated at some promoter sites, 5hmU has been shown to affect transcription by bacterial RNA polymerase (RNAP); it may therefore have a significant effect on microbial biology.[6] In mammals, reported levels of 5hmU vary by cell and tissue type.[2b,7] Increased levels of 5hmU autoantibodies have been reported in cancer cases,[8] and blood 5hmU mononucleoside levels have been studied as a marker of cancer risk and invasiveness.[9] We previously reported a method to map 5hmU at moderate resolution by chemical enrichment of 5hmU-containing DNA fragments followed by sequencing.[10] A method for single-base sequencing of 5hmU would enable the identification of individual modification sites in genomes. A single-molecule real-time (SMRT) sequencing approach could in principle be applied to map 5hmU at single-base resolution; however, the intrinsic signal of 5hmU is rather weak unless the base is further modified.[11] Mapping 5hmU at single-base resolution is a worthwhile challenge that could transform genome-wide analysis of 5hmU. Herein, we describe a chemical approach for single-base-resolution sequencing of 5hmU and demonstrate its utility in various sequence contexts.
The conceptual basis for sequencing 5hmU is the chemical oxidation of 5hmU to 5fU, which ionizes under mildly alkaline pH owing to its electron-withdrawing exocyclic aldehyde (pKa at N3 = 8.1 for 5fU vs. 9.3 for 5hmU).[12] The ionized form of 5fU can base-pair with G (Figure 1), causing a T-to-C base change that marks the original 5hmU sites. The oxidation of 5hmU to 5fU was carried out using KRuO4.[13] The T-to-C change is then established during a polymerase-dependent single-extension step and subsequently amplified by PCR. To rule out T-to-C changes that arise for reasons other than 5hmU (for example, pre-existing mutations or non-5hmU DNA damage), sequencing results are compared with a "no-oxidation" control in which 5hmU has not been converted to 5fU.
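To make the role of the pKa difference concrete, the Henderson-Hasselbalch relation gives the fraction of each base deprotonated at N3 at a given pH. The sketch below is a simple acid-base model only: it uses the pKa values quoted above, the pH values in the loop are arbitrary illustrations rather than the reaction conditions used in this work, and it ignores any effect of the duplex or polymerase environment on ionization.

```python
"""Fraction of 5fU and 5hmU ionized at N3 under a simple Henderson-Hasselbalch model."""

# N3 pKa values quoted in the text.
PKA_N3 = {"5fU": 8.1, "5hmU": 9.3}


def fraction_ionized(pka: float, ph: float) -> float:
    """Return the fraction (0..1) of a weak acid HA present as A- at the given pH.

    Henderson-Hasselbalch: [A-]/[HA] = 10**(pH - pKa), hence
    fraction ionized = 1 / (1 + 10**(pKa - pH)).
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


if __name__ == "__main__":
    # Illustrative pH values only; these are not the experimental conditions.
    for ph in (7.5, 8.0, 8.5):
        f_5fu = fraction_ionized(PKA_N3["5fU"], ph)
        f_5hmu = fraction_ionized(PKA_N3["5hmU"], ph)
        print(f"pH {ph}: 5fU {f_5fu:.1%} ionized, 5hmU {f_5hmu:.1%} ionized, "
              f"ratio {f_5fu / f_5hmu:.1f}")
```

At pH 8.0, for example, this model gives roughly 44 % ionized 5fU versus roughly 5 % ionized 5hmU; well below both pKa values the ratio approaches 10^(9.3 - 8.1), about 16-fold.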
As a proof of concept, we first employed a synthetic oligonucleotide with two 5hmUs at defined positions (ODN1) and quantified the base readout profiles at each 5hmU site as well as at proximal unmodified thymine sites (Table 1 and the Supporting Information, Tables S1 and S2). Sequencing was performed on an Illumina next-generation sequencing (NGS) platform (a schematic summary of the sequencing data analysis is shown in the Supporting Information, Figure S1). Under optimised conditions, the proportion of total sequencing reads generating a "C" signal at the 5hmU-modified sites was high ("%C" = 39 % and 30 %, Table 1 and Table S1) compared with unmodified T sites (1.4 %, Table 1; Wilcoxon rank-sum test, p-value = 0.003 for both 5hmU sites). In the control experiment without oxidation of the DNA (that is, 5hmU is not converted to 5fU), the proportion of sequencing reads exhibiting a "C" signal at 5hmU-modified sites was low (2.2 % and 2.7 %, Table 1) and comparable to the level at unmodified T (1.2 %, Table 1; Wilcoxon rank-sum test, p-value > 0.01 for both 5hmU sites). Thus, individual 5hmU sites could be resolved from unmodified T and detected by sequencing. We found that the percentage of T-to-C change depends on the concentration of dATP during the single-extension reaction: a 500-fold lower concentration of dATP relative to the other nucleotide triphosphates was optimal (Table 1 and Table S1).
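The per-site readout reported in Table 1 (%T, %C and %other among reads covering a reference T) can be tallied from aligned reads along the lines of the minimal sketch below. It assumes reads are already aligned to the oligonucleotide reference without insertions or deletions and are reported on the 5hmU-containing strand; the function names and toy reads are hypothetical, the analysis code released with this work may compute these values differently, and the Wilcoxon rank-sum testing is not reproduced.

```python
from collections import Counter


def base_readout(reference: str, reads: list[str]) -> dict[int, dict[str, float]]:
    """Return %T, %C and %other for every T position in the reference.

    Assumes each read is gap-free and aligned so that read[i] matches reference[i].
    """
    profile: dict[int, dict[str, float]] = {}
    for pos, ref_base in enumerate(reference):
        if ref_base != "T":
            continue  # only reference T positions can carry 5hmU
        counts = Counter(read[pos] for read in reads if pos < len(read))
        total = sum(counts.values())
        if total == 0:
            continue  # no coverage at this position
        pct_t = 100.0 * counts["T"] / total
        pct_c = 100.0 * counts["C"] / total
        profile[pos] = {"%T": pct_t, "%C": pct_c, "%other": 100.0 - pct_t - pct_c}
    return profile


def delta_pct_c(oxidized: dict[int, dict[str, float]],
                no_oxidation: dict[int, dict[str, float]]) -> dict[int, float]:
    """%C in the oxidized library minus %C in the no-oxidation control, per shared position."""
    return {pos: oxidized[pos]["%C"] - no_oxidation[pos]["%C"]
            for pos in oxidized if pos in no_oxidation}


if __name__ == "__main__":
    ref = "ACTGTA"  # toy reference (invented) with T at positions 2 and 4
    ox = base_readout(ref, ["ACTGTA", "ACCGTA", "ACTGTA", "ACCGTA"])
    no_ox = base_readout(ref, ["ACTGTA"] * 4)
    print(ox)
    print(delta_pct_c(ox, no_ox))
```

With these invented reads, position 2 shows %C = 50 in the oxidized library and 0 in the control, whereas position 4 stays at 0 in both, mimicking a modified versus an unmodified T.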
The strength of the T-to-C signal change for 5fU depended on the choice of polymerase (Table S2). We chose Bst DNA Polymerase, Large Fragment (New England Biolabs) for further study because it induced a T-to-C signal change at 5hmU sites without introducing noise at unmodified sites. In a separate ODN model (ODN2) modified with varying levels of 5hmU (0-26 %) at two defined positions, the strength of the "C" signal at 5hmU (C counts as a percentage of the sum of C and T counts, %C/(C + T)) increased linearly with the level of 5hmU (Figure S2a). At a sequencing coverage depth of 100 in ODN2, 5hmU was detectable down to a modification level of 15 %, judged by the fold change in "C" signal relative to the no-oxidation control (Figure S2b; data available at https://github.com/sblab-bioinformatics/5hmUseq).
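The ODN2 titration rests on two simple quantities: the %C/(C + T) signal at a site and its fold change relative to the no-oxidation control. The sketch below computes both and fits an ordinary least-squares line of signal against 5hmU level. The helper names and the pseudocount used to avoid division by zero are assumptions for illustration, not parameters from this work.

```python
def pct_c(c_count: int, t_count: int) -> float:
    """%C/(C + T) at one site; NaN when the site has no C or T reads."""
    total = c_count + t_count
    return 100.0 * c_count / total if total else float("nan")


def fold_change(ox_c: int, ox_t: int, ctrl_c: int, ctrl_t: int,
                pseudocount: float = 0.5) -> float:
    """Ratio of C/(C + T) in the oxidized library to that in the no-oxidation control.

    The pseudocount is an assumption (not from the paper) that keeps the ratio
    finite when the control has zero C reads; pseudocount=0 gives the raw ratio.
    """
    ox = (ox_c + pseudocount) / (ox_c + ox_t + 2 * pseudocount)
    ctrl = (ctrl_c + pseudocount) / (ctrl_c + ctrl_t + 2 * pseudocount)
    return ox / ctrl


def linear_fit(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two or more paired points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

For example, `fold_change(20, 80, 2, 98, pseudocount=0)` evaluates to approximately 10 (20 % versus 2 % C); the pseudocount mainly matters at low coverage, where a single C read in the control would otherwise dominate the ratio.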
To investigate potential sequence-context bias in our approach, we prepared the oligonucleotide model ODN3, with a randomised base flanking each side of a single 5hmU site, thereby representing all 16 possible trinucleotide sequence contexts (N1-5hmU-N2, where N1 and N2 are each A, T, G or C). Although we observed some context-dependent variability in the %C/(C + T) signal at the 5hmU-modified site, calling of 5hmU relative to the no-oxidation control was clear and unambiguous in all cases (Figure 2), indicating that the method is suitable for detecting 5hmU in all trinucleotide sequence contexts.
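Because each ODN3 read carries its own randomised flanks, the %C/(C + T) signal can be stratified into the 16 N1-5hmU-N2 contexts directly from the reads. The minimal sketch below assumes gap-free reads aligned to the ODN3 reference and reported on the 5hmU-containing strand, with the 5hmU at a known index; the function name and example reads are hypothetical.

```python
from itertools import product
from typing import Optional

BASES = "ACGT"


def context_signal(reads: list[str], site: int) -> dict[str, Optional[float]]:
    """%C/(C + T) at `site`, grouped by the flanking N1-5hmU-N2 context.

    N1 and N2 are taken from positions site-1 and site+1 of each read.
    Contexts without any C or T read at `site` map to None.
    """
    # [C count, T count] for each of the 16 contexts
    counts = {f"{n1}-5hmU-{n2}": [0, 0] for n1, n2 in product(BASES, repeat=2)}
    for read in reads:
        if site < 1 or site + 1 >= len(read):
            continue  # read too short to cover both flanks
        key = f"{read[site - 1]}-5hmU-{read[site + 1]}"
        base = read[site]
        if key in counts and base in ("C", "T"):
            counts[key][0 if base == "C" else 1] += 1
    return {key: (100.0 * c / (c + t) if c + t else None)
            for key, (c, t) in counts.items()}


if __name__ == "__main__":
    toy_reads = ["AATGA", "AACGA", "CCTTC"]  # invented reads, 5hmU at index 2
    for context, signal in context_signal(toy_reads, site=2).items():
        print(context, signal)
```

Comparing each context's value with the same computation on the no-oxidation library then shows whether calling holds in every context, as reported for Figure 2.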
We then applied the method to map 5hmU in the genome of the eukaryotic pathogen Trypanosoma brucei (Figure 3).[14] On chromosome 2, we observed 161 Ts with a significant 5hmU signal (0.02 % of all Ts on the chromosome), defined using a false discovery rate (FDR) threshold of < 0.1 against the no-oxidation control.[15,16] Compared with a simulated random distribution, these sites showed significant overlap (p-value = 0.0019) with 5hmU regions obtained using our previously reported chemical enrichment strategy (for details, see the Supporting Information).[10,13,16]

In conclusion, we have demonstrated a chemical method to detect and sequence 5hmU at single-base resolution. We envisage that the method could be extended to detect 5fU by omitting the oxidation step and normalising T-to-C conversion against conditions insensitive to 5fU, such as libraries prepared without the single-extension step.
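The chromosome 2 analysis compares oxidized and no-oxidation libraries site by site and controls the false discovery rate at 0.1. The procedure actually used is the one described in refs. [15,16] and the Supporting Information, and it may differ from the sketch below, which assumes a one-sided Fisher's exact test on C/T read counts at each T position followed by Benjamini-Hochberg adjustment. All function names are hypothetical, and the simulation-based overlap test is not shown.

```python
from math import comb


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact p-value for excess C in the oxidized library.

    2x2 table:        C    T
        oxidized      a    b
        no-oxidation  c    d
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    n_ox = a + b          # reads in the oxidized row
    n_c = a + c           # total C reads across both libraries
    n = a + b + c + d
    upper = min(n_c, n_ox)
    tail = sum(comb(n_c, x) * comb(n - n_c, n_ox - x) for x in range(a, upper + 1))
    return tail / comb(n, n_ox)


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_5hmu_sites(sites: dict[int, tuple[int, int, int, int]],
                    fdr: float = 0.1) -> list[int]:
    """Return positions whose BH-adjusted p-value is below `fdr`.

    `sites` maps a T position to (C_ox, T_ox, C_ctrl, T_ctrl) read counts.
    """
    positions = sorted(sites)
    pvals = [fisher_greater(*sites[p]) for p in positions]
    qvals = benjamini_hochberg(pvals)
    return [p for p, q in zip(positions, qvals) if q < fdr]
```

A one-sided test fits the design because only an excess of C in the oxidized library, not a deficit, is evidence for 5hmU; `math.comb` keeps the hypergeometric terms exact even at high coverage.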

Experimental Section
Sequencing experiments were carried out on a MiSeq instrument using the MiSeq Reagent Kit v3 (Illumina). Quantification of 5hmU by LC-MS/MS analysis was carried out on a Q Exactive Hybrid Quadrupole-Orbitrap Mass Spectrometer (Thermo Scientific) equipped with a nanospray ionization source and coupled to an UltiMate RSLCnano LC system (Dionex). Detailed experimental protocols and commercial sources of reagents are included in the Supporting Information (PDF). All sequencing data have been deposited in the ArrayExpress database at EMBL-EBI (https://www.ebi.ac.uk/arrayexpress/) under accession number E-MTAB-6456. All code developed for the data analysis has been released on the manuscript's GitHub page (https://github.com/sblab-bioinformatics/5hmUseq).