Single‐molecule quantification of 5‐methylcytosine and 5‐hydroxymethylcytosine in cancer genome

Epigenetic modifications strongly influence tumorigenesis and have attracted growing research interest. They act on genomic information to regulate biological pathways related to tumorigenesis and cancer development without changing the primary base sequence of DNA. As the most important epigenetic modifications, 5‐methylcytosine (5mC) and 5‐hydroxymethylcytosine (5hmC) are the most widely investigated in human cancers. To expand our understanding of the roles of 5mC and 5hmC in cancers, multiple methods have been developed to detect and quantify them. While various detection methods have been reviewed elsewhere, here we focus on ultrasensitive single‐molecule techniques used in quantifying 5mC and 5hmC, including nanopore sequencing, single‐molecule real‐time sequencing, optical mapping, and single‐molecule fluorescent imaging.


INTRODUCTION
Epigenetics, a term first introduced by Waddington to describe "the branch of biology which studies the causal interactions between genes and their products, which brings the phenotype into being," 1 plays a crucial role in development and pathogenesis, including carcinogenesis. 2 One of the most important epigenetic modifications is DNA 5-methylcytosine (5mC). 5mC, known as "the fifth base," is generated by DNA methyltransferases (DNMTs), which add a methyl group to the fifth position of the cytosine base in DNA. 3 5mC is mostly found at cytosine-phospho-guanine (CpG) dinucleotide sites in the mammalian genome, 4 and functionally results in gene silencing or decreased gene expression. 5 5-Hydroxymethylcytosine (5hmC), in contrast, is a hydroxymethylated form of cytosine and an oxidation product of 5mC generated by the ten-eleven translocation (TET) family of dioxygenases. 6 Since its discovery in mammalian cells in 2009, 5hmC has been referred to as "the sixth base" and has received increasing attention in recent years. 7 It is generally believed that formation of 5hmC is the first step in an active DNA demethylation process, 8 which releases the gene repression imposed by DNA methylation and thereby leads to gene activation or increased gene expression. 9 Cancer is a heterogeneous disease driven by genetic abnormalities, including altered DNA sequence and gene expression. 10 Because epigenetic modifications can directly regulate gene expression, it is becoming increasingly evident that epigenetics plays a vital role in tumorigenesis and cancer development. As mentioned above, 5mC and 5hmC can lead to gene silencing and gene activation, respectively. Hypermethylation of tumor-suppressor genes or hypomethylation of oncogenes has been identified in most types of cancer, such as Rb in retinoblastoma, BRCA1 in breast cancer, and Myc in liver cancer. 11 In addition, tumors of the brain, lung, liver, kidney, colon, skin, breast, and prostate show a significant reduction in 5hmC levels. 12,13,14 Thus, quantitative analysis of 5mC and 5hmC in the cancer genome has remarkable clinical significance.
Increased recognition of the importance of 5mC and 5hmC in cancer pathogenesis has catalyzed the development of techniques to detect and quantify them. In the last few decades, liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) and immunoquantification tools have been the two main methods for 5mC and 5hmC quantification. 12,15 However, applications of these methods are limited because they require large amounts of DNA. Sensitive and specific single-molecule technologies, such as nanopore sequencing, single-molecule, real-time sequencing (SMRT), optical mapping, and single-molecule fluorescent imaging, have therefore attracted scientists' attention. As next-generation sequencing technologies, nanopore sequencing and SMRT provide not only DNA sequence information but also 5mC and 5hmC quantification information, derived from ionic current features and fluorescence features, respectively. Meanwhile, optical mapping and single-molecule fluorescent imaging can directly report the total numbers of 5mC or 5hmC sites with little data processing. In addition, optical mapping can cover long-range sequence on large, intact single DNA molecules spanning hundreds of kilobase pairs (kbp). Single-molecule fluorescent imaging can provide distance-based relationships between 5mC and 5hmC by analyzing fluorescence resonance energy transfer, which is significant for analyzing gene regulation. Herein, we review these single-molecule methods for quantifying 5mC and 5hmC.

NANOPORE SEQUENCING
As a promising next-generation sequencing technique, nanopore sequencing enables single-molecule detection without the need for sample labeling or amplification. 16 This nanopore-based technique was first described in 1996 by Deamer and colleagues. 17 They hypothesized that ssDNA and RNA molecules could be electrophoretically driven through a protein nanopore, and then detected and sequenced via the resulting changes in ionic current through the pore (Figure 1A). Their work confirmed that an electric field could drive ssDNA molecules through an α-hemolysin (αHL) protein nanopore. However, it did not achieve DNA sequencing, because single-base resolution could not be reached with this method. 18 Two major factors restricted the resolution: the length of the nanopore channel and the dwell time of a single base in the nanopore. Specifically, the channels of these nanopores are longer than ∼5 nm, so the recorded current is generated by at least 10-15 nucleotides rather than a single base. 19 Moreover, ssDNA translocates so fast that each base does not stay in the pore long enough to be identified. 20 To overcome these challenges, two major approaches for nanopore sequencing have been proposed.
The first approach is exonuclease sequencing. Keller et al were the first to recognize that single-molecule DNA sequencing could be performed by detecting the single nucleotides released by an exonuclease from the end of a DNA strand. 21 Then, Bayley and colleagues found that an αHL nanopore equipped with a β-cyclodextrin adapter can identify and quantify dNTPs on the basis of ionic current changes. 22 Finally, with optimized operating conditions and nanopore systems, exonuclease sequencing could be further used to identify and quantify 5mC in the presence of A, T, G, and C. 23 Based on this previous work, sequencing technologies from Oxford Nanopore Technologies were developed to detect 5mC and 5hmC. 24 The other approach is strand sequencing, in which both the speed of DNA translocation through the nanopore and the length of the nanopore are reduced to enable single-base sensing. To slow the passage of ssDNA through the nanopore, Wallace and colleagues immobilized the ssDNA within the αHL nanopore through a streptavidin-biotin bridge. In this way, they successfully sequenced the intact DNA strand and directly discriminated A, T, G, C, 5mC, and 5hmC in this ssDNA. 25 Laszlo et al and Manrao et al, in contrast, used a mutated form of Mycobacterium smegmatis porin A (MspA) as the protein nanopore to obtain a shorter nanopore channel. 26 Compared with αHL, MspA has a shorter and narrower constriction (∼1.2 nm wide and 0.6 nm long), 27 so the current is influenced by fewer nucleotides during detection. Further, they used phi29 DNA polymerase (DNAP) to draw ssDNA through the nanopore, which controls the translocation of ssDNA in single-nucleotide steps. Later, Laszlo and colleagues further confirmed that MspA-based nanopore sequencing can also identify 5mC and 5hmC in single-molecule DNA. Zeng and colleagues detected 5mC or 5hmC with αHL nanopore sequencing by attaching a host-guest complex to 5mC or 5hmC. 28 Other strategies and nanomaterials to improve nanopore sequencing have been developed or are in development. Additionally, statistical analysis of the current signature events recorded by nanopore sequencing can provide quantitative information on 5mC and 5hmC in DNA.
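As a rough illustration of how a statistical analysis of current events could yield quantitative information, the following minimal sketch assigns each recorded event to the cytosine variant with the nearest reference current and reports the resulting fractions. The reference current levels, the event values, and the function names are hypothetical placeholders chosen for illustration; they are not measurements or software from any of the cited studies, and real analyses use far more elaborate signal models.

```python
from collections import Counter

# Hypothetical mean residual currents (pA) for each cytosine variant.
# Illustrative values only, not measurements from any real nanopore.
REFERENCE_LEVELS_PA = {"C": 50.0, "5mC": 46.0, "5hmC": 42.0}


def classify_event(mean_current_pa, levels=REFERENCE_LEVELS_PA):
    """Assign an event to the variant whose reference level is closest."""
    return min(levels, key=lambda base: abs(levels[base] - mean_current_pa))


def quantify_modifications(event_currents_pa, levels=REFERENCE_LEVELS_PA):
    """Return the fraction of events called as each variant."""
    if not event_currents_pa:
        raise ValueError("at least one event is required")
    counts = Counter(classify_event(i, levels) for i in event_currents_pa)
    total = sum(counts.values())
    return {base: counts.get(base, 0) / total for base in levels}


# Hypothetical mean currents for eight cytosine-position events.
events = [50.3, 49.1, 45.8, 42.4, 50.9, 46.5, 49.7, 41.6]
fractions = quantify_modifications(events)
print(fractions)
```

A nearest-level rule is the simplest possible classifier; it is used here only to make the counting step concrete.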

SINGLE-MOLECULE, REAL-TIME SEQUENCING
In 2009, single-molecule, real-time sequencing (SMRT), another next-generation sequencing technology, was developed for DNA sequencing. 29 This method uses polymerase to incorporate phospho-linked nucleotides into DNA within zero-mode waveguide nanostructure arrays, 30 which allows continuous detection of thousands of single-molecule sequencing reactions without steric hindrance. In SMRT sequencing, nucleotides are labeled with distinct fluorophores. Each incorporation event produces a detectable pulse of increased fluorescence in the corresponding color channel, which determines the sequence of the complementary DNA template. In addition, these fluorescence pulses provide valuable information about the kinetics of DNA synthesis, which are sensitive to DNA primary and secondary structures that can be influenced by different DNA methylations. 31 So, in SMRT sequencing (Figure 1B), different kinetic signatures represent different DNA methylations. As a result, DNA methylations, including 5mC and 5hmC, can be sensitively and quantitatively detected by comparing kinetic signatures. 32 Although this method provides long read lengths, it has low throughput. To address this, Song et al improved the method by combining it with selective chemical labeling of 5hmC, providing the first high-throughput 5hmC quantification method. After labeling with azide-glucose and biotin, 5hmC-containing DNA can be enriched with streptavidin beads, which reduces the amount of sample that sequencing requires. Moreover, the increase in 5hmC size caused by chemical labeling produces larger kinetic signals, leading to higher-confidence assignment of 5hmC during SMRT sequencing. 33 In 2015, single-molecule, real-time bisulfite sequencing (SMRT-BS) was developed by combining SMRT with bisulfite conversion to overcome limitations of traditional bisulfite sequencing: it achieves long read lengths and removes the inability to multiplex distinct amplicons that results from PCR amplicon cloning. 34 With this technique, it is possible to measure CpG methylation across multiplexed ∼1.5 kb amplicons.
FIGURE 1 (caption fragment): Once the DNA molecules are driven through a series of gradually smaller nanoparticles (grey circles), and then stretched within channels of the nanofluidic device, a fluorocode containing dCTP and 5hmC quantification information can be obtained using a fluorescent microscope. (D) Single-molecule fluorescent imaging experimental scheme. The DNA molecule is end-labeled with biotin, 5hmC is labeled with Cy3 (green), and 5mC is labeled with Cy5 (red). Then the labeled DNA molecules are immobilized on the slide through the reaction between biotin and neutravidin. By using a TIRF microscope, the 5hmC and 5mC quantification data can be obtained by counting the fluorophores under green and red laser excitation, respectively.
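The comparison of kinetic signatures in SMRT sequencing can be sketched as follows. This minimal example assumes that the kinetic signature at each template position is summarized by the interpulse duration (IPD), the time between successive incorporation pulses, and that a position is flagged when its mean IPD in the sample exceeds that of an unmodified control by a chosen ratio. The IPD summary, the threshold of 2.0, the position numbers, and all data values are assumptions made for illustration and are not taken from the studies cited above.

```python
from statistics import mean


def ipd_ratios(sample_ipds, control_ipds):
    """Per-position ratio of mean IPD in the sample to mean IPD in the control.

    Both arguments map a template position to a list of IPD values (seconds).
    Only positions present in both inputs are compared.
    """
    return {
        pos: mean(sample_ipds[pos]) / mean(control_ipds[pos])
        for pos in sample_ipds
        if pos in control_ipds
    }


def call_modified(ratios, threshold=2.0):
    """Return sorted positions whose IPD ratio meets a hypothetical threshold."""
    return sorted(pos for pos, ratio in ratios.items() if ratio >= threshold)


# Hypothetical IPD observations at three template positions.
sample = {101: [0.9, 1.1, 1.0], 102: [3.0, 2.6, 2.8], 103: [1.2, 0.8, 1.0]}
control = {101: [1.0, 1.0, 1.0], 102: [1.0, 0.9, 1.1], 103: [1.0, 1.0, 1.0]}

ratios = ipd_ratios(sample, control)
modified_positions = call_modified(ratios)
print(modified_positions)
```

Counting the flagged positions over many reads would then give a quantitative estimate; the larger kinetic signal produced by chemically labeled 5hmC corresponds, in this sketch, to a larger ratio that is easier to separate from the control.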

OPTICAL MAPPING
Optical mapping is a single-molecule DNA profiling method based on DNA barcodes. The concept originated from fluorescence in situ hybridization (FISH), which has extremely low resolution because of the condensed state of DNA in chromosomes. 35 In the 1990s, researchers recognized this limitation and tried different strategies to extend the DNA, such as "molecular combing." 36,37 Consequently, optical mapping has two essential elements: fluorescent labeling of a specific sequence or dNTP, and stretching of the DNA. 37,38 In 2010, Neely et al used a DNA methyltransferase, M.HhaI, to directly label the sequence 5′-GCGC-3′ with Atto647N dye, and then used an evaporating droplet to stretch the DNA onto a PMMA-coated surface. Quantitative information on the 5′-GCGC-3′ sequence was then obtained by analyzing the fluorocode. 39 After tagging 5mC with fluorescently labeled methyl-CpG-binding domain (MBD) proteins, Lim et al detected it by stretching the prepared DNA sample in nano/microfluidic channels integrating nanogrooves, 250 × 200 nm² in size, and a shunt layer 50 nm in depth. 40 Later, Lam and colleagues used 45-nm silicon nanochannels on a nanofluidic device to uniformly extend DNA and keep it in an elongated, linearized state. 41 Building on Lam's technology, Michaeli and colleagues extended its utility to 5hmC mapping (Figure 1C). 42 First, they attached an azido-modified glucose to 5hmC using β-glucosyltransferase (β-GT). Next, an alkyne-modified Alexa Fluor dye was attached to the 5hmC site via a "click chemistry" reaction with the functional azide. With the help of fluorescence microscopy, 5hmC can be quantified by this method. Gilat et al and Gabrieli et al have used this method to obtain the 5hmC profiles of blood and colon cancer and of human peripheral blood mononuclear cells (PBMCs). 43
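To make the idea of reading quantitative information from a fluorocode concrete, the sketch below assumes that image analysis has already converted the fluorescent spots on one stretched molecule into label positions expressed in kbp. It then computes the overall label density and a windowed count profile along the molecule. The input format, the 50 kbp window, and the example positions are hypothetical and serve only as an illustration; they do not reproduce the analysis pipeline of any cited study.

```python
import math


def label_density(label_positions_kbp, molecule_length_kbp):
    """Labels per kbp on a single stretched molecule."""
    if molecule_length_kbp <= 0:
        raise ValueError("molecule length must be positive")
    return len(label_positions_kbp) / molecule_length_kbp


def windowed_counts(label_positions_kbp, molecule_length_kbp, window_kbp=50.0):
    """Count labels in consecutive windows along the molecule."""
    n_windows = math.ceil(molecule_length_kbp / window_kbp)
    counts = [0] * n_windows
    for pos in label_positions_kbp:
        if not 0 <= pos <= molecule_length_kbp:
            raise ValueError(f"label at {pos} kbp lies outside the molecule")
        # A label exactly at the far end is assigned to the last window.
        index = min(int(pos // window_kbp), n_windows - 1)
        counts[index] += 1
    return counts


# Hypothetical 5hmC label positions on a 200 kbp molecule.
positions = [12.5, 48.0, 51.3, 120.0, 199.9]
density = label_density(positions, 200.0)
profile = windowed_counts(positions, 200.0)
print(density, profile)
```

The density gives a per-molecule total, while the windowed profile preserves the long-range positional information that distinguishes optical mapping from bulk quantification.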

SINGLE-MOLECULE FLUORESCENT IMAGING WITH TIRF MICROSCOPE
To detect and quantify 5hmC, Song and colleagues first used terminal transferase (TdT) and modified dCTP to end-label dsDNA with biotin and Cy3. Then, with the help of β-GT, they selectively labeled 5hmC with Cy5. 44 The biotinylated, dye-labeled dsDNA sample was then immobilized on a neutravidin-tethered microscope slide surface through the specific recognition between biotin and neutravidin, and analyzed by single-molecule fluorescent imaging. 45 Using a total internal reflection fluorescence (TIRF) microscope, 5hmC can be quantified by counting the red fluorophores (Cy5). To simultaneously image and quantify 5mC and 5hmC in the same DNA molecule, a dual-labeling strategy was developed. 46 In this approach, Song and colleagues prepared the DNA sample in three steps before obtaining the total numbers of 5mC and 5hmC by counting the red and green fluorophores, respectively. First, the end of the DNA molecule was labeled with biotin. Second, 5hmC in the DNA was blocked and labeled with Cy3 using β-GT. Third, 5mC was oxidized to 5hmC by TET, which created a reaction site for β-GT, so that Cy5 could be attached to the former 5mC site via the Huisgen cycloaddition reaction (Figure 1D). Both 5mC and 5hmC play important roles in carcinogenesis, so in addition to their total numbers, information on the interplay between them is also critical. This information can be obtained with the same technology by analyzing their colocalization through the single-molecule Förster resonance energy transfer (smFRET) value.
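The counting and smFRET steps of the dual-labeling scheme can be sketched as follows, with Cy3 (on 5hmC) acting as the donor and Cy5 (on 5mC) as the acceptor. The example uses the common ratiometric estimate of apparent FRET efficiency, E = I_A / (I_D + I_A), without background or crosstalk corrections. The per-molecule record format, the intensity values, and the 0.5 proximity threshold are hypothetical assumptions for illustration and are not taken from the cited work.

```python
def apparent_fret(donor_intensity, acceptor_intensity):
    """Uncorrected ratiometric FRET efficiency, E = I_A / (I_D + I_A)."""
    total = donor_intensity + acceptor_intensity
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return acceptor_intensity / total


# Hypothetical per-molecule records: spot counts per channel, plus donor and
# acceptor intensities (arbitrary units) measured under green excitation.
molecules = [
    {"cy3_spots": 2, "cy5_spots": 5, "donor": 800.0, "acceptor": 200.0},
    {"cy3_spots": 1, "cy5_spots": 3, "donor": 300.0, "acceptor": 700.0},
]

# Totals follow the labeling scheme: Cy3 marks 5hmC, Cy5 marks 5mC.
total_5hmc = sum(m["cy3_spots"] for m in molecules)
total_5mc = sum(m["cy5_spots"] for m in molecules)

# A high apparent efficiency suggests that a 5hmC and a 5mC site are close.
PROXIMITY_THRESHOLD = 0.5  # hypothetical cutoff
efficiencies = [apparent_fret(m["donor"], m["acceptor"]) for m in molecules]
proximal = [e >= PROXIMITY_THRESHOLD for e in efficiencies]
print(total_5hmc, total_5mc, efficiencies, proximal)
```

Because FRET efficiency falls off steeply with donor-acceptor distance, the second hypothetical molecule would be read as carrying 5mC and 5hmC sites in close proximity, whereas the first would not.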

CONCLUSION AND OUTLOOK
An increasing number of studies suggest that both 5mC and 5hmC have remarkable influences on cancer development, 47 which prompts us to propose that both are emerging biomarkers for cancer diagnosis, therapy, and prognosis. Developing advanced techniques to detect and quantify them is therefore of great significance. Although 5mC is the most common epigenetic modification in the human genome, it is still very scarce, only 3-6%, and 5hmC is even scarcer, at nearly 1-10% of 5mC. 14,48 Because of these extremely low concentrations, traditional technologies for 5mC and 5hmC quantification are restricted by their need for large amounts of sample, whereas the amount of sample available is often extremely limited, as with cell-free DNA in the blood or urine of cancer patients. Single-molecule quantitative approaches have therefore been developed. These methods are characterized by high sensitivity and enable precise statistical information about 5mC and 5hmC to be obtained from only trace amounts of sample.
Herein, we reviewed four technologies with distinct features for single-molecule quantification of 5mC and 5hmC. In principle, they fall into two classes, optical techniques and electrical techniques, which quantify 5mC or 5hmC by detecting optical or electrical signals, respectively. The development of these optical and electrical technologies has substantially advanced the field of 5mC and 5hmC quantification. However, new strategies to further improve these quantitative technologies are still in demand. One strategy is to develop combined techniques. For instance, different 5mC and 5hmC quantification techniques can be combined to exploit their respective advantages and yield more specific and sensitive techniques, such as a single-molecule electro-optical nanopore sensing tool. 49 Furthermore, traditional quantitative technologies can be advanced by combining them with single-molecule methods. Considerable improvements are still needed to advance and innovate the field of single-molecule quantification of DNA modifications, and progress in this field is far from over.

CONFLICT OF INTEREST
The authors declare no conflict of interest.