Molecular structure and evolution mechanism of two populations of double minutes in human colorectal cancer cells

Abstract Gene amplification chiefly manifests as homogeneously stained regions (HSRs) or double minutes (DMs) in cytogenetics, and as extrachromosomal DNA (ecDNA) in molecular genetics. Gene amplification is becoming a hotspot of cancer research and may offer a new strategy for cancer treatment. DMs usually carry oncogenes or chemoresistance genes that are associated with cancer occurrence, progression and prognosis. Defining the molecular structure of DMs will facilitate understanding of the molecular mechanism of tumorigenesis. In this study, we re-identified the origin and integral sequence of DMs in the human colorectal adenocarcinoma cell line NCI-H716 by a genetic mapping and sequencing strategy, employing high-resolution array-based comparative genomic hybridization, high-throughput sequencing, multiplex-fluorescence in situ hybridization and chromosome walking techniques. We identified two distinct populations of DMs in NCI-H716, confirming their heterogeneity in cancer cells, and constructed their molecular structure, which had not been investigated before. The distribution of amplicons in the two populations of DMs suggested that a multi-step evolutionary model best fits DM genesis in the NCI-H716 cell line. In conclusion, our data imply that DMs play an important role in cancer progression, and further investigation is necessary to uncover the role of DMs.


| INTRODUCTION
Gene amplification, a common feature of genomic instability in many tumours, is strongly associated with tumorigenesis and chemoresistance. Cytogenetically, gene amplification chiefly manifests as homogeneously stained regions (HSRs) or double minutes (DMs). 1,2 In 2017, the term extrachromosomal DNA (ecDNA) was first proposed for extrachromosomal chromatin at the molecular level. 3 Whether described cytogenetically as DMs or molecularly as ecDNA, these elements usually carry amplified oncogenes or drug-resistance genes in cancer. DMs and ecDNA have been regarded as diagnostic markers in clinical practice for monitoring cancer occurrence, progression and prognosis. [4][5][6] A recent survey of a compendium of cancer cells and cell lines in glioblastoma multiforme (GBM) provides direct evidence that extrachromosomal amplification of oncogenic elements enhances genomic diversity during cancer evolution. 3 Further research showed how ecDNA elements could mark major clonal expansions in otherwise stable genomic backgrounds, related ecDNA presence to cancer progression, and pointed out that 'whether ecDNA size and structure affect the mechanism of tumorigenesis is unclear and is another reflection of the lack of knowledge of extrachromosomal DNA, in particular as an understudied domain in cancer'. 7 However, the possible mechanisms by which DMs/ecDNA contribute to cancer progression are still unknown.
Currently, extrachromosomal chromatin is documented to arise from the circularization of co-amplified DNA fragments from multiple chromosomal loci. 8,9 In earlier studies, researchers attempted to characterize the complex genomic rearrangements of amplicons in cancer by array comparative genomic hybridization (CGH), fluorescence in situ hybridization (FISH), quantitative PCR and more advanced techniques such as whole genome sequencing. [10][11][12] The rejoining patterns of the amplicons that constitute DMs have been revealed accordingly. [13][14][15] Gibaud's study supported the view that breakage-fusion-bridge cycles and/or chromosome fragmentation, which mainly involve non-homologous end joining (NHEJ) to rejoin chromosomal breaks, may drive the complex structures of intrachromosomal amplifications. 14 Another study by Gibaud's team found that the formation of DMs may be accomplished through V(D)J-like illegitimate recombination. 15 Storlazzi and colleagues confirmed the episome model, suggesting that DMs were formed by amplification and excision, with fusion junctions mediated by NHEJ. 13 These data suggest that the genesis and structure of DMs remain controversial, and multiple models have been proposed on the basis of different observations. Recently, the combined use of whole genome sequencing and FISH analysis has provided a preliminary understanding of DM structures in glioblastoma multiforme, 16 solid tumours and cancer cell lines. 17,18 Analysis tools are being developed that should resolve extrachromosomal chromatin more precisely from high-throughput sequencing data, permitting a deeper understanding of the scale, scope and contents of extrachromosomal chromatin in cancer and their association with clinical features. 6
Because extrachromosomal chromatin varies in size, amount, molecular composition and organization within single or multiple cells, 10 it is important to identify its molecular architecture. In an ecDNA study, Paul's team used three cell lines and constructed a partial molecular architecture at the genome level. 19 Nevertheless, because of the heterogeneity of DMs/ecDNA, reconstructed DMs/ecDNA structures may not be accurate or complete without co-localization analyses of the different amplified regions at the cellular level (multiplex-fluorescence in situ hybridization, M-FISH). More importantly, research has shown that the heterogeneity of extrachromosomal chromatin is dynamic during the development of drug resistance and during cell proliferation. Resistance to doxorubicin (DOX) and methotrexate (MTX) in human osteosarcoma cells was a multigenic process involving both gene copy number and expression changes. 20 In addition to changes in the copy number of the amplified gene, the form in which the amplification exists may also change.
Analysis of amplification in GBM39 cells showed that the ecDNA reintegrated as HSRs after erlotinib treatment. 3 On removal of the treatment, the ecDNA amplicons re-emerged. 21 These studies suggest that accelerated heterogeneity of cancer cells through DMs/ecDNA may increase the likelihood of tumorigenesis and chemoresistance.
The human colorectal adenocarcinoma cell line NCI-H716 carries DMs whose molecular structure has not been explicitly defined. 10 In the present study, we employed new techniques to re-identify the DM subpopulations in order to visualize DM formation and molecular structure in NCI-H716 cells. As a result, we built a framework for characterizing the molecular structure of DMs with integrated analyses including M-FISH, high-resolution array CGH, NimbleGen capture array, PacBio RS DNA system, Illumina HiSeq X Ten platform and chromosome walking. Our data implied that both NHEJ/microhomology-mediated end joining (MMEJ) and fork stalling and template switching (FoSTeS)/microhomology-mediated break-induced replication (MMBIR) mechanisms, acting in a multi-step evolutionary manner, account for the rearrangements during DM formation in NCI-H716 cells.

KEYWORDS
colorectal adenocarcinoma, double minutes, evolution mechanism, extrachromosomal DNA, gene amplification, molecular structure

| Cell culture
The human colorectal cancer cell line NCI-H716 with spontaneous DMs was purchased from the American Type Culture Collection (ATCC, VA, USA). The cells were cultured in RPMI-1640 medium (Invitrogen, CA, USA) supplemented with 10% foetal bovine serum (Invitrogen) in a humidified atmosphere of 5% CO2 at 37°C. Cells were authenticated by short tandem repeat profiling analysis (Beijing Microread Genetics, Beijing, China).

| M-FISH analysis
Metaphase chromosomes of the NCI-H716 cells were prepared as previously described. 22 The slides were either stained with Giemsa

| High-resolution array CGH
DNA processing, microarray handling and data analysis were

| DNA Hybrid Capture and PacBio RS sequencing
Genomic coordinates of DMs were provided by high-resolution array-CGH analysis. Customized NimbleGen 2.1M sequence capture arrays were fabricated using SeqCap v2 software. Captured library preparation and hybridization to the NimbleGen Sequence Capture Arrays were performed according to the manufacturer's protocol, adapted from the company's application notes (Roche NimbleGen, WI, USA). Sequencing templates were generated by ligating PacBio SMRTbell™ adapters to both ends of the captured linear DNA fragments, and these fragments were sequenced on the PacBio RS as continuous circles. Clean data were then mapped onto the human reference genome (GRCh37/hg19) using the Burrows-Wheeler Aligner (BWA), and reads without structural variations (SVs) were filtered out. Subsequently, all identified fusion sequences were verified with the BLAST-Like Alignment Tool (BLAT) and the Basic Local Alignment Search Tool (BLAST).
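The read-filtering step described above can be illustrated with a minimal Python sketch. It assumes that the alignments are available as SAM text lines and that chimeric reads carry the standard `SA:Z:` supplementary-alignment tag; the source states only that BWA was used, so the tag-based approach, the function name `split_read_junctions`, the `min_distance` threshold and all reads and coordinates in the example are hypothetical and serve illustration only.

```python
from typing import List, NamedTuple


class Junction(NamedTuple):
    read: str
    chrom_a: str
    pos_a: int
    chrom_b: str
    pos_b: int


def split_read_junctions(sam_lines: List[str], min_distance: int = 10_000) -> List[Junction]:
    """Collect candidate fusion junctions from primary alignments that have an SA tag.

    Reads without a supplementary alignment (no SV evidence) are dropped.
    min_distance is a hypothetical cut-off for same-chromosome split reads.
    """
    junctions = []
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records.
        if int(fields[1]) & 0x904:
            continue
        read, chrom_a, pos_a = fields[0], fields[2], int(fields[3])
        for tag in fields[11:]:
            if not tag.startswith("SA:Z:"):
                continue
            # Each SA entry is "rname,pos,strand,CIGAR,mapQ,NM".
            for entry in tag[5:].rstrip(";").split(";"):
                chrom_b, pos_b_text = entry.split(",")[:2]
                pos_b = int(pos_b_text)
                if chrom_a != chrom_b or abs(pos_a - pos_b) >= min_distance:
                    junctions.append(Junction(read, chrom_a, pos_a, chrom_b, pos_b))
    return junctions


def sam(read, flag, chrom, pos, cigar, *tags):
    """Build a minimal SAM record (hypothetical data, no sequence or qualities)."""
    return "\t".join([read, str(flag), chrom, str(pos), "60", cigar, "*", "0", "0", "*", "*", *tags])


example = [
    "@HD\tVN:1.6",
    sam("read1", 0, "chr8", 128000100, "500M500S", "SA:Z:chr10,123000200,+,500S500M,60,3;"),
    sam("read1", 2048, "chr10", 123000200, "500H500M", "SA:Z:chr8,128000100,+,500M500S,60,2;"),
    sam("read2", 0, "chr8", 128050000, "1000M"),
]
candidates = split_read_junctions(example)
for junction in candidates:
    print(junction)
```

In this toy input only `read1` is reported, because it is the only read whose alignment is split between two loci. Candidate junctions found this way would still need independent verification, as was done in the study with BLAT and BLAST.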

| Whole genome sequencing
Genomic DNA of the NCI-H716 cells was extracted, quantified and purified following the HiSeq X Ten protocol (Illumina, CA, USA). DNA fragments were ligated with adaptor oligonucleotides to form paired-end DNA libraries with an insert size of 500 bp. The library was amplified by PCR with adaptor-specific primers and sequenced on an Illumina HiSeq X Ten instrument to obtain up to ∼150 million reads (Novogene, Beijing, China). Reads that aligned to genomic regions were collected for mutation identification and subsequent analysis. Samtools mpileup and bcftools were used for variant calling to identify single nucleotide polymorphisms (SNPs), insertions and deletions (INDELs). Control-FREEC was used to detect copy number variations (CNVs), and BreakDancer was applied to detect structural variations (SVs).

| Chromosome walking
Sequences of Junction VII and Junction VIII were acquired using GenomeWalker universal kit and Advantage 2 PCR Kit (Clontech Laboratories Inc, CA, USA) according to the manufacturer's instructions. Normal human DNA was used as control. All the junction sequences were validated by Sanger sequencing.

| DMs in NCI-H716 cells were comprised of multiple highly amplified regions from chromosome 8q24.12-21 and 10q26.13
In order to clarify the origin of DMs, we first used metaphase chromosome analysis to confirm the existence of DMs, and observed large numbers of DMs in NCI-H716 cells. Then, to identify the amplified regions, an Agilent 2 × 400 K human genome CGH microarray was applied to examine the CNV. We identified four major amplified regions originating from 8q24.12-21 and 10q26.13 in NCI-H716 cells. The amplicons were named Amplicon H1, H2, H3 (from centromere to telomere) on chromosome 8q24.12-21 and Amplicon H4 on 10q26.13 (Figure S1). Based on the CGH microarray results, we designed a customized high-resolution microarray (mean distance ≈ 200 bp in the amplified regions). Through detailed analysis of the microarray results, we found that amplicons H1, H2, H3 and H4 were composed of several sub-amplicons with different amplification levels. These sub-amplicons were therefore named Amplicon H1a, H1b, H2a, H2b, H2c, H2d, H3a, H3b, H3c, H4a and H4b (Figure 1). In the four major amplicons, the majority of the copy numbers were as high as 2^6. However, a dramatically higher amplification level, twice that of the major amplicons, was observed in sub-amplicons H1b, H2a, H2c and H3b (Figure 1).

FIGURE 1 Four amplified regions were determined in NCI-H716 cells with an Agilent array-CGH chip with high-density probes. The yellow strip in A, red strip in B, purple strip in C and blue strip in D represent the corresponding amplification regions H1, H2, H3 and H4. Regions in which the majority of the copy numbers were around 2^6 are shown with a single line; regions in which the majority were around 2^7 are shown with double lines. The X-axis represents chromosome coordinates. The Y-axis represents log2 ratios of the copy number normalized by normal controls, showing distinct sub-regions with different overall copy numbers. Blue vertical lines depict boundary positions for each amplicon. The positions of the BACs are marked with black strips.
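The idea of splitting an amplicon into sub-amplicons by amplification level can be sketched in a few lines of Python. This is a simplified illustration, not the analysis used in the study: it merely groups consecutive probes whose log2 ratio rounds to the same integer, and the probe positions and values are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple


def segment_by_level(probes: List[Tuple[int, float]]) -> List[Tuple[int, int, int]]:
    """Group consecutive probes that share the same rounded log2 ratio.

    probes: (position, log2 ratio) pairs.
    Returns (first position, last position, rounded log2 level) per segment.
    """
    ordered = sorted(probes)  # sort by genomic position
    segments = []
    for level, run in groupby(ordered, key=lambda probe: round(probe[1])):
        run = list(run)
        segments.append((run[0][0], run[-1][0], level))
    return segments


# Hypothetical probes spaced 200 bp apart with log2 ratios near 6 or 7.
probes = [(0, 6.1), (200, 5.9), (400, 6.0), (600, 7.1), (800, 6.9), (1000, 7.0), (1200, 6.2)]
segments = segment_by_level(probes)
for start, end, level in segments:
    # A log2 ratio of L corresponds to a 2**L-fold copy number ratio.
    print(start, end, level, 2 ** level)
```

With these toy values the probes fall into three segments at levels 6, 7 and 6, that is, 64-fold, 128-fold and 64-fold. A difference of one unit on the log2 scale is a twofold difference in copy number, which is the relationship between the major amplicons and the more highly amplified sub-amplicons.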
Notably, a high probe density was required for accurate detection of subtle structural variations. This might be why we had not been able to capture the subtle changes in DNA copy number in the amplification regions of NCI-H716 cells in previous studies. These results suggested that DMs in NCI-H716 cells containing highly amplified

| DMs were heterogeneously organized, as defined by the localization of amplicons in NCI-H716 cells
Since DMs in the cells contained sequences from different chromosome fragments, we speculated that the way in which DMs were arranged might be crucial to cell evolution. To test this, we chose six available BACs, based on amplification level within these amplicons, as representative FISH probes to localize the sub-amplicons in the cells. The amplified regions on DMs were verified by M-FISH analyses, and co-localization of the four amplified DNA fragments was shown in […]. Combining these observations with the microarray data, we deduced that the two populations of DMs each accounted for around 50% in NCI-H716 cells.
Furthermore, we observed duplicated signals of H1a, H2d and H3c located within individual DMs from Population one, indicating a more complex molecular structure of DMs similar to a 'diploid' structure (Figure 3). However, these characteristic structures were not observed in DM Population two. We also analysed more than 200 metaphases and found that the fluorescence signals for each probe hybridized specifically to DMs, and no signal was detected in HSRs.
To investigate the heterogeneity of DMs in NCI-H716 cells, we

| Fine sequential characterization discovered the junctions in DMs
Because of the heterogeneity of DMs, we hypothesized that a break-rejoin model might be a possible mechanism for the occurrence of DMs. To elucidate the mechanism, we designed a NimbleGen capture array to detect the sequence of DMs. First, DMs DNA […] Table S3 in sequential order (Figure 1). Among these breakpoints, the se[…] (Table S4). […] H1b, H2a, H2c, H3b, H4a and H4b formed DM Population two (Figure 6B). Therefore, Amplicons H1b, H2a, H2c and H3b were shared fragments of both DM populations. As a result, the copy numbers of these four amplicons were much higher than those of the others. These results were consistent with the microarray results, which indicated that these four amplification regions (Amplicon H1b, H2a, H2c and H3b) had higher copy numbers (log2 ratios ≈ 7) than the other amplification regions (Amplicon H1a, H2b, H2d, H3a, H3c, H4a and H4b, log2 ratios ≈ 6). A more detailed description of the biological features and utility of DMs is still needed.
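The link between shared amplicons and the one-unit difference in log2 ratio can be checked with simple arithmetic. The Python sketch below rests on one simplifying assumption that is ours, not a measured result: each DM population contributes roughly the same number of copies of every amplicon it carries, with a per-population level of log2 ≈ 6. The function name `expected_log2` is hypothetical.

```python
import math

# Amplicon groups as listed in the text above.
shared = {"H1b", "H2a", "H2c", "H3b"}  # carried by both DM populations
single = {"H1a", "H2b", "H2d", "H3a", "H3c", "H4a", "H4b"}  # lower copy number group


def expected_log2(amplicon: str, per_population_log2: float = 6.0) -> float:
    """Expected log2 ratio if every carrying population adds the same copy number.

    per_population_log2 is an assumed value chosen to match the reported
    log2 ratio of about 6 for the lower copy number group.
    """
    copies_per_population = 2 ** per_population_log2
    carrying_populations = 2 if amplicon in shared else 1
    return math.log2(carrying_populations * copies_per_population)


for name in sorted(shared | single):
    print(name, expected_log2(name))
```

Under this assumption an amplicon present in both populations reaches log2(2 × 2^6) = 7, while the remaining amplicons stay at 6, which matches the two tiers seen on the microarray.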

| DISCUSSION
Our study employed high-throughput sequencing, which provides an unprecedented platform for detecting DMs in cancer cells. [16][17][18]24 However, this technique had limitations in the current study. For example, the sequence data generated by the PacBio RS and Illumina HiSeq X Ten sequencing systems presented only partial junction sequences of Junction VII and Junction VIII. This incomplete information led to the misinterpretation that the rearrangement breakpoints were connected to chromosomal segments (ie insertion fragments in Junction VII) and, thus, that the highly amplified regions were inserted into chromosomes to form HSRs, which was inconsistent with the M-FISH-based results. Therefore, in order to obtain the complete junction sequences, we supplemented the sequencing data with chromosome walking and long-distance PCR, which successfully filled in […]
In this study, we aimed to reveal the mechanisms of DM formation by deciphering the junction events. We found three 'junctions' (Figure 6A) in DM Population one and six 'junctions' (Figure 6B) in DM Population two, with Junction IV shared by both populations. In total, nine recombination sequences (Junction V has four recombination breakpoints) were found to mediate these junctions, excluding Junction VII and Junction VIII. Of the nine recombination sequences forming Junction I to VI, eight were recombined by MMEJ and one by blunt end joining, which are two major forms of the NHEJ pathway. 26,27 This was consistent with the DMs described in the human ovarian cancer cell line UACC-1598. 28 NHEJ, as a simple recombination-based mechanism, can explain some nonrecurring rearrangements. 26,29 However, Junction VII and VIII showed more complexity […] (H1b, H2a, H2c and H3b). Second, more powerful evidence came from the formation of Junction IV (green bar), which does not exist on chromosomes and is shared by both populations (Figure 7). Finally, the 34 SNPs we identified on chromosome 8 were present in heterozygosis, with 10 copies vs. 1,500 copies (Table S2). These SNPs were present in homozygosis in the two populations of DMs, suggesting that the amplicons co-existing in the two populations were derived from the same chromosome 8.
Thus, we speculate that subpopulations of DMs might be formed by rearrangements in a stepwise manner. That is, in the early stage of DM formation, a first catastrophic event, for instance chromothripsis, 37,38,41 occurred in chromosome 8 and caused fragmentation and replicative repair within the chromosome. Most chromosomal fragments are eliminated from the cell during the cell cycle, while the retained fragments may be further recombined and circularized to form DM Population one. Population two might arise from a second catastrophic event, which integrated some specific sequences from Population one with fragments of chromosome 10. Through replication, unequal segregation of DMs and selection for growth advantage, the cancer cells came to carry numerous specific DMs. In other words, the subpopulations of DMs appear to have formed stepwise, rather than through independent evolution or a one-off genomic catastrophe.
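The distinction drawn above between junctions joined with microhomology and junctions joined by blunt end joining can be made concrete with a short Python sketch. It is a simplified illustration under stated assumptions: the two reference sequences, the breakpoint indices and the function names are hypothetical, and any shared base at the breakpoint is labelled as microhomology without applying a minimum length, whereas real analyses may set a threshold.

```python
def microhomology_length(ref_a: str, break_a: int, ref_b: str, break_b: int) -> int:
    """Count bases at a junction that are identical in both reference segments.

    The junction sequence is assumed to be ref_a[:break_a] + ref_b[break_b:].
    Bases shared by both references immediately after or before the
    breakpoints cannot be assigned to either side and count as microhomology.
    """
    forward = 0
    while (break_a + forward < len(ref_a)
           and break_b + forward < len(ref_b)
           and ref_a[break_a + forward] == ref_b[break_b + forward]):
        forward += 1
    backward = 0
    while (break_a - 1 - backward >= 0
           and break_b - 1 - backward >= 0
           and ref_a[break_a - 1 - backward] == ref_b[break_b - 1 - backward]):
        backward += 1
    return forward + backward


def classify_junction(ref_a: str, break_a: int, ref_b: str, break_b: int) -> str:
    """Label a junction by the presence or absence of shared bases."""
    length = microhomology_length(ref_a, break_a, ref_b, break_b)
    return "blunt end joining" if length == 0 else f"microhomology ({length} bp)"


# Hypothetical reference segments around two breakpoints.
segment_a = "ACGTTGCAGGTC"
segment_b_homologous = "TTAGCCCAGATA"  # shares "CAG" with segment_a at the breakpoint
segment_b_blunt = "TTAGCCTTCATA"       # shares nothing at the breakpoint

print(classify_junction(segment_a, 6, segment_b_homologous, 6))
print(classify_junction(segment_a, 6, segment_b_blunt, 6))
```

The first toy junction shares three bases across the breakpoint and is labelled as microhomology, whereas the second shares none and is labelled as blunt end joining. Templated insertions of the kind attributed to FoSTeS/MMBIR would need a more elaborate comparison than this sketch provides.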
In summary, we re-identified two populations of DMs in the human colorectal adenocarcinoma cell line NCI-H716, confirming their heterogeneity in cancer cells, and constructed their molecular structure, which had not been investigated before. Based on our analysis, we propose that both NHEJ/MMEJ and FoSTeS/MMBIR pathways may mediate the rearrangements in DMs, and that the complex structure of DMs in the NCI-H716 cell line may be generated by a multi-step evolutionary process involving various mechanisms. Further dissection of DMs will enhance our understanding of the biological significance of extrachromosomal chromatin in cancer.

ACKNOWLEDGEMENTS
We would like to thank Dr Li Jin (Ministry of Education Key

CONFLICT OF INTEREST
The authors declare that they have no competing interests.

DATA AVAILABILITY STATEMENT
The data from this study have been submitted to the NCBI Sequence Read Archive (SRA; https://www.ncbi.nlm.nih.gov/sra) under accession number SRP127522 and Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/) under accession number GSE110071.