Highly precise breakpoint detection of chromosome balanced translocation in chronic myelogenous leukaemia: Case series

Abstract Chronic myelogenous leukaemia (CML) is characterized by a specific chromosome translocation, the Philadelphia chromosome translocation. However, determining the detailed junction structure of this rearrangement is laborious and expensive. Low‐coverage whole genome sequencing (LCWGS) can not only detect previously unknown chromosomal translocations but also narrow each breakpoint to a small candidate region (with an accuracy of ±200 bases). Importantly, the sequencing cost of LCWGS is about US$300. Sanger DNA sequencing can then determine the precise breakpoint at single-base resolution. In our project, LCWGS showed that BCR and ABL1 were disrupted in three CML patients (at chr22:23,632,356 and chr9:133,590,450; chr22:23,633,748 and chr9:133,635,781; chr22:23,631,831 and chr9:133,598,513, respectively). As a result of rejoining after chromosome breakage, the classical fusion gene (BCR::ABL1) was found in bone marrow and peripheral blood. The precise breakpoints are helpful for investigating the pathogenic mechanism of CML and could better guide the classification of CML subtypes. The LCWGS method is general and can be used to detect diseases related to chromosomal variation, such as solid tumours, liquid tumours and birth defects.

breakpoints in the BCR gene in the main breakpoint region of exons 12-16, the resulting fusion protein was p210. (2) The rare BCR breakpoint occurred in the region of exons 17-20, resulting in a p230 fusion protein. (3) In rare patients, the BCR breakpoint occurred in the rare zone of exons 1-2, resulting in the fusion protein p190. 6 The p190, p210 and p230 proteins had persistently enhanced tyrosine kinase (TK) activity, which disturbed downstream signalling pathways, causing enhanced proliferation, differentiation arrest and resistance to cell death. 7,8 The most effective drugs for treating Philadelphia chromosome-positive disease were tyrosine kinase inhibitors (TKIs) targeting the BCR::ABL1 fusion protein. The biggest obstacles to improving the prognosis of patients with Ph-positive CML were drug resistance and new mutations arising during disease progression. [9][10][11] Comprehensive and accurate detection of mutations in CML patients (especially in the BCR::ABL1 kinase domain) during treatment may be the key to solving these problems. 12 The more accurately the breakpoints are determined, the more useful they are for subsequent analysis. LCWGS has been reported as a highly accurate, cost-effective and robust approach for detecting all abnormal chromosome structures. 2 In our study, we used LCWGS to characterize the breakpoints in three CML patients with the Philadelphia chromosome. We successfully mapped the breakpoints, which disrupted two known genes, BCR and ABL1. For the breakpoints in patient 1 (chr22:23,632,356 and chr9:133,590,450) and patient 2 (chr22:23,633,748 and chr9:133,635,781), the fusions of chr22 and chr9 were located in the 13th intron of BCR and the first intron of ABL1, respectively. For patient 3 (chr22:23,631,831 and chr9:133,598,513), the fusions were located in the 14th intron of BCR and the first intron of ABL1. In addition, we also found other chromosomal structural variations. At a coarse level, the main gene fusion does not differ between CML patients. However, at a finer level, patients have different breakpoints and show different clinical symptoms. [13][14][15] These findings are important for guiding precise medication and for doctors formulating follow-up treatment plans. More importantly, this technology could detect relevant mutations to screen for patients with early myeloid leukaemia, so that doctors and patients could carry out active and effective intervention and treatment. [16][17][18]

2 | MATERIALS AND METHODS

| Case selection and sample collection
We recruited three CML patients (patient 1: a 75-year-old man; patient 2: a 9-year-old girl; patient 3: a 12-year-old boy) and applied the LCWGS method to all of them. All patients signed informed consent, and this study was approved by the Ethics Committee of the Peking University Shenzhen Hospital. Peripheral blood of patient 1 (heparin tube) was collected for karyotyping. Additionally, bone marrow samples and peripheral blood (EDTA tube) samples were collected for genomic DNA (gDNA) extraction after anonymization.

| Karyotyping
For chromosome analysis, Giemsa (GTG) banding karyotyping at the 550-band level was performed in accordance with the standard laboratory protocol.

| LCWGS
The bone marrow library was sequenced on the Illumina NovaSeq with 151-bp paired-end reads and a target mean coverage of >8-fold. After removing reads containing sequencing adapters and low-quality reads, the SOAPaligner sequence alignment software (http://soap.genomics.org.cn/) was used to map reads to the NCBI human reference genome (version: GRCh37.1). We then retained the uniquely mapped reads for subsequent analysis; the specific analysis method has been described in detail previously. With this method, uniquely mapped read pairs are used to find chromosomal copy number variations (CNVs) and structural variations (SVs) across the whole genome, together with the corresponding breakpoints, each of which can be localized to a small region of ±200 bases.
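The idea of using uniquely mapped read pairs to localize a translocation can be illustrated with a minimal Python sketch. This is not the published analysis method, which is described elsewhere; it is a simplified illustration that assumes read pairs have already been reduced to mapped coordinates. The names `ReadPair` and `candidate_translocations`, the `min_support` threshold and the toy coordinates are all hypothetical. Only the 200-base window is taken from the text.

```python
from collections import defaultdict
from typing import NamedTuple


class ReadPair(NamedTuple):
    """Mapped positions of the two mates of one uniquely mapped read pair."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def candidate_translocations(pairs, window=200, min_support=3):
    """Cluster interchromosomal read pairs into candidate breakpoint regions.

    window: maximum gap (bases) between neighbouring pairs in a cluster.
    min_support: hypothetical minimum number of pairs required for a call.
    """
    # Bucket pairs whose mates map to different chromosomes,
    # keyed by the sorted pair of chromosome names.
    buckets = defaultdict(list)
    for p in pairs:
        if p.chrom1 == p.chrom2:
            continue  # same-chromosome pairs are ignored in this sketch
        (ca, pa), (cb, pb) = sorted([(p.chrom1, p.pos1), (p.chrom2, p.pos2)])
        buckets[(ca, cb)].append((pa, pb))

    calls = []
    for (ca, cb), points in buckets.items():
        points.sort()
        # Greedy clustering: extend a cluster while both mates stay
        # within `window` bases of the previous pair.
        clusters = [[points[0]]]
        for pt in points[1:]:
            prev = clusters[-1][-1]
            if pt[0] - prev[0] <= window and abs(pt[1] - prev[1]) <= window:
                clusters[-1].append(pt)
            else:
                clusters.append([pt])
        for cluster in clusters:
            if len(cluster) < min_support:
                continue
            a_pos = [a for a, _ in cluster]
            b_pos = [b for _, b in cluster]
            calls.append({
                "chrom_a": ca, "region_a": (min(a_pos), max(a_pos)),
                "chrom_b": cb, "region_b": (min(b_pos), max(b_pos)),
                "support": len(cluster),
            })
    return calls


# Toy coordinates for illustration only; these are not patient reads.
toy_pairs = [
    ReadPair("chr22", 23632200, "chr9", 133590500),
    ReadPair("chr22", 23632250, "chr9", 133590560),
    ReadPair("chr9", 133590610, "chr22", 23632300),
    ReadPair("chr22", 23632100, "chr22", 23632400),  # same chromosome
]
print(candidate_translocations(toy_pairs))
```

A real pipeline would also use read orientation, mapping quality and split reads; the sketch only shows why a cluster of discordant pairs brackets a breakpoint within a small region.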
Finally, the breakpoints were verified by Sanger sequencing. We designed primers with NCBI Primer-BLAST (http://www.ncbi.nlm.nih.gov/tools/primer-blast/) for the 500 bp upstream and the 500 bp downstream of each breakpoint region. By comparing the Sanger sequences of the amplified products, we could readily determine the precise breakpoint. Oligonucleotide primer pairs for the translocation were designed with Gene Runner software (version 5.0.69 Beta; Hastings Software) (Table 1).
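As a rough illustration of the primer search windows described above, the following Python sketch computes the 500-bp intervals flanking a candidate breakpoint region. It assumes the region is the candidate position ±200 bases and that coordinates are 1-based and inclusive; the function name `primer_search_windows` is hypothetical, and the interpretation of "upstream" and "downstream" relative to the reference strand is an assumption.

```python
def primer_search_windows(breakpoint, uncertainty=200, flank=500):
    """Return (upstream, downstream) primer search windows as 1-based,
    inclusive (start, end) tuples around a candidate breakpoint region.

    uncertainty: half-width of the candidate region (±200 bases in the text).
    flank: window length on each side of the region (500 bp in the text).
    """
    region_start = breakpoint - uncertainty
    region_end = breakpoint + uncertainty
    # Upstream window ends just before the candidate region.
    upstream = (max(1, region_start - flank), region_start - 1)
    # Downstream window starts just after the candidate region.
    downstream = (region_end + 1, region_end + flank)
    return upstream, downstream


# Example using the chr22 coordinate reported for patient 1.
up, down = primer_search_windows(23_632_356)
print("chr22 upstream window:  ", up)
print("chr22 downstream window:", down)
```

For a translocation junction, one primer would be placed in a window on the chr22 side and the other in a window on the chr9 side; which windows pair up depends on the orientation of the derivative chromosome.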

| PCR and Sanger sequencing
With the designed primers, the putative fragments were amplified by PCR under general PCR conditions. The products were sequenced on an ABI-A3130 genetic analyser.

| RESULTS
Karyotype analysis of Patient 1's peripheral blood indicated 46,XY,t(9;22)(q34;q11.2) (Figure 1A). Owing to the balanced translocation, two fusion genes (BCR::ABL1 and ABL1::BCR) were identified. In the subsequent RT-PCR experiment, the Philadelphia chromosome (Ph) was confirmed to be positive, with the resulting fusion protein p210.

of breakpoints for ABL1 and BCR genes was also shown in patients 1/2/3 (Figure 1C). The precise positions of the breakpoints were confirmed by PCR and Sanger sequencing in bone marrow samples and peripheral blood (EDTA tube) samples of the three patients (Figure 1D). As shown in Figure 1E, two accurate breakpoints of Philadelphia chromosome were the same position,

Many laboratories are currently introducing NGS into their routine diagnostic procedures, as it has been shown to be a robust, reproducible and cost-effective alternative to traditional detection methods. 27 In this study, we successfully applied the LCWGS method to detect chromosome translocations in a CML patient series, which gave the candidate region of the break-

Cost is the biggest factor affecting the clinical application of a new technology. LCWGS is highly cost-effective because it sequences at a lower coverage depth. In this case, ~80 million paired reads (~24 Gb) were obtained, and the cost of our approach was about US$300 per sample. Although sequencing costs have decreased dramatically over the last few decades, the cost of WGS is still too high for many budgets. 29 While still screening the whole genome and retaining individual information, LCWGS requires ~80 M reads per sample (about 8-fold coverage), whereas long-read SMRT sequencing needs ~40-fold coverage to find chromosomal SVs. 30 Additionally, in a study of CML cell lines, more than 60-fold sequence coverage data were generated. 31 Furthermore, even for the same amount of data, the cost of Pacific Biosciences (PacBio) sequencing is higher than that of Illumina's NovaSeq. 32 Next, we will continue to improve the algorithm's detection accuracy and its lower limit of data abundance, so that it can screen out the variation types in early-stage patients and other subtypes that newly develop during the progression of leukaemia. Ultimately, this will provide guidance for gene editing therapy and the combination of targeted drugs.
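The relationship between the read count, data volume and fold coverage quoted above can be checked with a short calculation. The Python sketch below assumes that "~80 million paired reads" means 80 million read pairs of 2 × 151 bp, and it assumes a haploid human genome size of about 3.1 Gb; neither the genome size nor the function name `mean_coverage` comes from the original text.

```python
def mean_coverage(n_pairs, read_length, genome_size):
    """Return (total sequenced bases, mean fold coverage) for paired-end data."""
    total_bases = n_pairs * 2 * read_length  # two mates per pair
    return total_bases, total_bases / genome_size


# Assumed inputs: 80 million pairs, 151-bp reads, ~3.1 Gb genome.
total, coverage = mean_coverage(80_000_000, 151, 3_100_000_000)
print(f"{total / 1e9:.2f} Gb sequenced, {coverage:.1f}-fold mean coverage")
```

Under these assumptions the data volume is about 24 Gb and the mean coverage is close to 8-fold, consistent with the figures stated in the text.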

| CONCLUSION
LCWGS is a cost-effective and accurate method to detect chromo-

ACKNOWLEDGEMENTS
We thank Dr. Wenyong Zhang from Southern University of Science and Technology for revising this manuscript.

CONFLICT OF INTEREST
None.

DATA AVAILABILITY STATEMENT
The original data of this project are available from the author by e-mail on request.

INFORMED CONSENT
All patients provided written informed consent before participation.