Spectrum of gene mutations identified by targeted next‐generation sequencing in Chinese leukemia patients

Abstract Background Although targeted sequencing has identified several mutations in leukemia, mutation screening in Chinese leukemia patients remains limited. Here, we used targeted next‐generation sequencing to characterize the mutation patterns of Chinese leukemia patients. Methods We performed targeted sequencing of 504 tumor‐related genes in 109 leukemia samples to identify single‐nucleotide variants (SNVs) and insertions and deletions (INDELs). Pathogenic variants were assessed based on the American College of Medical Genetics and Genomics (ACMG) guidelines. The functional impact of pathogenic genes was explored in silico through gene ontology (GO), pathway analysis, and protein–protein interaction network analysis. Results We identified a total of 4,655 SNVs and 614 INDELs in 419 genes, among which PDE4DIP, NOTCH2, FANCA, BCR, and ROS1 emerged as the most highly mutated genes. Of note, we were the first to demonstrate an association between PDE4DIP mutation and leukemia. Based on ACMG guidelines, 39 pathogenic and likely pathogenic mutations in 27 genes were found. GO annotation showed that biological processes including gland development, leukocyte differentiation, respiratory system development, myeloid leukocyte differentiation, and mesenchymal to epithelial transition were involved. Conclusion Our study provided a map of gene mutations in Chinese patients with leukemia and gave insights into the molecular pathogenesis of leukemia.

lymphoblastic leukemia (ALL), acute promyelocytic leukemia (APL), chronic myeloid leukemia (CML), and chronic lymphocytic leukemia (CLL) are the common types of leukemia (Juliusson & Hough, 2016). Acute leukemia is composed primarily of undifferentiated cells, whereas in chronic leukemia the malignant cells are more differentiated. Exposure to environmental radiation and solvents has been reported as a predisposing factor for leukemia (Schuz & Erdmann, 2016). However, the direct cause of leukemia has not been found. Recently, mutation profiling of genes has provided prognostic prediction and treatment guidance for patients with leukemia (Itzykson et al., 2013; Shin et al., 2016). Reports on the mutational patterns of Chinese leukemia patients are limited.
The application of next-generation sequencing (NGS) makes it possible to test a larger group of mutational markers. NGS is a massively parallel high-throughput DNA sequencing approach (Metzker, 2010). The major advantages of this approach are that it can simultaneously screen a large number of genes and samples using very small amounts of nucleic acid and that it has high sensitivity for mutation detection (Meldrum, Doyle, & Tothill, 2011). Types of NGS include whole-genome sequencing, whole-exome sequencing, whole-transcriptome sequencing, and targeted region sequencing (Ross & Cronin, 2011). Targeted region sequencing of multiple specific genomic regions has been widely employed in many fields to identify genetic variants related to disease pathogenesis and prognosis (Mansouri et al., 2014).
Although targeted sequencing has identified several mutations in leukemia, mutation screening in Chinese leukemia patients remains limited. Here, we performed targeted region sequencing of 504 genes in 109 patients with leukemia from the Chinese Han population to explore the genetic basis of leukemia in Chinese patients.

| Study population
A total of 109 patients diagnosed with leukemia at Hainan General Hospital were enrolled. All of the patients were genetically unrelated ethnic Han Chinese. Patients with leukemia were diagnosed according to the 2016 WHO classification criteria (Sabattini, Bacci, Sagramoso, & Pileri, 2010). The diagnoses were confirmed by bone marrow microscopy, flow immunophenotyping, chromosome screening, and fusion gene detection. Our study was approved by the ethical review board of Hainan General Hospital and complied with the Declaration of Helsinki. Written informed consent was obtained from all patients enrolled in this study.

| DNA extraction and sequencing
Peripheral blood (5 ml) was collected from each subject into EDTA-coated vacutainer tubes. Genomic DNA was isolated using a commercially available DNA extraction kit (GoldMag Co. Ltd.), and then quantified using a NanoDrop 2000 Spectrophotometer (NanoDrop Technologies).
Samples were prepared using a TruSeq DNA Sample Preparation Kit (Illumina) following the standard protocol. The Agilent SureDesign website (https://earray.chem.agilent.com/suredesign/home.htm) was used to design capture oligos for 504 cancer-related genes. Paired-end libraries were prepared following the Illumina protocol. Hybridization reactions were performed on an AB 2720 Thermal Cycler (Life Technologies Corporation). The hybridization mixture was captured using magnetic beads (Invitrogen) and an Agilent Custom SureSelect Enrichment Kit according to the manufacturer's instructions. Sequencing (2 × 150 bp reads) was carried out on the Illumina HiSeq2500 platform (Illumina).

| In silico analysis
Pathogenic or likely pathogenic SNVs were assessed using the 1000 Genomes Project, Sorting Tolerant From Intolerant (SIFT), PolyPhen, MutationTaster, and Combined Annotation Dependent Depletion (CADD), based on the American College of Medical Genetics and Genomics (ACMG) guidelines (Richards et al., 2015). Enrichment analysis of gene ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways for the candidate pathogenic genes was carried out using the R clusterProfiler software package (http://bioconductor.org/packages/release/bioc/html/clusterProfiler.html). GO terms were annotated in the following three aspects: cellular component (GO-CC), molecular function (GO-MF), and biological process (GO-BP). Pathway enrichment based on the KEGG pathway database (https://www.kegg.jp/kegg/) was applied for pathway annotation. GO terms and KEGG pathways with p < .05 were considered significant. The protein-protein interaction (PPI) network was predicted using the STRING online software (https://string-db.org/). The GEPIA database (http://gepia.cancer-pku.cn/) was used to evaluate the expression and prognostic value of candidate genes in leukemia.
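To make the p < .05 enrichment criterion concrete, the following is a minimal Python sketch of a one-sided hypergeometric over-representation test, the kind of test commonly used for GO and KEGG enrichment. It is an illustration only and is not the clusterProfiler implementation; the function name `enrichment_p_value`, the choice of the 504 panel genes as the background, and the example counts are assumptions made for this sketch, and no multiple-testing correction is applied.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value, P(X >= k).

    N: number of genes in the background set
    K: background genes annotated to the term
    n: genes in the candidate list
    k: candidate genes annotated to the term
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(n, K)):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    # Sum the probabilities of observing k, k+1, ..., min(n, K) annotated genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts: 27 candidate genes drawn from a 504-gene background,
    # with 20 background genes and 7 candidate genes annotated to one term.
    p = enrichment_p_value(k=7, n=27, K=20, N=504)
    print(f"p = {p:.3g}", "significant" if p < 0.05 else "not significant")
```

In practice the background set and the correction for testing many terms both affect which terms pass the threshold, so the numbers from such a sketch should not be compared directly with the reported results.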

| Clinical characteristics of patients with leukemia
A total of 109 patients with leukemia were included. The median age of the cohort was 39.83 years (range: 11-77 years), and the cohort included 58 (53.2%) males and 51 (46.8%) females (Table 1). Among these, 30 patients were diagnosed with AML, 19 with ALL, 24 with APL, 28 with CML, and 8 with CLL.
We analyzed the single-nucleotide changes of the detected SNVs and found that the transitions C/G>T/A and A/T>G/C were more prevalent than transversions, including C/G>G/C, C/G>A/T, A/T>C/G, and A>T/T>A, in all leukemia patients, as shown in Figure 2a. We also analyzed the genomic regions of these SNVs. Among these variants (Figure 2b), exonic variants (54.55%) were the most frequent, followed in order by intronic variants (34.92%), splicing variants (6.14%), and UTR variants (4.39%). Additionally, we detected the genetic effects of these variants in exonic region,   Table S3 showed the top 50 significantly different genes. In ALL, the mutation frequencies of XPA, CDK4, and BCL11B were higher than in other subgroups, but the frequencies of SMARCE1 and FEV were lower. The high prevalence of GNAQ, ELF4, HOXD13, and COX6C
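The distinction between transitions and transversions used above can be sketched in a few lines of Python. A transition swaps a purine for a purine (A<>G) or a pyrimidine for a pyrimidine (C<>T); any other substitution is a transversion. The helper names `classify_snv` and `ti_tv_ratio` and the example substitutions are hypothetical and are not taken from the study's pipeline or data.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_snv(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a valid SNV: {ref}>{alt}")
    # Same chemical class (purine/purine or pyrimidine/pyrimidine) is a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def ti_tv_ratio(snvs) -> float:
    """Ratio of transitions to transversions for an iterable of (ref, alt) pairs."""
    counts = Counter(classify_snv(ref, alt) for ref, alt in snvs)
    if counts["transversion"] == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return counts["transition"] / counts["transversion"]


if __name__ == "__main__":
    # Made-up substitutions for illustration only.
    example = [("C", "T"), ("G", "A"), ("A", "G"), ("A", "C")]
    print(ti_tv_ratio(example))
```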

| Filtered pathogenic and likely pathogenic genes
Based on ACMG guidelines, we identified 39 pathogenic and likely pathogenic mutations in 27 genes, including SNVs and INDELs in exonic and splicing regions, among 32 leukemia patients (Figure 3 and Table 2). In AML, 21 pathogenic or likely pathogenic germline mutations in 17 genes were identified, of which 10 mutant genes were found only in AML patients. ALL patients had six pathogenic or likely pathogenic germline mutations, notably PBRM1 (c.2819_2829del, p.L940fs) and SUZ12 (c.1716_1717insG, p.L572fs). There were six pathogenic or likely pathogenic germline mutations among APL patients, of which PAX8 (c.G201T, p.E67D) was found only in APL patients. Two pathogenic mutations, in ALDH2 (rs540073928, p.A175D) and FBXW7 (rs866987936, p.R361Q), were found in CLL patients. Among CML patients, seven pathogenic or likely pathogenic germline mutations were also detected, of which five mutant genes were found only in CML patients. These results suggest that these mutations might play an important role in the pathogenesis of the different leukemia subgroups. The pathogenic mutations for leukemia patients are shown in Table S4.

| GO annotation for pathogenic and likely pathogenic genes
Gene ontology annotation and pathway analyses were conducted for the 27 pathogenic and likely pathogenic genes. The BP terms for these genes were related to gland development, leukocyte differentiation, respiratory system development, myeloid leukocyte differentiation, mesenchymal to epithelial transition, lung development, skeletal system development, positive regulation of mesonephros development, respiratory tube development, and vasculogenesis, among others (Figure 4a). The results of GO-CC annotation suggested that these genes were involved in CC terms including RNA polymerase II transcription factor complex, nuclear transcription factor complex, tertiary granule, PcG protein complex, tertiary granule lumen, caveola, transcription factor complex, membrane raft, membrane microdomain, and plasma membrane raft (Figure 4b). Moreover, the MF terms for these pathogenic and likely pathogenic genes were mainly DNA-binding transcription activator activity, RNA polymerase II-specific; RNA polymerase II proximal promoter sequence-specific DNA binding; proximal promoter sequence-specific DNA binding; protein self-association; cadherin binding; nuclear hormone receptor binding; histone-lysine N-methyltransferase activity; protein phosphorylated amino acid binding; hormone receptor binding; and promoter-specific chromatin binding (Figure 4c).

| Pathway analysis and prediction of PPI
Further enrichment analysis based on the KEGG database showed that these pathogenic and likely pathogenic genes were highly enriched in leukemia and cancer-related pathways, as shown in Table 3. The pathways included (a) AML and CML; (b) cancer-related pathways, such as transcriptional misregulation, central carbon metabolism, and proteoglycans; (c) various cancers including thyroid cancer, bladder cancer, and endometrial cancer.
FIGURE 3 Overview of the distribution of the 27 pathogenic and likely pathogenic genes according to ACMG guidelines in the five subgroups of leukemia. ACMG, American College of Medical Genetics and Genomics

The STRING database was used to construct the PPI network for the 27 pathogenic and likely pathogenic genes. The STRING map of these genes contained 27 nodes and 88 edges, in which nodes represent proteins and edges depict associated interactions (Figure 5). PPI analysis showed that the CEBPA, FLT3, PAX5, PAX8, RUNX1, TP53, and WT1 genes, located at the network hub, appeared in the transcriptional misregulation in cancer pathway.
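One simple way to think about hub genes in such a network is by node degree, that is, the number of distinct interaction partners of each protein. The Python sketch below ranks nodes by degree from an edge list. It is only an illustration: the text does not state how hubs were defined, the function names are hypothetical, and the edge list uses placeholder gene labels rather than the STRING results.

```python
from collections import defaultdict


def node_degrees(edges):
    """Map each node to its number of distinct neighbors in an undirected graph."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {node: len(partners) for node, partners in neighbors.items()}


def top_hubs(edges, n=3):
    """Return the n nodes with the highest degree (ties broken alphabetically)."""
    degrees = node_degrees(edges)
    return sorted(degrees, key=lambda node: (-degrees[node], node))[:n]


if __name__ == "__main__":
    # Placeholder interactions, not real STRING output.
    example_edges = [
        ("GENE_A", "GENE_B"),
        ("GENE_A", "GENE_C"),
        ("GENE_A", "GENE_D"),
        ("GENE_B", "GENE_C"),
        ("GENE_B", "GENE_E"),
        ("GENE_F", "GENE_G"),
    ]
    print(node_degrees(example_edges))
    print(top_hubs(example_edges, n=2))
```

Degree is only one possible centrality measure; other definitions of a hub (for example, betweenness) can give a different ranking.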

| DISCUSSION
In this study, we performed targeted capture deep sequencing of 504 tumor-related genes in 109 leukemia samples. We identified a total of 4,655 SNVs and 614 INDELs in 419 genes, among which PDE4DIP, NOTCH2, FANCA, BCR, ROS1, NACA, KDM5A, CLTCL1, AKAP9, MYH11, PCM1, NOTCH1, and COL1A1 had more than 40 mutations. We compared the frequency of mutations and INDELs in the five subgroups of leukemia (ALL, APL, AML, CLL, and CML). Moreover, ACMG pathogenicity analysis identified 27 pathogenic and likely pathogenic genes. GO enrichment analysis, pathway analysis, and PPI network analysis were performed on the pathogenic and likely pathogenic genes. Our results might provide molecular data on mutations in leukemia that help map the genetic variation of Chinese patients with leukemia. Mutational analysis was used to map the genetic variants in leukemia and showed that the largest number of mutations occurred in PDE4DIP, followed by NOTCH2, FANCA, BCR, and ROS1, in almost all leukemia patients. Phosphodiesterase 4D interacting protein (PDE4DIP) anchors PDE4D at the centrosome-Golgi region of the cell, which is involved in signal transduction and in the hydrolysis of cGMP and cAMP to energize several reactions in the cell, including those related to immune cell activation, hormone secretion, vascular smooth muscle action, and platelet aggregation (Shapshak, 2012). The protein has been found to interact with a member of the phosphodiesterase protein superfamily (Vinayagam et al., 2011). PDE4DIP mutations have been identified previously in various cancers, including lung cancer, medullary thyroid cancer, and ovarian cancer (Chang et al., 2018; Er et al., 2016; Y. Li et al., 2018), but have not been reported in leukemia. Our study is the first to report PDE4DIP mutations (128) in leukemia patients, suggesting that PDE4DIP may be involved in the pathogenesis of leukemia. Moreover, we found that PDE4DIP was downregulated in AML based on the TCGA database, and that high expression was associated with poor AML prognosis.
These findings suggested that PDE4DIP might be a tumor suppressor gene in leukemia. Results from the STRING database showed that the PDE4DIP protein might interact with PDE4D, PRKAR2A, and AKAP9, which can bind to cAMP or PKA, suggesting that PDE4DIP may participate in the cAMP/PKA signaling pathway. In addition, PDE4DIP, or myomegalin, is a dual-specificity AKAP known to colocalize with AKAP9 and PKA at the centrosome, which could play important roles in the localization and function of the AKAP/PKA complex in microtubule dynamics (Schmoker et al., 2018). The cAMP-dependent PKA signaling pathway has been reported to be involved in many fundamental cellular processes in leukemia, including migration and proliferation (Murray & Insel, 2013; Xu et al., 2016). These studies hinted that PDE4DIP might play a role in leukemia by participating in the cAMP/PKA signaling pathway, but more convincing studies are needed to validate this. Previously, NOTCH2, FANCA, BCR, and ROS1 have been described as playing an important role in leukemia. For example, Notch2 controlled nonautonomous Wnt signaling in CLL (Mangolini, Gotte, & Moore, 2018). FANCA dysfunction might promote cytogenetic instability in adult acute myelogenous leukemia (Lensch et al., 2003). BCR-ABL1 fusion genes were leukemogenic, causing CML or ALL (Baccarani et al., 2019). ROS1 revealed a central oncogenic role in CMML, which might represent a molecular target (Cilloni et al., 2013). In our study, we found
We used a methodology based on the ACMG variant classification guidelines for the 419 mutated genes and identified 39 pathogenic and likely pathogenic mutations in 27 genes, which might contribute to leukemia pathogenesis in these individuals. In particular, there were 21 pathogenic or likely pathogenic mutations in AML patients and 7 in CML patients, indicating that the pathogenesis of AML and CML might be more complicated.
PBRM1 (c.2819_2829del, p.L940fs) and SUZ12 (c.1716_1717insG, p.L572fs) were identified only in ALL patients, ALDH2 (rs540073928, p.A175D) and FBXW7 (rs866987936, p.R361Q) only in CLL patients, and CANT1 (c.407delT, p.L136fs) and PAX8 (c.G201T, p.E67D) only in APL patients. These results suggested that the pathogenesis of the different leukemia subgroups might differ. In addition, no previous studies have examined CANT1, KIAA1549, and NFIB in leukemia; therefore, the association between these genes and leukemia should be further investigated.
Subsequently, GO and KEGG analyses were performed to further understand the role of the pathogenic mutated genes. GO annotation showed that BP terms including gland development, leukocyte differentiation, respiratory system development, myeloid leukocyte differentiation, and mesenchymal to epithelial transition were involved, which might provide further insight into the occurrence and development of leukemia. Moreover, the enriched KEGG pathways included leukemia pathways (hsa05221 and hsa05220) and cancer-related pathways (hsa05200, hsa05202, and hsa05230), which was consistent with findings from previous studies on the pathogenesis of leukemia (McClure et al., 2018; de Noronha, Mitne-Neto, & Chauffaille, 2017). Seven key genes (CEBPA, FLT3, PAX5, PAX8, RUNX1, TP53, and WT1) were obtained from the PPI network, most of which have been reported to play a critical role in carcinogenesis and tumor progression (Junk et al., 2019; Rhodes, Vallikkannu, & Jayalakshmi, 2017; Slattery, Herrick, & Mullany, 2017). Inevitably, this study has several limitations. First, this is a single-center study, and a large multi-institutional study will be necessary to verify the results. Second, because of insufficient patient data, we could not evaluate the correlation between the mutations and the clinical and prognostic data of leukemia. Third, the potential functions and pathways were only predicted by bioinformatics and need experimental verification.

| CONCLUSION
Taken together, our study provided a map of gene mutations in Chinese patients with leukemia and enriched the understanding of the pathogenesis of leukemia. Of note, we are the first to demonstrate an association between PDE4DIP mutation and leukemia. Furthermore, we screened pathogenic genes based on ACMG guidelines and performed GO analysis, pathway analysis, and PPI network analysis. However, further investigations with larger cohorts and experimental research are warranted to explore the potential mechanisms.