Distinct noncoding RNAs and RNA binding proteins associated with high‐risk pediatric and adult acute myeloid leukemias detected by regulatory network analysis

Abstract Background Acute myeloid leukemia (AML) is a heterogeneous disease in both children and adults. Although it is well known that adult and pediatric AMLs are genetically distinct diseases, the driver genes for high‐risk pediatric and adult AMLs are still not fully understood. In particular, the interactions between RNA binding proteins (RBPs) and noncoding RNAs (ncRNAs) in high‐risk AMLs have not been explored. Aim To identify RBPs and ncRNAs that are the master regulators of high‐risk AML. Methods We identify over 400 upregulated genes in each of high‐risk adult and pediatric AML, using the expression profiles of the TCGA and TARGET cohorts, respectively. Fewer than 5% of these genes are upregulated in both cohorts, highlighting the genetic differences between adult and childhood AMLs. A novel distance correlation test is proposed for gene regulatory network construction. We build RBP‐based regulatory networks with the upregulated genes in high‐risk adult and pediatric AMLs, separately. Results We discover that three RBPs, three snoRNAs, and two circRNAs function together and regulate over 100 upregulated RNA targets in adult AML, whereas two RBPs are associated with 17 long noncoding RNAs (lncRNAs) and together regulate over 90 upregulated RNA targets in pediatric AML. Of these, two RBPs, MLLT3 and RBPMS, and their circRNA targets, PTK2 and NRIP1, are associated with overall survival (OS) in adult AML (p ≤ 0.01), whereas two different RBPs, MSI2 and DNMT3B, and 13 of the 17 associated lncRNAs are prognostically significant in pediatric AML. Conclusions Both RBPs and ncRNAs are known to be major regulators of transcriptional processes. The RBP–ncRNA pairs identified from the regulatory networks will allow a better understanding of the molecular mechanisms underlying high‐risk adult and pediatric AMLs, and assist in the development of novel RBP‐ and ncRNA‐based therapeutic strategies.


| INTRODUCTION
Acute myeloid leukemia (AML) is a heterogeneous disease in both children and adults. It is the most common acute leukemia caused by malignant transformation of hematopoietic progenitor cells through a wide range of molecular alterations. 1 The risk of developing AML is also age-associated, with incidence rising with age, and childhood AML differs from adult AML both biologically and clinically. 2,3 In general, pediatric AML has fewer somatic mutations and a higher frequency of cytogenetic abnormalities than its adult counterpart. [4][5][6] Besides these significant differences in genetic landscape, there are also vast differences in epigenetic landscape, leading to distinct expression patterns in childhood and adult AML. 7 In addition, the prognoses of childhood and adult AML are quite different.
The overall survival (OS) in adults remains low, with a 5-year survival rate of 20%. The OS in children is higher, with a 5-year survival rate of 70%. However, intensive chemotherapy regimens in children are burdensome, and half of childhood leukemia-related deaths are caused by relapsed/refractory AML. 7,8 In particular, different pediatric AML subtypes (low, intermediate, and high-risk) have very different prognoses, with 5-year survival ranging from 22% to 90%, and the 5-year OS for high-risk pediatric AML is below 30%. 9,10 Few new drugs for pediatric AML have been discovered over the past decade, despite biological and technical advances. There is an urgent need for better therapies for both pediatric and adult AML.
RNA binding proteins (RBPs) are proteins that bind RNA with or without RNA-binding domains (RBDs) and play a pivotal role in posttranscriptional regulation of gene expression. [11][12][13] RBPs regulate various aspects of RNA function, including transcription, splicing, modification, intracellular trafficking, translation, and decay. 13,14 An RBP can bind and control a large number of RNA targets, and participates in dynamic interactions with the RNAs it regulates. 15 Deregulation and malfunction of RBPs may lead to many diseases, including cancers. 16 In AML, aberrant RBP expression has been commonly linked to cancer progression through co- and post-transcriptional mechanisms. 17 However, although it is promising to target RBPs therapeutically for AML, 18 To dissect the roles of ncRNAs and RBPs in adult and pediatric AML, and to explore the potential of targeting RBPs or ncRNAs therapeutically, we must study the complex regulatory networks formed by the interactions among RBPs, ncRNAs, and targeted coding RNAs, and fully understand the molecular mechanisms of RBP and ncRNA function in AML. Therefore, in this pilot project, we concentrate on building gene regulatory networks with adult and childhood AML transcriptomic data, and on identifying the RBPs, ncRNAs, and their interactions that are important for high-risk adult and pediatric AMLs, respectively.

| Distance correlation test for RBP-based gene regulatory network analysis
RBPs bind and control a wide array of RNA targets that are critical for cancer progression. Most recently, there has been evidence that RBPs act as important regulators of lncRNAs in cancer. 34 Mechanistically, RBPs are commonly deregulated in cancer and might thus play a major role in the deregulation of cancer-related ncRNAs. However, it is not clear how RBPs interact with ncRNAs in pediatric and adult AML. We therefore aim to construct RBP-based gene regulatory networks with distance correlation tests, and to explore RBP and ncRNA interactions in high-risk AMLs.
Distance correlation was proposed recently to measure both linear and nonlinear dependence between two sets of variables. 35,36 It is straightforward to compute and asymptotically equals zero if and only if the two variables are independent. Given the paired expressions (x, y) = {(x_i, y_i), i = 1, 2, ..., n}, where x denotes an RBP, y represents its coding RNA or ncRNA target, and n is the number of patient samples, the distance matrices A and B for x and y, respectively, are computed from the pairwise distances between samples. The sample distance covariance and variances are then estimated from A and B, and the sample distance correlation is calculated from these quantities. Associated RBP and RNA target pairs are then identified with the Chi-square statistical test, 36 from which the p value is calculated. We reject the null hypothesis that the RBP and its RNA target are independent if and only if the p value is less than the significance level.
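As an illustration of the procedure described above, the following minimal Python sketch computes a distance correlation and a chi-square p value for one RBP and target pair. It assumes the bias-corrected (U-centered) estimator and the approximation in which n times the distance correlation, plus one, is compared against a chi-square distribution with one degree of freedom. The exact estimator and test form used in this work cannot be recovered from the text here, so both are assumptions, and the function names are hypothetical.

```python
import math


def _u_centered(v):
    """U-centered matrix of pairwise absolute differences (zero diagonal)."""
    n = len(v)
    d = [[abs(v[i] - v[j]) for j in range(n)] for i in range(n)]
    row = [sum(r) for r in d]          # d is symmetric, so row sums = column sums
    total = sum(row)
    return [
        [
            0.0 if i == j else
            d[i][j] - row[i] / (n - 2) - row[j] / (n - 2)
            + total / ((n - 1) * (n - 2))
            for j in range(n)
        ]
        for i in range(n)
    ]


def _inner(a, b):
    """Bias-corrected inner product of two U-centered matrices."""
    n = len(a)
    s = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n) if i != j)
    return s / (n * (n - 3))


def distance_correlation(x, y):
    """Bias-corrected sample distance correlation of two paired 1-D samples."""
    if len(x) != len(y) or len(x) < 4:
        raise ValueError("x and y must be paired and contain at least 4 samples")
    a, b = _u_centered(x), _u_centered(y)
    var_x, var_y = _inner(a, a), _inner(b, b)
    if var_x <= 0.0 or var_y <= 0.0:   # constant input: no dependence measurable
        return 0.0
    return _inner(a, b) / math.sqrt(var_x * var_y)


def chi_square_pvalue(dcor, n):
    """Assumed approximation: p = P(chi2 with 1 df > n * dcor + 1)."""
    t = n * dcor + 1.0
    if t <= 0.0:
        return 1.0
    # Survival function of a chi-square variable with one degree of freedom.
    return math.erfc(math.sqrt(t / 2.0))
```

In a network setting, one would call these functions for each RBP and candidate target pair and keep the pairs whose p value falls below the chosen significance level.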

| RESULTS

| High-risk pediatric and adult AMLs are genetically distinct diseases
To avoid confounding between adult and childhood AML, we analyze their expression profiles separately. We perform a two-sided Student's t-test, and identify upregulated genes in high-risk adult and pediatric AML, respectively, with the criteria of and two-fold changes. A total of 414 upregulated genes in high-risk adult AML are selected from the TCGA cohort, whereas 440 upregulated genes in high-risk pediatric AML are discovered with the TARGET cohort. However, only 17 genes (less than 5%) overlap, as shown in Figure 1, and details of the identified genes are reported in Table S1.
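A minimal sketch of this selection step follows. It assumes expression values are on a log2 scale, so that a two-fold change corresponds to a mean difference of at least 1, and it uses a hypothetical significance cutoff `alpha` because the threshold is not stated in the text above. The p value of the pooled-variance t statistic is obtained by numerically integrating the t density. Function names and gene labels are illustrative only.

```python
import math
from statistics import mean, variance


def two_sided_t_pvalue(t, df, steps=2000):
    """Two-sided p value of Student's t via Simpson integration of the density."""
    t = abs(t)
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = t / steps                      # steps must be even for Simpson's rule
    area = pdf(0.0) + pdf(t)
    for k in range(1, steps):
        area += (4 if k % 2 else 2) * pdf(k * h)
    area *= h / 3
    return max(0.0, 1.0 - 2.0 * area)


def upregulated_genes(high, other, alpha=0.05, min_log2_fc=1.0):
    """Select genes that are significantly higher in `high` than in `other`.

    `high` and `other` map gene name -> list of log2 expression values.
    `alpha` and `min_log2_fc` are hypothetical defaults, not the paper's values.
    """
    selected = []
    for gene in high:
        a, b = high[gene], other[gene]
        n1, n2 = len(a), len(b)
        diff = mean(a) - mean(b)       # log2 fold change under the log2 assumption
        pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
        if pooled == 0.0:
            continue                   # no variation: t statistic undefined
        t = diff / math.sqrt(pooled * (1 / n1 + 1 / n2))
        p = two_sided_t_pvalue(t, n1 + n2 - 2)
        if p < alpha and diff >= min_log2_fc:
            selected.append(gene)
    return selected
```

Running the same function separately on each cohort, as the text describes, keeps adult and pediatric samples from influencing each other's test statistics.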
Among the upregulated genes in high-risk adult and pediatric AMLs, several of them have been investigated in previous publications.
For instance, RGS10 and FAM26F were found to be prognostically significant in pediatric AML. 3  although the high-risk subgroup was only compared to the low-risk subtype in their study, Finally, a previous study discovered promoter hypermethylation of the genes CDH1 and WT1 in AML; 38 we further demonstrate that CDH1 and WT1 are upregulated in adult AML.
As shown in the Venn diagram of the upregulated genes in Figure 1A, less than 5% of the upregulated genes from the TCGA and less studied. We discover that both genes are overexpressed in high-risk AML, and upregulation of the two genes is associated with patient survival in both adult and pediatric AMLs. 44 Next, we construct RBP-based gene regulatory networks with the upregulated genes in high-risk adult and pediatric AML, respectively.  17,20 To the best of our knowledge, the functional mechanisms and causal directions between RBPs and ncRNAs in AML have not been explored before. The RBP-ncRNA pairs we discovered may provide potential therapeutic targets for AML.

| The prognostic significance of RBPMS-NRIP1 and MLLT3-PTK2 pairs in adult AML
Both RBPs and ncRNAs play pivotal roles in gene expression, RNA maturation, and protein synthesis, and control gene expression and disease progression. 12,30 The identified RBP-ncRNA interaction pairs may indicate that they control all the posttranscriptional events in the cell together. Therefore, we estimate the prognostic significance of the three RBPs, three snoRNAs, and two circRNAs in adult AML with the log-rank test, and find that two RBPs, MLLT3 and RBPMS, and their circRNA targets are associated with overall survival (OS). The Kaplan-Meier curves are shown in Figure 5.  Table 1. We not only discover that high expression of MSI2 and DNMT3B is associated with inferior overall survival, but also find that the majority of lncRNAs correlated with MSI2 and DNMT3B are prognostically important in pediatric AML. The lncRNAs may act as cofactors of MSI2 and DNMT3B, and have an important regulatory role in various molecular processes.
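The log-rank comparison used for these survival analyses can be sketched as follows. This is a minimal two-group version, assuming patients have already been split into two groups (for example by high versus low expression of an RBP; the splitting rule is not specified in the text and is an assumption). Function and variable names are hypothetical.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*  : follow-up time per patient
    events_* : 1 if death was observed, 0 if the patient was censored
    Returns (chi-square statistic, p value with 1 degree of freedom).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = var = 0.0
    for t in event_times:
        risk_a = sum(1 for u, _, g in data if u >= t and g == 0)
        risk_b = sum(1 for u, _, g in data if u >= t and g == 1)
        n = risk_a + risk_b
        deaths = sum(1 for u, e, _ in data if u == t and e)
        deaths_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        observed_a += deaths_a
        expected_a += deaths * risk_a / n
        if n > 1:
            # Hypergeometric variance of the number of deaths in group A.
            var += deaths * (risk_a / n) * (risk_b / n) * (n - deaths) / (n - 1)

    if var == 0.0:
        return 0.0, 1.0
    stat = (observed_a - expected_a) ** 2 / var
    # Survival function of a chi-square variable with one degree of freedom.
    return stat, math.erfc(math.sqrt(stat / 2.0))
```

A small p value indicates that the two survival curves differ more than would be expected by chance, which is the sense in which a gene is called prognostically significant here.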

| DISCUSSIONS
Pediatric and adult AMLs are known to be genetically different diseases. 2,3 We confirm this finding through transcriptomic analysis of the RNA-seq data in the TCGA and TARGET cohorts. By comparing the upregulated genes in high-risk pediatric and adult AML, we discover that fewer than 5% of the upregulated genes are shared by both cohorts. We further demonstrate the molecular and functional differences with gene enrichment analysis. The enriched pathways and molecular functions of the upregulated genes are also distinct in childhood and adult AMLs.
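The overlap figure can be checked with simple arithmetic from the counts reported in the Results (414 adult genes, 440 pediatric genes, 17 shared). The short sketch below is only a worked calculation; the variable names are illustrative, and the Jaccard index is added as one possible summary that the text itself does not report.

```python
adult_up = 414       # upregulated genes, high-risk adult AML (TCGA)
pediatric_up = 440   # upregulated genes, high-risk pediatric AML (TARGET)
shared = 17          # genes upregulated in both cohorts

frac_of_adult = shared / adult_up            # share of the adult gene list
frac_of_pediatric = shared / pediatric_up    # share of the pediatric gene list
union = adult_up + pediatric_up - shared     # distinct genes across both lists
jaccard = shared / union                     # overlap relative to the union

print(f"{frac_of_adult:.1%} of adult, {frac_of_pediatric:.1%} of pediatric, "
      f"Jaccard {jaccard:.1%}")
```

Whichever denominator is used, the shared fraction stays below 5%, consistent with the statement above.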

| CONCLUSIONS
Although it is well known that pediatric and adult AMLs are genetically distinct diseases, the driver genes for high-risk pediatric and