Systematic identification and analysis of dysregulated miRNA and transcription factor feed‐forward loops in hypertrophic cardiomyopathy

Abstract Hypertrophic cardiomyopathy (HCM) is the most common genetic cardiovascular disease. Although some genes and miRNAs related with HCM have been studied, the molecular regulatory mechanisms between miRNAs and transcription factors (TFs) in HCM have not been systematically elucidated. In this study, we proposed a novel method for identifying dysregulated miRNA‐TF feed‐forward loops (FFLs) by integrating sample matched miRNA and gene expression profiles and experimentally verified interactions of TF‐target gene and miRNA‐target gene. We identified 316 dysregulated miRNA‐TF FFLs in HCM, which were confirmed to be closely related with HCM from various perspectives. Subpathway enrichment analysis demonstrated that the method was outperformed by the existing method. Furthermore, we systematically analysed the global architecture and feature of gene regulation by miRNAs and TFs in HCM, and the FFL composed of hsa‐miR‐17‐5p, FASN and STAT3 was inferred to play critical roles in HCM. Additionally, we identified two panels of biomarkers defined by three TFs (CEBPB, HIF1A, and STAT3) and four miRNAs (hsa‐miR‐155‐5p, hsa‐miR‐17‐5p, hsa‐miR‐20a‐5p, and hsa‐miR‐181a‐5p) in a discovery cohort of 126 samples, which could differentiate HCM patients from healthy controls with better performance. Our work provides HCM‐related dysregulated miRNA‐TF FFLs for further experimental study, and provides candidate biomarkers for HCM diagnosis and treatment.

in patients of all ages, especially sudden and unexpected cardiac death in young people. 1 From encoding protein RNAs to non-coding RNAs, our understanding about HCM has improved dramatically both clinically and pathophysiologically. However, the potential molecular mechanisms underlying the pathology of HCM have not been fully understood.
Amongst many genetic factors, miRNAs and transcription factors (TFs) are two types of key gene regulators, and they both participate in many important cellular processes, including cell differentiation, proliferation, and apoptosis. 2 MiRNAs mainly regulate gene expression at the post-transcriptional level, while TFs modulate gene transcription at the transcriptional level. Researchers have demonstrated that miRNAs and TFs may synergistically regulate the same target genes, and they may mutually regulate one another; hence forming feed-forward loops (FFLs), which have been reported to form recurrent network motifs and play important roles in the mammalian gene regulatory network. 2,3 Thus, dysregulated miRNA-TF FFLs will lead to a series of diseases, and deciphering the interplay between miR-NAs and TFs by means of FFLs will yield new mechanistic insights into specific biological events.
Currently available biological databases have integrated different types of molecular interactions, which made it possible to identify miRNA-TF FFLs. For instance, TarBase, 4 miRTarBase, 5 and miRecords 6 collected experimentally verified miRNA-gene regulatory relationships for different organisms, while TargetScan 7 predicted biological targets for miRNAs. TRED 8 and Transfac 9 offered experimentally confirmed TF-gene regulatory relationships, while JASPAR 10 predicted TF targets. TransmiR database 11 recruited experimentally verified TF-miRNA regulatory pairs. Additionally, large amount of genome-wide data such as microarray data and next-generation sequencing data also provide us more valuable information to investigate dysregulated FFLs implicated with specific cellular processes and diseases.
Much efforts have been devoted to detect miRNA-TF FFLs, which were used to dissect potential regulatory mechanisms underlying human diseases. 12,13 On the one hand, started from diseaserelated molecules and different regulatory relationships amongst miRNAs, genes and TFs, Ye et al 14  FFLs. Therefore, using FFLs to decipher the pathological and physiological mechanisms underlying diseases will provide new clues for understanding disease initiation and progression.
In this study, we presented a method for systematically identifying dysregulated miRNA-TF FFLs in HCM, and 316 dysregulated FFLs were obtained. We found that these dysregulated FFLs were significantly enriched in significantly differentially expressed molecules and the known HCM-related molecules. Functional analysis also demonstrated that the FFLs were closely associated with HCM.
We further investigated the global architecture and feature of regulation between miRNAs and TFs in HCM by constructing a dysregulated miRNA-TF regulatory network, from which a FFL (hsa-miR-

| Candidate miRNA-TF FFLs
In this study, we focused on three-node miRNA-TF FFLs, which included a gene, a miRNA, and a TF. According to the main regulator, miRNA-TF FFLs can be typically classified into three types 12,13 : miRNA-FFL, TF-FFL, and composite FFL ( Figure S1). In a miRNA-FFL, | 307 miRNA is the main regulator, which controls the expression of a TF and their common target gene, while in a TF-FFL, TF is the primary regulator. In a composite-FFL, a miRNA regulates the expression of a TF and a target gene. Simultaneously, the TF dominates the expression of the miRNA and the target gene. By integrating expression profile data and the interactions between miRNAs, TFs, and genes, we identified 6,809 candidate miRNA-TF FFLs ( Figure 1A), which comprised of 5549 miRNA-FFLs (81.49%), 851 TF-FFLs (12.50%) and 409 composite FFLs (6.01%). These candidate FFLs included 434 miRNAs, 1253 genes, and 223 TFs.

| Identification of dysregulated miRNA-TF FFLs in HCM
The dysregulated miRNA-TF FFLs in HCM were identified using sample matched expression profile data. Firstly, each node was scored according to the extent of differential expression 18,20 using the formulas (1) and (2): Diff node ¼ ðÀ log 10 pÞ Á j log 2 FCj (2) where φ −1 is the inverse normal cumulative distribution function. p is the P-value which indicates the significance of differential expression computed by the R "limma" package. FC is the fold change of this gene expression.
Secondly, each edge was scored according to the change of gene co-expression between HCM case samples and control samples using the following equations 18,21,22 : X ¼ Fðr case Þ Á ðÀ log 10 p case Þ À Fðr control Þ Á ðÀ log 10 p control Þ ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 1:06 where r case and r control are the Spearman correlation coefficient of gene expression in case and control samples, respectively. While n case and n control represent the samples size, respectively. Function of F is Fisher transformation, which have been shown that applying it could improve the power of identifying differentially rewired genes. 23 Subsequently, the score of a candidate miRNA-TF FFL was calculated by combining the node score and the edge score as follows: where n node and n edge denote the number of nodes and edges in the miRNA-TF FFL. The parameter α∈(0,1) is used to control the weight of node score and edge score. Here, we considered the node and edge score were weighted equally, and selected α as 0.5.
Finally, a P-value was computed to reflect the significance of a miRNA-TF FFL. We constructed a random miRNA-TF FFL by randomly selecting a miRNA, a gene, and a TF, and then calculated the score of this FFL through the above procedure. This process repeated 10 000 times. The empirical P-value was defined as the proportion of randomly obtained FFL scores larger than the real FFL score as below: In this article, the FFL with P-value <0.01 were selected as dysregulated FFLs.

| Collection of genes and miRNAs related to HCM
Genes related to HCM were compiled from a comprehensive human gene-disease association database, DisGeNET (V5.0), 24 which integrated many currently widely used gene-disease databases, such as the Online Mendelian Inheritance in Man database, 25 the Comparative Toxicogenomics Database, 26 the Genetic Association Database, 27 the Mouse Genome Database, 28 PubMed and Uniprot. 29 After removing repeating gene-disease entries, 300 unique genes associated with HCM were obtained.
HCM-related miRNAs were collected by performing a comprehensive literature review. We searched three manually curated and experimentally confirmed human miRNA-disease association databases: HMDD (version 2.0), 30 miR2Disease, 31 and PhenomiR (February 2011), 32 and found that none of these databases contained miRNAs related to HCM, although HMDD and miR2Disease included miRNAs associated with cardiac hypertrophy. Therefore, we inquired PubMed using the phrase "hypertrophic cardiomyopathy AND microRNA." Each article was manually searched for miRNAs with  33 and 54 unique mature miRNAs were ultimately selected.

| Subpathway enrichment analysis
Existing researches have shown that type-specific biological functions tend to be located in local areas of the pathway (subpathway) instead of the entire pathway, and thus subpathway may provide more detailed explanations for pathogenesis. 34,35 KEGG subpathway (local area of the entire biological pathway) enrichment analysis was performed by the R "SubpathwayMiner" package. 34 Significantly enriched subpathways were identified with a P < 0.05. To show the results more clearly, we retained the subpathway with the minimum P-value if multiple significantly enriched subpathways corresponded to an entire pathway.

| Identification of potential diagnostic biomarkers in HCM
Potential biomarkers for distinguishing HCM patients from controls were identified using a classification model based on support vector machine (SVM), which was performed using the R "e1071" package.
The performance was evaluated by classification accuracy and the area under the receiving operating curve (AUC) using 5-fold crossvalidation.
The optimal biomarkers in HCM diagnosis were selected using Li et al's method. 36 We computed classification accuracy for all combinations by applying SVM, and the optimal biomarkers were selected considering a balance between classification accuracy and the number of biomarkers.

| Dysregulated miRNA-TF FFLs in HCM
We identified 316 dysregulated miRNA-TF FFLs in HCM using our method ( Figure 1B and  18 We investigated significantly differentially expressed (SDE) miR-NAs, genes and TFs in dysregulated FFLs. The R "limma" package with P < 0.01 and fold change >1.2 were used to select SDE molecules. As shown in Figure S2, The proportion of SDE molecules was significantly higher than that of candidate FFLs (hypergeometric test,  (Table S5), suggesting that the initiation and progression of diseases was caused by not only the expression change of one single gene, but more importantly the inter-gene interaction.
We paid close attention to the top 5% dysregulated FFLs (Figure 3A). These FFLs included 32 molecules, of which 20 were SDE, seven were the known HCM-related molecules and the remaining 25 molecules were all verified to be implicated with cardiac development or cardiovascular disorders ( Table 2 and Table S6). The subpathways enrichment analysis showed that the three cardiovascular disease pathways were all ranked at the top positions ( Figure 3B). GO analysis revealed that the genes in top 5% dysregulated FFLs tended to be significantly enriched in three GO terms (Biological Process, P < 0.05) functional clusters using Cytoscape plug-in ClueGO, and these clusters mainly included apoptotic. The known researches showed that apoptosis was involved in the development of myocardial fibrosis in familiar HCM 37 and apoptosis constituted a major biological phenomenon in the development of HCM. 38 Interestingly, the FFL formed by hsa-miR-17-5p, MCL1, and STAT3 has been confirmed in a recent report. 39 Kumar et al showed that miR-17-5p regulated autophagy in murine mycobacterium tuberculosisinfected macrophages by targeting MCL1 and STAT3, and STAT3 silencing suppressed MCL1 levels. 39  We further investigated degree and betweenness centrality (BC) of the known HCM-related nodes and nodes in top 5% dysregulated FFLs. As shown in Figure 5A-D, HCM-related nodes and top 5% dysregulated FFLs nodes have significantly higher degree and BC than other nodes (Wilcoxon rank sum test). The nodes with highly connected features (hubs) and high BC are usually considered to play critical roles in maintaining the overall connectivity of the network.
Amongst these hub nodes and high BC nodes, we found that hsa-miR-17-5p, FASN, and STAT3 formed a FFL, and this FFL was within the top 5% dysregulated FFLs (Figure 3). Hsa-miR-17-5p and FASN were both known HCM-related genes. While increased levels of phosphorylated STAT3 were observed in a double-mutation mouse model of familial HCM, and corresponded with the occurrence of disease. 42 Additionally, it has been reported that by targeting STAT3, miR-17-5p regulated mouse cardiomyocyte apoptosis in response to ischaemia followed by reperfusion 43 and induced protective autophagy and anti-apoptosis in vascular smooth muscle cells. 44   was performed using expression data of the two panel biomarkers and two major sample clusters with clear differences were found ( Figure 6D-F). For three TFs, the rates of HCM patients in the predicted HCM group were 100% both in training set (106/106) and test set (7/7), whereas the corresponding rates in the predicted healthy group were 45% (9/20) and 80% (4/5), respectively. For four miRNAs, the rates of HCM patients in the predicted HCM group were 92.45% (98/106) in training set, whereas the corresponding rates in the predicted healthy group were 10% (2/20). All these results indicated that the signatures we identified had better performance in distinguishing HCM patients from controls.

| DISCUSSION
TFs and miRNAs are two key regulators that mainly regulate gene expression at the transcriptional and post-transcriptional level.
Accumulating evidence has demonstrated the important roles of their combinational regulation acting as FFLs in various cellular processes and diseases. In this study, we introduced a novel analysis approach for identifying dysregulated miRNA-TF FFLs between two biological states such as health and disease. We then identified dysregulated miRNA-TF FFLs associated with HCM, which were confirmed to be closely related with HCM from different perspectives. We investigated the global architecture and feature of dysregulated FFLs in HCM, from which a FFL (hsa-miR-17-5p, STAT3 and FASN) might play important roles in HCM and two panels of diagnostic biomarkers defined by three TFs and four miRNAs were identified.
The main idea of the computational method we proposed is differential co-expression in normal and disease states. For a miRNA-TF FFL, we evaluated the strength of its dysregulation by integrating differential expression of the nodes and differential co-expression of the edges in the FFL. Therefore, differential co-expression analysis can not only reflect dynamic changes of a single gene, but also capture the connections between genes. This approach was originated from Jiang's method, 18 and we improved it. Compared with Jiang's method, we observed that the more significant the FFLs we identified, the more HCM-related biological information they contained.
The investigation of biological pathway also revealed that the dys-  hsa-miR-181a-5p 0.0496 miRNAs and TFs with " Δ " denote the miRNAs, genes, and TFs are within top 5% dysregulated FFLs, and those with " # " indicate the known miR-NAs, genes, and TFs associated with HCM.