Identification of lncRNA‐associated differential subnetworks in oesophageal squamous cell carcinoma by differential co‐expression analysis

Abstract Differential expression analysis has led to the identification of important biomarkers in oesophageal squamous cell carcinoma (ESCC). Despite enormous contributions, it has not harnessed the full potential of gene expression data, such as interactions among genes. Differential co‐expression analysis has emerged as an effective tool that complements differential expression analysis to provide better insight of dysregulated mechanisms and indicate key driver genes. Here, we analysed the differential co‐expression of lncRNAs and protein‐coding genes (PCGs) between normal oesophageal tissue and ESCC tissues, and constructed a lncRNA‐PCG differential co‐expression network (DCN). DCN was characterized as a scale‐free, small‐world network with modular organization. Focusing on lncRNAs, a total of 107 differential lncRNA‐PCG subnetworks were identified from the DCN by integrating both differential expression and differential co‐expression. These differential subnetworks provide a valuable source for revealing lncRNA functions and the associated dysfunctional regulatory networks in ESCC. Their consistent discrimination suggests that they may have important roles in ESCC and could serve as robust subnetwork biomarkers. In addition, two tumour suppressor genes (AL121899.1 and ELMO2), identified in the core modules, were validated by functional experiments. The proposed method can be easily used to investigate differential subnetworks of other molecules in other cancers.


| INTRODUC TI ON
Oesophageal carcinoma is the eighth most common and the sixth most lethal cancer with a poor 5-year overall survival ranging from 15% to 25%. 1 Oesophageal squamous cell carcinoma (ESCC) is the predominant histological type of oesophageal carcinoma worldwide. 2 At present, the regulatory mechanisms underlying ESCC remain largely unknown. Based on the accumulating transcriptome data, traditional differential expression analysis has successfully identified a handful of oncogenes and tumour suppressors, including protein-coding genes (PCGs) and miRNAs, such as PTK6, 3 Rab25, 4 miR-25 5 and miR-29c, 6 and recently encompassed long non-coding RNAs (lncRNAs), such as HOTAIR, 7,8 AFAP1-AS1 9 and lncRNA625, 10 due to improvement of sequencing techniques. These dysregulated genes serve as potential diagnostic or prognostic biomarkers and valuable targets for further study to understand pathologic mechanisms in ESCC. [3][4][5][6][7][8][9][10] Despite the enormous contributions to the field, differential expression analysis has not captured the full potential of transcriptome data. Some genetic mutations and post-translational modifications, such as methylation, phosphorylation and acylation, can modify protein activity without affecting the gene expression level, but can alter the interaction pattern with other genes. 11,12 A well-known example is APC, the most common mutated gene in colorectal cancer, whose frequent mutation leads to a truncated protein that lacks the binding sites for certain interacting proteins. 13 Thus, an analysis based solely on differential expression analysis may miss some key driver genes.
On the other hand, differential expression analysis treats genes individually, but does not account for the interactions among them, and it is widely accepted that understanding the mechanisms underlying disease must consider the contributions of alterations in gene interaction. 11 Recently, differential co-expression analysis has emerged as an effective tool that complements differential expression analysis to provide better insights of dysregulated mechanisms and indicate key driver genes. 11,[14][15][16][17][18][19] Differential co-expression measures the correlation difference of a gene pair between two conditions (eg healthy and diseased samples). As co-expressed gene pairs are more likely to have putative interactions, dependencies or coordinated activities in a given biological state, changes in co-expression patterns between two conditions may reveal disease-associated dysregulated mechanisms and indicate key driver genes. 19 For human cancer, the gene co-expression relationships in normal samples are extensively lost in matched tumour samples, such as breast cancer, colorectal cancer, lung cancer and gastric cancer. 12,14,20,21 Many studies have focused on this discrepancy to identify genes or gene modules that are dysfunctional in tumour samples by differential co-expression analysis. For example, Anglani et al showed that differential co-expression analysis was complementary to differential expression to unveil novel candidate cancer genes and improve the classic pathway enrichment analysis. 12 In clear cell renal cell carcinoma, an HNF4Aassociated module was found to be functional in normal tissues but disrupted in tumour tissues, which could promote cell proliferation. 20 Furthermore, differential co-expression analysis has been successfully used to identify differentially co-expressed modules, a group of genes significantly correlated under one condition but not the other, which may reflect dynamic changes in gene interaction networks. 11,[17][18][19][20][21] However, for ESCC, the differential co-expression patterns of genes have not been investigated. This promoted us to apply differential co-expression analysis on ESCC and identify differentially co-expressed modules, which may help to reveal the dysfunctional regulatory networks underlying ESCC development and suggest novel driver genes.
On the other hand, lncRNAs are attracting more and more attention with their widespread roles in cancer, including ESCC. [7][8][9][10]22,23 However, the function of the vast majority of lncRNAs remains enigmatic. Meaningful understanding of lncRNA function can only be achieved from detailed study on a case-by-case basis, which lacks candidate targets. To advance the understanding of lncRNA-associated dysregulated mechanisms in ESCC and provide potential targets, large scale identification of differentially co-expressed ln-cRNA-PCG modules is urgently required.
Here, we construct a differential co-expression network (DCN) based on ESCC expression data and propose a novel algorithm to identify lncRNA-associated differential subnetworks on a large scale by integrating both differential expression and differential co-expression. The identified differential lncRNA-PCG subnetworks provide a valuable source for revealing lncRNA functions and the associated dysfunctional regulatory networks in ESCC. The functions of two tumour suppressor genes (AL121899.1 and ELMO2), identified in the core modules, were further validated using functional experiments.

| Data sets
Three independent ESCC data sets were collected. The first two data sets (GSE53624 and GSE53622) were obtained from the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/). Expression profiles of 238 samples (119 paired ESCC and adjacent normal tissues) in GSE53624 were profiled by using the Agilent-038314 human lncRNA + mRNA microarray V2.0 platform.
We reannotated probe sets of this platform with three steps.
1. Probe sequences of the Agilent-038314 array were aligned to PCG and lncRNA transcripts obtained from GENCODE database (GRCh38, release 21) 24 by using BLASTn. 25 Only probes perfectly mapped to lncRNAs or PCGs were retained.
2. Probes that mapped to both PCGs and lncRNAs were removed.
3. Probes targeting more than one PCG or lncRNA were removed.
The retained probes mapped uniquely to a PCG or a lncRNA transcript with no mismatch, resulting in 17 434 PCGs and 6252 ln-cRNAs. The 119 paired tissues were randomly split into a training set (60 paired tissues, ESCC-train) and a test set (59 paired tissues, ESCC-test). The expression profiles of 120 samples (60 paired ESCC and adjacent normal tissues, ESCC-valid) in GSE53622 were processed in the same way.
The third data set contains the RNA-sequencing data of 30 samples (15 paired ESCC and adjacent normal tissues) reported in our previous study. 10 We extracted RPKM expression profiles by using TopHat version 2.0.6 26 and easyRNAseq version 1.6.0. 27 Transcripts that were not detected in more than 20% of the samples were removed, resulting in 16 178 PCGs and 3498 lncRNAs. The data were quantile normalized and log2-transformed.

| Construction of normal and tumour co-expression networks
The normal co-expression network (NCN) was constructed based on expression profiles of the 60 normal tissues in the ESCC-train data set. For each pair of genes i and j, the Pearson correlation coefficient (PCC) r N ij and the associated P-value P N ij (two-sided Student's t test, Benjamini and Hochberg (BH) correction) were calculated by using the WGCNA 17 R package. Gene pairs with P N ij < t N were used to construct the NCN, where t N was the normal PCC P-value threshold.
The tumour co-expression network (TCN) was constructed, using the same method as the NCN, based on expression profiles of the 60 ESCC tissues in the ESCC-train data set. Gene pairs with P T ij < t T were used to construct the TCN, where P T ij was the BH corrected PCC P-value in tumour samples, and t T was the tumour PCC P-value threshold.

| Construction of the differential co-expression network (DCN)
To test whether the difference between the PCC of a gene pair in normal samples and that in tumour samples was significant, the PCCs were transformed into z scores by using the Fisher transformation: where r is the PCC, and z is the Fisher-transformed PCC. Then, z is ap- is the number of the samples. 20 The PCCs r N ij and r T ij of a gene pair (i, j) in the normal and tumour samples are transformed into z N ij and z T ij by using Equation (1), respectively. The difference between z N ij and z T ij can be measured by the following equation: where n T and n N are the number of tumour and normal samples, respectively. The variable Δz is approximately normally distributed with a mean of zero and variance one. Thus, we are able to apply a Z test to calculate the associated P-value P z ij under the null hypothesis that z N ij and z T ij are equal. The P-values P z ij are controlled for multiple testing by BH correction. The weight of differential co-expression between gene i and j is defined as: where Pz ij is the BH corrected value of P z ij , and t z is the Z test P-value graph with the edge weight reflecting the extent of differential co-expression. The three thresholds t z , t N and t T control the reliability of the links in DCN. In general, the smaller the thresholds are, the more reliable the links are. We used t z = t N = t T = 10 −7 in this study.

| Scoring subnetworks
Both differential expression and differential co-expression are integrated to score a subnetwork. Given a subnetwork G = (V, E), where V is the set of nodes, and E is the set of edges, the score of differential expression is defined as: where t i is the t statistic in a paired, two-tailed t test comparing the expression values of gene i between tumour samples and normal samples, |t i | is the absolute value of t i , and |V| is the cardinality of the set V.
The score of differential co-expression is defined as: where |E| is the cardinality of the set E. Then, Equations (4) and (5) are integrated to define the subnetwork score: where α∈[0,1] is a parameter to control the relative weight of differential expression DE G and differential co-expression DC G .

| Searching for lncRNA-associated subnetworks
Given the subnetwork score function (6), a greedy search was performed to identify subnetworks within the DCN for which the scores are locally maximal. 28 The search starts from a seed and iteratively adds neighbouring nodes. Each lncRNA is used as the seed in turn to initialize a subnetwork. At each iteration, the search considers addition of a node from the neighbours of nodes in the current subnetwork, and the corresponding edges connect this node and the current subnetwork. The addition which yields the maximum score is adopted. The search will stop if no node satisfies the following two conditions: (a) the number of edges in the shortest path between this node and the seed is less than or equal to d, and (b) the addition of this node increases the score of the subnetwork over an improvement rate r. The parameter d is a positive integer that controls the search space and r > 0 controls the increasing rate of the subnetwork score. For each lncRNA, the resulting subnetwork is usually composed of a few lncRNAs and a majority of PCGs, depending on its neighbouring genes.

| Evaluation of subnetworks
Two tests were performed to evaluate the statistical significance of the subnetworks. The first one randomly permuted the sample labels 100 times and recalculated subnetwork scores of all real subnetworks. This generates a null distribution of random subnetwork scores for each real subnetwork. Then, the significance level P 1 is obtained by indexing the score of each real subnetwork on the null distribution of the corresponding random subnetwork scores.
The second test constructed 100 random subnetworks for each real subnetwork. For a real subnetwork with n genes and e edges, the corresponding random subnetworks composed the seed lncRNA and other randomly selected n−1 genes. Edges among these n genes were sorted according to edge weight in decreasing order, and the top e edges were used to construct the random network. The subnetwork scores of these 100 random subnetworks constitute the null distribution of the real subnetwork score. Then, the significance level P 2 is obtained by indexing the score of each real subnetwork on the corresponding null distribution.

| Classification and clustering
The subnetwork expression profiles were inferred by the pathway activity inference method (DRWPClass) proposed by Liu et al. 29 Three other methods (mean and median and PCA) 29 were also performed for comparison. The logistic regression with lasso for feature selection was used to build the classifier, which was implemented with R package 'glmnet'. 30 Hierarchical clustering was performed with PCC as the distance measure and complete-linkage as the clustering method.

| Cell culture
Sources of oesophageal cancer cell lines have been described previously. 31,32 KYSE150, KYSE510 and TE3 cells were maintained in RPMI-1640 medium containing 10% foetal bovine serum. KYSE450 cells were maintained in DMEM (HyClone) medium containing 10% newborn bovine serum. All were incubated with 5% CO 2 and 80% humidity at 37°C.

| Reverse transcription and quantitative realtime PCR (qRT-PCR)
Total RNA was extracted using TRIzol following the manufacturer's (forward), 5′-GATAGCAACGTACATGGCTGGG-3′ (reverse). β-Actin was used as the control and for normalization. All qRT-PCR analyses for each gene were repeated at least three times.

| Western blotting assay
Western blotting was performed according to previously described methods. 33 Briefly, total cell lysates were prepared in RIPA buffer, separated by SDS-PAGE and transferred to PVDF membranes (Millipore). Membranes were incubated in blocking buffer and then incubated with the indicated antibody. Finally, immunoreactive bands were revealed using luminol reagent (Santa Cruz Biotechnology, DE).
Photography and quantitative analyses were done using ChemiDoc Touch (Bio-Rad). The following antibodies were used: mouse anti-GFP (Santa Cruz Biotechnology) and mouse anti-β-actin (Sigma).

| Statistical analysis
Statistical analyses were performed using R 3.4.2. The differentially expressed genes from the microarray data were defined as genes with a t test P-value < .05 (BH correction) and a fold change > 2 or <0.5. The differential expression analysis of RNA-Seq data was performed using DEGSeq. 36 Gene enrichment analysis was performed using Metascape (http://metas cape.org). 37

| Extensive loss of connectivity in the tumour co-expression network
The data set ESCC-train was used to construct co-expression networks. Genes with invariable expression across 60 paired ESCC and adjacent normal tissues (coefficient of variation < 0.05) were filtered out, resulting in 13 073 PCGs and 5379 lncRNAs.  Table 1), suggesting a seriously disrupted regulatory system in tumour samples.  Figure 1D). For example, lncRNA AL121899.1

| Construction of the DCN in ESCC
and PCG CST6 were positively correlated in normal samples (r Normal = .856), but negatively correlated with tumour samples (r Tumour = −.419, Figure 1E). While BICD2 and CCDC78 had no correlation in normal samples, they were positively correlated in tumour samples (r Tumour = .905, Figure 1F). To measure the significance of the difference between PCCs in normal and tumour samples, a Fisher transformation followed by a Z test was applied to the PCCs. A strict P-value threshold of 10 −7 was used to determine whether the PCCs in normal and tumour samples were significantly different. Then, a gene pair was considered as differentially co-expressed if the Pvalue in the Z test < 10 −7 and simultaneously, the two genes were connected in the NCN or TCN. Finally, a DCN was constructed based on differentially co-expressed gene pairs (Figure 2A).
In the DCN, there were 1746 PCGs and 328 lncRNAs that were linked by 3917 edges. As NCN and TCN, DCN was a scale-free network with highly variable degrees ( Figure 2B). The degree distribution of the DCN was different from that of a degree-preserving random network ( Figure 2C, Figure 2D), indicating that differential co-expression had weak correlation with differential expression.
The average clustering coefficient of the DCN was much larger than that of the random networks (c DCN = 0.0275» c random ≈ 0.0018, Figure 2E). The shortest path length was comparable to that of the random networks (L DCN = 5.6363 ≈ L random ≈ 5.8414, Figure 2F), indicating that the DCN is also a small-world network. 39 Furthermore, the clustering coefficients asymptotically followed the scaling law c(k)~k −1 ( Figure 2G), suggesting that the DCN is characterized by a potential modular organization, which is a general feature of biological networks. 40

| Identification of lncRNA-associated differential subnetworks
To identify lncRNA-associated differential subnetworks, a greedy search algorithm was applied to search differential subnetworks in the DCN using each of the 328 lncRNAs as the seed. A total of 328 lncRNA-associated subnetworks were identified. With the parameters d = 2, r = .1 and α = .7, the number of nodes in the identified subnetworks ranged from two to 14 (mean number = 7.45), with the majority (63.72%) having between six and 10 nodes ( Figure S2A). The number of edges ranged from one to 19 (mean number = 6.98) and the majority (58.23%) between five and nine ( Figure S2B). The subnetwork scores had a mean of 16.44 and did not increase with the number of genes when the number of genes exceeded 5 ( Figure S2C) and also the number of edges when the number of edges exceeded 4 ( Figure S2D), indicating that the subnetwork scores were independent of the sizes of subnetworks.
To identify differential subnetworks that are statistically signif-  networks that were significant in all three data sets were considered to be differential subnetworks (Table S2-S3). Figure S3 illustrates an example of differential subnetwork (DS_AL121899.1), which was identified by using lncRNA AL121899.1 as the seed. DS_AL121899.1 was a down-regulated subnetwork located in Region I in the DCN ( Figure 2A). All nine genes in DS_AL121899.1 were down-regulated, and 11 gene pairs were positively correlated in the normal samples but lost the correlations in the tumour samples. As expected, the differential expression ( Figure S3B) and differential co-expression patterns ( Figure S3C) were consistent across the three data sets, indicating the robustness of the differential subnetworks.

| Differential subnetworks are discriminative between ESCC and normal tissues
Given that the differential subnetworks showed consistently distinct expression patterns between ESCC and normal samples, we next in- DS_AC009948.1, Figure 3C) were also marked in Figure 3E.
Clustering analysis showed that 107 subnetwork expression profiles could distinguish ESCC from normal samples across all three data sets ( Figure 3E, Figure S4 and S5). Feature selection with lasso identified 11 discriminative subnetworks (Table S4), which almost perfectly classified ESCC and normal samples in the three data sets (AUC = 1, 0.998 and 1, respectively, Figure 3F). Three other pathway activity inference methods (mean, median and PCA) identified different subnetwork markers (Table S4). According to the principle of the pathway activity inference methods, the differential subnetworks identified by the mean and median method tend to contain genes that are simultaneously up-regulated or down-regulated (eg DS_MIR4435-2HG, Figure 3B), while differential subnetworks identified by the PCA method tend to have the largest overall variations in gene expression. The differential subnetworks identified by DRWPClass have the strongest discriminative ability (Table S4), including not only the subnetworks with consistent gene expression changes, but also the subnetworks containing both up-regulated and down-regulated genes (eg DS_AC009948.1, Figure 3C), which are common in dysfunctional regulatory networks. However, all the four methods yielded favourable classification performances (Table S5), suggesting that all these subnetworks are discriminative to serve as potential subnetwork biomarkers.

| Core differential co-expression modules
Among 107 differential subnetworks, some subnetworks overlapped. Genes located in the overlapping regions were frequently captured by differential subnetworks. We speculated that these genes may play important roles in ESCC and are worth paying more attention. Thus, we applied a greedy search to identify overlapping gene modules (referred to as core differential co-expression modules) across all the differential subnetworks. Four core modules with ≥4 genes and captured by ≥3 differential subnetworks were identified, including and AL121899.1-associated core module ( Figure 4A), an ELMO2-associated core module ( Figure 5A), and a BICDL2-and KRT78-associated core module ( Figure S6). Two of them were further investigated in the following section.

| AL121899.1-associated core module
The AL121899.1-associated core module was identified in six differ- in Region I in the DCN, suggesting that hyper-methylation may have a role in differential co-expression between down-regulated genes.
As the hub of the core module, AL121899.1 is a lncRNA whose function has not been clarified. We speculated that AL121899.  (Table S6-S7, Figure S7B).

| ELMO2-associated core module
As for AL121899.1, ELMO2 is the hub of the ELMO2-associated core module. It was consistently down-regulated in the  Figure 5B). qRT-PCR assays showed that ELMO2 had low expression in ESCC cells, using 293T cells as the control ( Figure 5C), suggesting an important role in ESCC. Thus, we sought to investigate the potential functions of ELMO2. Firstly, we transfected GFP-ELMO2 into KYSE150 and KYSE510 cells, and confirmed ELMO2 expression by immunoblotting ( Figure 5D). Both wound healing assays and transwell migration assays showed that ELMO2 overexpression reduced cell migration ( Figure 5E,F). Secondly, we knocked down ELMO2 in TE3 and KYSE450 cells by siELMO2 transfection, and confirmed ELMO2 expression by qRT-PCR ( Figure 5G). In contrast to ELMO2 overexpression, ELMO2 knockdown promoted cell migration ( Figure 5H,I).
These results indicate that ELMO2 inhibits cell migration in ESCC.  (Table S10 and S11, Figure S8B). The up-regulated genes were enriched on biological processes associated with cell-cell adhesion, interferon-gamma production, acute inflammatory response and cytokine-cytokine receptor interaction ( Figure S8C). The down-regulated genes were enriched on biological processes associated with axoneme assembly and membrane repolarization ( Figure S8D).
In all, the functional experiments showed that AL121899.1 and ELMO2 are two important tumour suppressors in ESCC. This indicates that the differential subnetworks could suggest reliable targets for further study.

| D ISCUSS I ON
In this study, we investigated the topological characteristics of the DCN in ESCC. As with other cellular networks, the DCN is a scalefree, small-world network. These topological characteristics were not dependent on the method used to construct the DCN, as DCNs constructed by a different method DCe 56 with different cut-offs also had the properties of c»c random and L≈L random ( Figure S9). The high clustering coefficient implies modular organization of the network, 40 which implies the genes in the lncRNA-associated differential subnetworks may work in a modular manner to contribute to the development of ESCC.
Differential co-expression analysis has been used to improve functional enrichment analysis, 15 unveil differential regulation 16 and detect differentially co-expressed clusters globally, such as WGCNA, 17 DiffCoEx 18 and DICER. 19 Different from these studies, our subnetwork searching algorithm focuses on identifying differential co-expressed subnetworks associated with a specific node, for example a lncRNA. With the parameters d = 2, r = .1 and α = .7, the identified subnetworks had moderate sizes that were convenient for further analysis and functional validation ( Figure S2A,B).  Figure S10D-F). Another characteristic of our method is that it integrates both differential expression and differential co-expression for subnetwork identification. The relative weight of differential expression and differential co-expression is controlled by the parameter α. When α approaches 1, our method will be reduced to a differential expression-based method except for the underlying differential co-expression network (Formula (6)). The parameter α does not affect the subnetwork size much (Figure S10G-I), but increases differential expression scores and reduces differential co-expression scores when it gives more weight on differential expression ( Figure S10J,K). These results suggest that, compared to differential expression-based methods, our method focuses on identifying biologically meaningful subnetworks at the cost of some discriminability. However, it is just an indirect comparison with differential expression-based methods based on our differential co-expression network. To objectively evaluate our method, rigorous comparisons with similar methods that also incorporate differential co-expression scores are needed in the future.
Differential co-expression analysis has been reported to be able to suggest new biomarker candidates and provide novel hypotheses for specific functional experiments. 12 LncRNAs and PCGs in a same differential subnetwork may work together to perform specific functions. Except for AL121899.1 and ELMO2, whose abilities to inhibit tumour growth were confirmed by functional experiments, many other genes in the core modules have been reported as tumour suppressor genes, such as SPINK8, CST6 and C2orf54, [42][43][44] highlighting the ability of the core modules to indicate candidate cancer genes.
Differential co-expression is complementary to differential expression for depicting dysregulated systems in cancer. 12,20 Differential co-expression analysis has successfully identified driver genes that could not be found by differential expression analysis. 20,57 In the DCN, genes that were not differentially expressed also had the possibility to play a role in ESCC due to their differential connections. For example, DLC1 is a known tumour suppressor gene that may be involved in the carcinogenesis of ESCC. 58 It was captured in the DCN, but missed in differential expression analysis as it exhibits no differential expression (P = .078). In the KRT78-and BICDL2-associated core module, the two hubs were connected by 15 lncRNAs (Figure S6). Although many of them were not differentially expressed, they were differentially co-expressed with both KRT78 and BICDL2. Among these genes, lncRNA C20orf204 (named LINC00176 in GRCh37 coordinates, P = .168) has been confirmed to negatively regulate cell proliferation in Huh7.5 OC cells. 59 Tran et al found that C20orf204 regulates expression of more than 200 genes by a sponge function for tumour suppressor miRNAs in hepatocellular carcinoma. 60 Its function in ESCC is also worth further investigation. Other top-ranked genes with a high degree include TMTC1, OLFM1, TRIM31 and FGF13.
In summary, we identified a source of lncRNA-associated differential subnetworks on a large scale by integrating differential expression and differential co-expression. The functional experiments on AL121899.1 and ELMO2 confirmed the effectiveness of the subnetwork identification method. The differential subnetworks will be helpful for revealing the dysfunctional regulatory networks of ESCC and generating hypotheses for the discovery of novel gene or subnetwork biomarkers. However, identification of differential subnetworks is the first step towards understanding the dysfunctional regulatory systems in the development of ESCC. Further analyses are needed to illustrate the detailed regulatory mechanisms underlying the differential subnetworks in the future. The proposed subnetwork identification method has been implemented as an R package 'DCN' (https://github. com/weili u123/DCN-package), which can be easily used to investigate differential subnetworks of other molecules in other cancers.

ACK N OWLED G EM ENTS
This work was supported in part by the National Natural Science

CO N FLI C T O F I NTE R E S T
The authors confirm that there are no conflicts of interest.

AUTH O R CO NTR I B UTI O N S
EML and LYX conceived the concept for this study; WL designed the algorithm and performed the bioinformatic analyses; CYG and LDL carried out the biological experiments; WW and CQL performed network analysis; WL and CYG wrote the manuscript; EML and LYX revised the manuscript.

DATA AVA I L A B I L I T Y S TAT E M E N T
All other data supporting the presented findings are available from the corresponding author upon request.