Genome‐wide data mining to construct a competing endogenous RNA network and reveal the pivotal therapeutic targets of Parkinson's disease

Abstract Parkinson's disease (PD) is one of the most common neurodegenerative movement disorders, for which there has been no effective treatments. To clarify the pathogenesis of PD, we constructed a competing endogenous RNA (ceRNA) network based on the genome‐wide RNA sequencing data. It was found that 92 RNAs were differentially expressed, including 50 mRNAs, 25 miRNAs and 17 lncRNAs, based on which a ceRNA network was constructed and evaluated from 4 aspects of number of nodes, topological coefficients, closeness centrality and betweenness centrality. The functional annotation and enrichment analysis suggested that 6 functional modules, particularly the peripheral nervous system development and toxin metabolic process, dominated the development of PD. To validate the assumption, the gene set enrichment analysis (GSEA) was conducted basing on the genome‐wide RNAs regardless whether they were differentially expressed or not. Consistently, the results revealed that dysregulation of MAG, HOXB3, MYRF and PLP1 led to metabolic disorders of sphingolipid and glutathione, which contributed to the pathogenesis of PD. Also, in‐depth mining of previous literature confirmed a pivotal role of these dysregulated RNAs, which had been indicated to be potential diagnostic and therapeutic biomarkers of PD. Overall, we constructed a ceRNA network based on the dysregulated mRNAs, lncRNAs and miRNAs in PD, and the aberrant expression of MAG, HOXB3, MYRF and PLP1 caused metabolism disorder of sphingolipid and glutathione, and these genes are of great significance for the diagnosis and treatment of PD.

the treatment of PD mainly focuses on dopamine (DA) replacement and tremor relief, but there are many side effects such as the on-off phenomenon. 3 Moreover, these therapeutic strategies cannot prevent the development and evolution of neurodegeneration, which seriously affects the life quality of PD patients. On the other hand, available epidemiological studies reveal that 2.0% of people over 65 years old are affected by PD, and its morbidity is 1.5% higher in men than that of women. 4 Worldwide, the incidence rate of the newly diagnosed PD is estimated to be 8 to 18 per 100,000 people yearly. 5 The conditions are even less optimistic in primarily industrialized countries, where most of the PD patients are sporadic with less than 10% of family history, and the estimated PD prevalence ranges from 0.3% to 3.0% with the increase of age. For the highly industrialized regions such as Europe, the estimated PD prevalence is 65 to 12, 500 per 100,000 people, and the newly diagnosed PD cases are 346 per 100, 000 people every year. 6 Considering the expanding population of PD, as well as the defects of the current therapeutic strategies, it is urgent to identify the potential therapeutic targets that will help to clarify PD pathogenesis.
Over the past 30 years, many standards and guidelines have been applied to optimize the diagnosis of PD. However, at current stage, it still needs opinions from the movement disorder specialists to determine the diagnosis. 7,8 Biomarkers are biochemical indicators reflecting changes of various physical disorders, and they can be objectively measured, evaluated and applied to the diagnosis of many diseases. 9 Also, biomarkers are used to predict the occurrence, development and prognosis of diseases, and they play pivotal roles in indicating pharmacological response to specific treatment and intervention. 10,11 Given the current strategy for PD diagnosis, which mainly depends on clinical manifestations and often confounded by the atypical symptoms, identification of specific diagnostic biomarkers is of great significance for the diagnosis and treatment of PD.
Findings obtained in previous studies provide great help for screening potential diagnostic biomarkers of PD. It is found that the development of PD is a complicated and sophistically regulated biological process, in which mRNAs, miRNAs and lncRNAs are commonly investigated. miRNAs are dozens of highly conservative non-coding small RNAs, and they can inhibit protein synthesis by interacting with protein-coding or limited protein-coding RNAs at the post-transcriptional level. Several miRNAs have been indicated as potential biomarkers of PD. 12 For example, upregulation of miR-133b leads to silence of lncRNA SNHG14, whereas inadequate SNHG14 suppresses the expression of α-syn that initiates the pathogenesis of PD. 13 Considering the regulatory mechanism between mRNAs, miR-NAs and lncRNAs, a competing endogenous RNA (ceRNA) network can be constructed, which facilities a comprehensive understanding of the pathogenesis of PD. In 2011, Salmone et al first proposed the hypothesis of the ceRNA network, and the hypothesis is based on the competition of a finite pool of miRNAs between different RNAs. 14,15 Given the base-paring model between different RNAs, the ceRNA network is predicted by sequence alignment. Up to date, the ceRNA network has been applied to the diagnosis and treatment of numerous human diseases, especially for cancer and atherosclerosis. [16][17][18] However, it has not been reported in PD.
Overall, we constructed a ceRNA network based on the genome-wide RNA sequencing data of lncRNAs, miRNAs and mRNAs.
The samples used to profile the RNA expression were obtained from the Gene Expression Omnibus (GEO), and a ceRNA network was built and evaluated using a statistical model. Besides, the biological function dominated by the ceRNA network was predicted from two aspects of Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG).
Also, the gene set enrichment analysis (GSEA) was implemented to reveal the molecular mechanism related to the functional changes of PD.

| Data source
All foundational data were downloaded from GEO, where the microarray and sequence-based RNA profiles are free to access and available with no restrictions for research (https://www.ncbi.nlm.nih.gov/ geo/). RNAs with abundance less than four degrees were excluded.

RNAs (DERNAs)
The differentially expressed mRNAs, miRNAs and lncRNAs were determined by Limma test, and a P-value less than 0.05 and | log 2 (fold change) | ≥ 1 were applied to screen DERNAs. Three heatmaps regarding the expression of these DERNAs were plotted using the pheatmap package in R software.

| Construction of the mRNA-DEmiRNA-lncRNA triple network
According to the hypothesis of the ceRNA network, we examined all potential connections among the aberrantly expressed mRNAs, miR-NAs and lncRNAs. Using miRNA as a bridge, we linked the DEmRNAs and DElncRNAs by searching the miRNA complementary sequences, and a triple network among DEmiRNAs and their targeted lncRNAs and mRNAs was constructed. The target genes of the DEmiRNAs were predicted using the microT scoring method (paring score ≥ 0.8) via the DIANA platform, and lncRNAs regulated by the DEmiRNAs were identified through alignment with lncBase Version 2.0. The Pearson correlation coefficient between lncRNAs and mRNAs was calculated, and only the pairs with coefficients more than 0.4 were involved in the triple network. Finally, a topological network of mRNAs-DEmiRNAs-lncRNAs was visualized using Cytoscape Version 3.7.1.

| Construction of the ceRNA network
To construct the ceRNA network, we extracted the DEmRNAs and DElncRNAs targeted by DEmiRNAs from the triple network. By adopting the hypergeometric test, we evaluated the significance of the shared miRNAs between each DEmRNA and DElncRNA. A Pvalue was calculated as follows and considered as statistically significant if it was less than 0.05: where K is the number of miRNAs interacted with mRNAs, and N is the number of miRNAs interacted with lncRNAs. M is the number of miR-NAs in the genome, and x is the number of miRNAs shared by mRNAs and lncRNAs. 20

| Fitness assessment of the ceRNA network
The topological analysis of the ceRNA network was carried out by using the NetworkAnalyzer plugin in Cytoscape version 3.7.1. 21 The validity of the model was studied from 4 aspects of number of nodes, topological coefficients, closeness centrality and betweenness centrality. The number of nodes was the number of edges that linked to this node. The topological coefficient was a relative measure for the extent to which a node shared neighbours with other nodes. Closeness centrality showed the connection of a node with other nodes using the standardized inverse average distance. Betweenness centrality was a measure of node in a network, which was the number of the shortest paths from each node to all others that passed through the node. 22,23 A P-value less than 0.05 was considered as statistically significant.

| Statistical analysis
All statistical analyses were performed using R software version 3.6.1 for Windows. The difference of RNA expression was compared between PD and the healthy control using Limma test. A Benjamin-Hochberg adjusted P-value < 0.05 was as considered statistically significant unless otherwise indicated.

| Construction and fitness assessment of ceRNA Network in PD
To construct the ceRNA network in PD, we extracted DEmiRNAs, DEmRNA-DElncRNA-DEmiRNA interactive pairs were identified and are shown in Figure 2 and Table 2. Moreover, we performed the topological assessment to evaluate the fitness of the ceRNA network at node degree, topological coefficient, closeness centrality and betweenness centrality. As shown in Figure 3, the square coefficients of node degree, topological coefficient, closeness centrality and betweenness centrality were 0.827, 0.936, 0.690 and 0.688, indicating that a highly convincing ceRNA network was successfully constructed.

| Functional annotation and enrichment analysis of the ceRNA network and identification of the hub genes in PD
To explore the biological function dominated by the ceRNA network, we implemented functional annotation for the RNAs in

F I G U R E 4
Functional annotation and enrichment analysis for RNAs involved in the ceRNA network. The total of nine functional modules was represented in different colours, and the nervous system-related functional modules, along with their first neighbourhoods in the topological network, were painted with the yellow background

| Validation of the crucial role of MAG/ HOXB3/MYRF/PLP1-related ceRNA network in the pathogenesis of PD
To study the functional implication of the 8 hub genes in the ceRNA network, the functional annotation and enrichment analysis were performed. As shown in Figure 5 and Supplementary

| D ISCUSS I ON
PD is one of the most common multifactorial neurodegenerative disorders, which mainly occurs in middle-aged and elder people over 65 years old. 24,25 It is generally believed that PD is caused by genetic alteration, age and environment, 26 yet the pathogenesis of PD has not been fully elucidated, which consequently calls into questions for its diagnosis and clinical treatments. With the rapid development of transcriptomics, the RNA interaction in PD has drawn considerable attentions. For example, it was found that TRPC6 in the blood was significantly suppressed in patients with mild cognitive impairment and could be used as a potential indicator for PD or Alzheimer's disease. 27 Similarly, 8 miRNAs, including miR-9-5p, miR-21-5p, the miR-29 family, miR-132-3p, miR-124-3p, miR-146a-5p, miR-155-5p and miR-223-3p, have been well-studied in PD or PD models. 28 More importantly, the dysregulation of miR-133b by interfering lncRNA SNHG14 alters the α-synuclein pathway and lead to mitigation of the dopaminergic neuron injury in PD. 13 Therefore, a comprehensive investigation of the RNA interactions is conducive to the mechanistic study as well as the clinical treatment of PD. Here, we recombined a data set using publicly available data sets of mRNAs, lncRNAs and miRNAs in PD, and then, we constructed a triple network that was further used to build the ceRNA network (DEmRNA-DEmiRNA-DElncRNA). The functional enrichment analysis revealed the potential regulatory mechanism of PD, where the signalling pathways of myelination and the formation and development of glial cells were previously confirmed to promote PD. 29 Specially, we evaluated the fitness of the ceRNA network using R-squares of the number of nodes, topological coefficient, closeness centrality and betweenness centrality. We found that the ceRNA network was of excellent statistical efficiency. Finally, a set of hub genes, including MAG, HOXB3, MYRF and PLP1, were determined using GSEA and predicted to mediate the sphingolipid and glutathione metabolism signalling pathways. Glial cells are widely distributed in the central nervous system, which supports and nourishes neurons, as well as absorbs the active substances, so the dysfunction of these biological activities is essential for the development of PD. 31 For the hub genes in the ceRNA network, they were both validated by the GSEA and previous experiments. By inducing apoptosis, MAG was suggested to regulate the growth of cerebellar granule neurons (CGNs). 32 As a member of the HOXB family, HOXB3 was in charge of cell differentiation and proliferation. 33 Besides, MYRF and PLP1 were essential for the oligodendrocytes (OLS) differentiation and myelin maintenance in the central nervous system, which was tightly associated with long-distance and rapid transmission of nerve electrical impulses. [34][35][36] Consistently, we found that the signalling pathways such as myelination and the biosynthe- In conclusion, we constructed a highly convincing ceRNA network based on the genome-wide expression profiles of mRNAs, miRNAs and lncRNAs. The functional enrichment analysis and GSEA showed that dysregulation of MAG, HOXB3, MYRF and PLP1, as well as their corresponding miRNAs and lncRNAs in the ceRNA network, contributes to the pathogenesis of PD via sphingolipid and glutathione metabolism signalling pathway, and these RNAs of interest were potential diagnostic and therapeutic targets of PD.

ACK N OWLED G M ENTS
We would like to thank those who provided evidences for our work.

CO N FLI C T S O F I NTE R E S T
The authors confirm that there are no conflicts of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
The data that support the findings of this study are openly available in the Gene Expression Omnibus at https://www.ncbi.nlm.nih.gov/ geo/, reference number GSE11 0716, GSE 95 427 and GSE11 0719, and the supplementary material of this article.