Identification and functional analysis of long non‐coding RNAs in the synovial membrane of osteoarthritis patients

Osteoarthritis (OA), the most common chronic joint disease in the elderly, has become a significant economic burden for families and societies worldwide. Although treatments are continually improving, current drugs only target joint pain, with no effective therapies modifying OA progression. Long noncoding RNAs (lncRNAs), which have received increasing attention in recent years, are abnormally expressed in OA cartilage. In the present study, weighted coexpression network analysis (WGCNA) was applied to identify modules related to certain OA clinical traits. In total, 4404 coding genes and 161 lncRNAs were differentially expressed based on two OA expression profile data sets and normal control samples. Subsequently, 11 independent modules were acquired, and the green module, with a total of 49 hub genes, was identified as the most relevant to OA. These hub genes were validated using the GSE12021 data set. There was only one lncRNA among the hub genes, namely, NONHSAG034351. Thus, we further explored the function of NONHSAG034351‐related genes in the network. Gene Ontology (GO) enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis showed that NONHSAG034351‐associated genes are involved in the response to lipopolysaccharide, angiogenesis, tumour necrosis factor (TNF) signalling, and mitogen‐activated protein kinase (MAPK) signalling pathways. In conclusion, we identified modules through WGCNA related to OA clinical traits. NONHSAG034351, the only hub‐lncRNA, was downregulated in OA synovial tissue and might play a significant role in the pathological progression of this disease. Our findings have important clinical implications and could provide novel biomarkers that indicate the molecular mechanisms of OA and act as potential therapeutic targets. Significance of this study Long noncoding RNAs (lncRNAs) have been reported to be abnormally expressed in osteoarthritis (OA), which is the most common chronic joint disease among the elderly. In the present study, we report the expression profiles of lncRNAs in OA and the identification of modules through WGCNA related to OA clinical traits. NONHSAG034351, the only hub‐lncRNA identified to be downregulated in the synovial tissue of OA patients, might play a significant role in the pathological progression of OA. Furthermore, our findings provide novel biomarkers associated with the molecular mechanisms underlying OA pathogenesis, thus implying potential therapeutic targets with important clinical implications.

Osteoarthritis (OA), the most common chronic joint disease in the elderly, has become a significant economic burden for families and societies worldwide. Although treatments are continually improving, current drugs only target joint pain, with no effective therapies modifying OA progression. Long noncoding RNAs (lncRNAs), which have received increasing attention in recent years, are abnormally expressed in OA cartilage. In the present study, weighted coexpression network analysis (WGCNA) was applied to identify modules related to certain OA clinical traits. In total, 4404 coding genes and 161 lncRNAs were differentially expressed based on two OA expression profile data sets and normal control samples. Subsequently, 11 independent modules were acquired, and the green module, with a total of 49 hub genes, was identified as the most relevant to OA. These hub genes were validated using the GSE12021 data set. There was only one lncRNA among the hub genes, namely, NONHSAG034351. Thus, we further explored the function of NONHSAG034351-related genes in the network. Gene Ontology (GO) enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis showed that NONHSAG034351-associated genes are involved in the response to lipopolysaccharide, angiogenesis, tumour necrosis factor (TNF) signalling, and mitogenactivated protein kinase (MAPK) signalling pathways. In conclusion, we identified modules through WGCNA related to OA clinical traits. NONHSAG034351, the only hub-lncRNA, was downregulated in OA synovial tissue and might play a significant role in the pathological progression of this disease. Our findings have important clinical implications and could provide novel biomarkers that indicate the molecular mechanisms of OA and act as potential therapeutic targets.
Significance of this study: Long noncoding RNAs (lncRNAs) have been reported to be abnormally expressed in osteoarthritis (OA), which is the most common chronic joint disease among the elderly. In the present study, we report the expression profiles of lncRNAs in OA and the identification of modules through WGCNA related to OA clinical traits. NONHSAG034351, the only hub-lncRNA identified to be downregulated in the synovial tissue of OA patients, might play a significant role in the pathological progression of OA. Furthermore, our findings provide novel biomarkers associated with the molecular mechanisms underlying OA pathogenesis, thus implying potential therapeutic targets with important clinical implications.

| INTRODUCTION
Osteoarthritis (OA) is the most common chronic joint disease in older adults and is characterized by the degeneration of articular cartilage, subchondral bone sclerosis, osteophyte formation, and synovial membrane inflammation, which causes joint pain, joint dysfunction, and disabilities. 2,7,30 Globally, 250 million people are estimated to have OA of the knee, and the incidence is increasing. 42,43 Moreover, because of medical costs and lost wages, OA has become a tremendous economic burden for families and societies worldwide. 40 Although treatment options are continually improving, current therapies cannot effectively modify OA progression. 33 The pathogenic mechanism of OA is complicated and multifactorial, being related to genetic, biological, and immunologic factors. 20 Recent studies have shown an association between epigenetics and OA, in which noncoding RNAs (ncRNAs) were found by microarray analysis to participate in the processes of cartilage and bone development. 6,37 Long ncRNAs (lncRNAs), comprising 200-nucleotide long ncRNAs, are mRNA-like molecules and widely transcribed in mammalian genomes. 10 In the past decade, lncRNAs have garnered increasing attention and have been confirmed to participate in different biological processes, serving as signals, decoys, guides, and scaffolds. 45 Studies have also shown that lncRNAs are abnormally expressed in various human diseases and affect their development. 32 For example, lncRNA-HOTAIR and lncRNA-CIR are aberrantly expressed in OA cartilage, suggesting that they might have significant effects on the diagnosis and prognosis of OA and could be used as personalized therapeutic biomarkers to indicate OA progression. 3,14,26 However, current studies have mainly focused on a single genetic event or a single cohort. The lncRNA expression profiles and related mRNA expression patterns, as well as their potential biological functions in OA, remain unclear.
Recently, novel bioinformatics and computational technologies have been widely applied to human diseases. 50 Weighted coexpression network analysis (WGCNA) is a systematic biological approach to identify the relationship between genes based on microarray or RNAseq data. 19,21 It explores the complex relationships between gene expression and clinical traits by transforming the gene expression data into a coexpression module. 44 WGCNA has been widely used to investigate different biological processes and has identified several potential biomarkers and therapeutic targets. 35,56 In this study, we integrated two datasets, GSE55235 and GSE55457, from the Gene Expression Omnibus (GEO) database and performed WGCNA to identify useful modules and potential lncRNAs related to OA. Further, Gene Ontology (GO) enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis of the coexpressed genes in the most significant module was performed to explore the biological functions of the identified lncRNA. Our study aimed to construct a network between coexpressed genes based on profile datasets from multiple cohorts, which might promote the identification of important modules and lncRNAs related to OA to provide potential therapeutic biomarkers for its clinical treatment.

| Data collection
The raw data sets used in this study were downloaded from the GEO database (http://www.ncbi.nlm.nih.gov/geo/). Data sets GSE55235, GSE55457, and GSE12021 were obtained using the Affymetrix Human Faculty of the University of Leipzig), and informed patient consent was obtained. 49 All three data sets were from the same multicenter study and were generated by different laboratories. Because the GSE55584 data set does not contain any control samples, only data sets GSE55235 and GSE55457 were selected for further analysis. The data sets GSE55235 (10 OA and 10 normal control synovial tissues) and GSE55457 (10 OA and 10 normal control synovial tissues) were used to construct the gene coexpression network and identify genes related to OA. These identified hub genes were validated using the data set GSE12021 (which included 10 OA and 9 normal control synovial tissues), another independent data set from a study by Huber et al. 16

| Array probe annotation and differential expression analysis
In this study, a customized reannotated chip-description-file (CDF) of HG-U133A microarrays was generated as previously described. 24 In brief, the probe sequences of HG-U133A microarrays provided by Affymetrix were aligned to lncRNA transcript sequences (from NONCODEv5) and to the coding gene transcript sequences (from the RefSeq database 36 ) using BLASTn. Criteria for transcripts were as follows: (1) only perfectly matched probe-transcript pairs were retained; (2) probes were removed by targeting both lncRNA and coding transcripts; (3) all transcript sequences corresponding to the retained probetranscript pairs were mapped to the hg19 reference genome and annotated at the gene level; (4) both coding and lncRNA genes detected by ≥4 probes were included. Then, a reannotated CDF package (corresponding to the old CDF package HG-U133A.cdf provided by Affymetrix) was generated using the "makecdfenv" R package.
Using this new CDF file, a total of 9775 coding and 526 lncRNA genes were annotated. The Robust Multiarray Average (RMA) algorithm in the Affy package was used to correct for background, as well as for log 2 transformation and quantile normalization of our data sets. The limma Bioconductor R package was employed to determine the differentially expressed genes between OA and normal control synovial tissues by one-way analysis of variance (ANOVA), and the false discovery rate (FDR) was corrected by the Benjamini-Hochberg method. Using the differential gene expression data (P value <.05), all samples were clustered through the Euclidean distance method, and unaggregated samples were removed from the cluster. Finally, GSM1337307 from GSE55457 was removed. Genes with a fold change >1.5 and a P value <.05 were defined as significant differentially-expressed genes.

| Construction of a lncRNA-mRNA weighted coexpression network
The coexpression network for coding and lncRNA genes was constructed using the "WGCNA" R package. 21 We first selected genes with a P value <.05 for a two-sample t-test between OA and control samples. In total, 4404 coding genes and 161 lncRNA genes were selected to construct the coexpression network. Then, we removed the batch effect between GSE55235 and GSE55457 using the "remove Batch Effect" function in the limma R package before constructing the coexpression network. Coexpression network construction was performed using the "blockwiseModules" function in the WGCNA package. Briefly, we first calculated the Pearson's correlation coefficient for all possible gene pairs. A weighted adjacency matrix (WAM) was subsequently constructed using a power function, which resulted in a weighted network. After sensitivity analysis of scale-free topology, the soft-thresholding power parameter, β, was set to 12. A topological overlap matrix (TOM) and the corresponding dissimilarity (1-TOM) were calculated based on the WAM.

| Identification of clinically significant modules and hub genes
To identify clinically significant modules associated with OA, two methods were used. First, we calculated the log 10  In addition, hub genes were also genes that were significantly differentially expressed.

| Functional analysis of lncRNA
The potential functions of the identified lncRNA were predicted based on the functional analysis of coexpressed genes in the green module.
GO enrichment and KEGG pathway analyses were performed using the "clusterProfiler" R package. 52 3 | RESULTS

| Data preprocessing and co-expression network construction
OA and normal control synovial tissue gene expression profiles of GSE55235, GSE55457, and GSE12021 were obtained from the GEO database. After background correction and quantile normalization of the raw data, the gene expression profiles were calculated using our reannotated CDF file. Next, differentially expressed genes between F I G U R E 2 Identification of modules associated with osteoarthritis (OA). A, Dendrogram of 4565 coding/lncRNA genes clustered based on a dissimilarity measure (1-TOM). B, Bar plot of module significance defined as the mean gene significance (GS) across all genes in the module. The green module was the most promising module associated with OA. C, Heatmap of the correlation between module eigengenes (MEs) and OA the OA and normal control synovial tissues were identified using the limma R package. After removal of the batch effect (Supporting Information Figure S1), a total of 4565 differentially expressed genes-4404 coding and 161 lncRNA genes-with a P value <.05 in GSE55235 or GSE55457 were used to construct a weighted gene coexpression network using the "WGCNA" package in R. The softthresholding power of β = 12 was selected to ensure a scale-free network ( Figure 1A,B). An average linkage hierarchical clustering tree (dendrogram) was then constructed, and 11 modules with various colours were identified, which can be seen from the coloured-band underneath the cluster tree (Figure 2A). To explore significant modules associated with OA, we employed two methods (see MATE-RIALS AND METHODS) that enabled the evaluation of the relationship between each module and OA. We found that the mean GS of the green module was higher than that of any other module ( Figure 2B). The ME in the green module also showed a higher correlation with clinical traits than any other module (R 2 = 0.86, P = 1e −12; Figure 2C). In addition, the correlation between GS and MM in the green module was the most significant ( Figure 3, Supporting Information Figure S2). Thus, the green module (Supporting Information Table S1) was identified as the most related module to OA.

| Identification of hub genes in the synovial tissue of OA patients
Genes with high MM (weighted correlation >0.8) and GS > 0.2 in the green module were extracted. Then, significant differentially expressed genes in both GSE55235 and GSE55457 data sets were identified.
Based on the threshold of >1.5-fold change and P value <.05, we identified 784 differentially expressed genes from the expression profile in the GSE55235 data set, among which 401 were upregulated and 383 were downregulated ( Figure 4A, Supporting Information Table S2).
In the same way, 403 differentially expressed genes were identified from the GSE55457 data set, among which 184 were downregulated and 219 were upregulated ( Figure 4B, Supporting Information Table S3). Further, through integrated bioinformatics analysis, a total of 179 consistently expressed genes were identified from the two profile data sets ( Figure 4C, Supporting Information Table S4). In addition, the clusters were significantly separated between the OA group and normal control group, and the expression levels of these genes in the two data sets were significantly different between the two groups ( Figure 4D,E). Finally, only the significant differentially expressed genes with MM > 0.8 and GS > 0.2 were identified as hub genes. A total of 49 hub genes were identified in the green module (Table 1). Among these, the top five hub genes and the only lncRNA, NONHSAG034351, were all downregulated in the synovial tissue of OA patients compared with their expression in normal samples in both the GSE55235 and GSE55457 data sets. To further confirm the reliability of the six identified hub genes from the two data sets, we downloaded an additional data set, GSE12021, and found that all six downregulated genes identified in this study were also significantly downregulated in the GSE12021 OA samples ( Figure 5).

| Functional analysis of the lncRNA NONHSAG034351
The only lncRNA, NONHSAG034351, among the 49 hub genes in the green module attracted our attention. There were 48 genes with connectivity to NONHSAG034351, which were regarded as potential targets and coexpressed genes ( Figure 6). The biological effects of lncRNAs are mainly mediated by the regulation of their coexpressed or target genes. Therefore, we performed KEGG pathway and GO enrichment analysis on the coexpressed genes of the green module to further explore the biological functions of NONHSAG034351.
Among the biology processes, lncRNA NONHSAG034351 mainly participated in the response to lipopolysaccharide (LPS), rhythmic process, response to molecules of bacterial origin, and angiogenesis ( Figure 7A, Supporting Information Table S5). The KEGG pathway analysis showed that lncRNA NONHSAG034351 mainly participated in  Table S6).

| DISCUSSION
OA is a well-known form of osteoarthrosis and is becoming an increasingly severe public health problem, whose underlying mechanism is not fully understood. 55 Clarifying the molecular events during  Thus, in the present study, we integrated two data sets, GSE55235 and GSE55457, from the NCBI-GEO database and utilized bioinformatics methods to analyse them thoroughly. We performed WGCNA to identify important modules and potential lncRNAs related to OA. WGCNA is a method widely used to cluster possible genes into different gene sets/modules. 27,41 The distinct advantage of WGCNA is that it focuses on the correlation between modules and clinical characteristics and has higher biological significance and reliability. 5 In the current study, we found 11 separate modules and ultimately iden-   through the regulation of their coexpressed or target genes. 44 We thus performed KEGG pathway and GO enrichment analysis of coexpressed genes in the green module to further investigate the biological functions of NONHSAG034351. GO analysis showed that biology processes such as the response to LPS, rhythmic process, response to molecules of bacterial origin, and angiogenesis were enriched in the green module. Among these, LPS has been reported to induce inflammation and has been used to induce arthritis in conjunction with collagen in experimental animal models. 28 Thus, in our study, we concluded that NONHSAG034351 might take part in the biological process of LPS-induced inflammation in OA; however, the related mechanism requires further investigation. In addition, we found that angiogenesis was enriched in the green module. Many previous studies have demonstrated the importance of angiogenesis in OA and that inhibition of angiogenesis could be a novel therapeutic approach to reduce pain and inflammation associated with OA. 17,31 For example, Leucine-rich alpha-2-glycoprotein (LRG1) increases angiogenesis and recruits mesenchymal stem cells in the subchondral bone of OA joints to assist in angiogenesis-coupled de novo bone formation, whereas the suppression of TNF-α and LRG1 by lenalidomide could be an effective therapeutic approach for OA. 46 Moreover, lncRNA maternally expressed 3 (MEG3) is decreased in OA and inhibits angiogenesis via p53 pathways. 22 However, whether NONHSAG034351 participates in the process of angiogenesis in OA still needs to be investigated.
According to the KEGG pathway analysis, coexpressed genes in the green module mainly participate in the TNF, IL-17, NOD-like receptor, and MAPK signalling pathways. These pathways are mainly involved in inflammation and the progression of OA. 1,4,11,13 The MAPK signalling pathway, one of intracellular serine-threonine protein kinase superfamily members, is the centre of multiple signal transduction pathways. 18 It has been suggested that the MAPK pathway activation results in the overexpression of proinflammatory cytokines, F I G U R E 6 An integrated network of regulatory relationships between long noncoding RNAs (lncRNA) NONHSAG034351 and targeted coding genes in the green module chemokines, and signalling enzymes in human OA chondrocytes, which play a significant role in the progression of OA. 4 Blockage or inhibition of the activation of MAPK signalling can inhibit inflammatory cytokine production in OA chondrocytes to prevent the degradation of bone and cartilage. 39 The TNF signalling pathway is another important pathway in the regulation of inflammation. Studies have shown that miR-17 overexpression suppresses TNF receptorassociated factor 2 (TRAF2) expression and is correlated with cellular inhibitor of apoptosis 2 (cIAP2), thus suppressing TNF-α signalling pathways and downstream inflammatory proteins. 1 In classical TNF-α signalling, TNF receptor 2 (TNF-R2) activates the NF-κB and MAPK pathways. 29 In addition, the MAPK and TNF-α signalling pathways have also been reported to participate in angiogenesis. 11