Identification of differentially expressed genes and signaling pathways in chronic obstructive pulmonary disease via bioinformatic analysis

Chronic obstructive pulmonary disease (COPD) is a multifactorial and heterogeneous disease that creates public health challenges worldwide. The underlying molecular mechanisms of COPD are not entirely clear. In this study, we aimed to identify the critical genes and potential molecular mechanisms of COPD by bioinformatic analysis. The gene expression profiles of lung tissues of COPD cases and healthy control subjects were obtained from the Gene Expression Omnibus. Differentially expressed genes were analyzed by integration with annotations from Gene Ontology and Kyoto Encyclopedia of Genes and Genomes, followed by construction of a protein‐protein interaction network and weighted gene coexpression analysis. We identified 139 differentially expressed genes associated with the progression of COPD, among which 14 Hub genes were identified and found to be enriched in certain categories, including immune and inflammatory response, response to lipopolysaccharide and receptor for advanced glycation end products binding; in addition, these Hub genes are involved in multiple signaling pathways, particularly hematopoietic cell lineage and cytokine‐cytokine receptor interaction. The 14 Hub genes were positively or negatively associated with COPD by wgcna analysis. The genes CX3CR1,PTGS2,FPR1,FPR2, S100A12,EGR1,CD163, S100A8 and S100A9 were identified to mediate inflammation and injury of the lung, and play critical roles in the pathogenesis of COPD. These findings improve our understanding of the underlying molecular mechanisms of COPD.

Chronic obstructive pulmonary disease (COPD) is a multifactorial and heterogeneous disease that creates public health challenges worldwide. The underlying molecular mechanisms of COPD are not entirely clear. In this study, we aimed to identify the critical genes and potential molecular mechanisms of COPD by bioinformatic analysis. The gene expression profiles of lung tissues of COPD cases and healthy control subjects were obtained from the Gene Expression Omnibus. Differentially expressed genes were analyzed by integration with annotations from Gene Ontology and Kyoto Encyclopedia of Genes and Genomes, followed by construction of a protein-protein interaction network and weighted gene coexpression analysis. We identified 139 differentially expressed genes associated with the progression of COPD, among which 14 Hub genes were identified and found to be enriched in certain categories, including immune and inflammatory response, response to lipopolysaccharide and receptor for advanced glycation end products binding; in addition, these Hub genes are involved in multiple signaling pathways, particularly hematopoietic cell lineage and cytokine-cytokine receptor interaction. The 14 Hub genes were positively or negatively associated with COPD by WGCNA analysis. The genes CX3CR1, PTGS2, FPR1, FPR2, S100A12, EGR1, CD163, S100A8 and S100A9 were identified to mediate inflammation and injury of the lung, and play critical roles in the pathogenesis of COPD. These findings improve our understanding of the underlying molecular mechanisms of COPD.
Abbreviations ARG1, arginase 1; BP, biological process; CC, cellular component; COPD, chronic obstructive pulmonary disease; CS, cigarette smoke; CX3CR1, C-X3-C motif chemokine receptor 1; DALY, disability-adjusted life year; DEG, differentially expressed gene; EGR1, early growth response 1; FC, fold change; FGG, fibrinogen gamma chain; FPR1, formyl peptide receptor 1; FPR2, formyl peptide receptor 2; GO, Gene Ontology; GS, gene significance; KEGG, Kyoto Encyclopedia of Genes and Genomes; MF, molecular function; MM, module membership; ORM1, orosomucoid 1; PPBP, proplatelet basic protein; PPI, protein-protein interaction; PTGS2, prostaglandin-endoperoxide synthase 2; S100A12, S100 calcium binding protein A12; S100A8, S100 calcium binding protein A8; S100A9, S100 calcium binding protein A9; VCAM1, vascular cell adhesion molecule 1; WGCNA, weighted gene coexpression network analysis; YLLs, years of life lost. Chronic obstructive pulmonary disease (COPD), characterized by long-term poorly reversible airway limitation and persistent respiratory symptoms, is a common and preventable disease [1]. COPD is projected to become the third leading cause of all death by 2030 in the world [2]. Globally, COPD affected 299.4 million people in 2017, with a 71.2% increase in the prevalence rate compared with 2015, ranking it as the fifth leading cause of disability-adjusted life years (DALYs) and the seventh leading noncommunicable disease cause of years of life lost (YLLs) [3][4][5][6]. As shown in Fig. 1A, we observed a 12.3% increase in global all-age deaths caused by COPD from 2.85 million in 1990 to 3.20 million in 2017 [6][7][8][9] and a predicted increase of 60% by 2020 compared with 1990. Figure 1B indicated that the all-age standardized death rate of COPD in males, females, and both sexes separately decreased from 1990 to 2015 [6,8], which could be because of population growth and aging. Although the COPD death rate varies with different countries, more than 90% of COPD deaths occurred in low-and middle-income countries [10]. The global all-age YLLs with COPD showed a small increase of 7.5% and 3.6% for both sexes and males, respectively, as well as a 21% decrease for females from 1990 to 2015 (Fig. 1C) [6,8]. In addition, as shown in Fig. 1D [3-5,7,11-,14], global all-age DALYs caused by COPD had a small increase of 4.2% during 1990-2015 and was projected to decline to 57.6 million by 2020. The age-standardized DALY rate caused by COPD in females was about twice as high as that of males, and that in low-and middle-income countries was 6.7 times higher than in some high-income countries [3]. We observed that the global all-age years lived with disability caused by COPD has grown 52.2% from 1990 to 2017. Taken together, COPD has presented a global public health challenge with high prevalence, mortality and disability rates, whereas the diagnosis of COPD is usually made based on spirometry values and clinical symptoms with a frequent underdiagnosis [15]. Thus, it is important to explore the underlying molecular mechanisms and identify more effective early diagnostic methods and reliable biomarkers for this disease.
As a large-scale and efficient technique for acquiring genomic data, microarray-based gene expression profiles have been widely used to seek new insights for biomarkers in many human diseases [16]. Currently, many studies have been conducted on COPD gene expression profiles, and these studies have screened thousands of differentially expressed genes (DEGs) implicated in the development and progression of this disease [17,18]. However, the results for the identification of DEGs are discrepant among these studies due to sample heterogeneity and differences in technological detection platforms. In this study, we performed an integrated analysis on some of the gene expression profiling data based on lung tissues of COPD cases and control subjects using an unbiased approach aiming to identify the potential molecular mechanisms and biomarkers for COPD.
We selected two Gene Expression Omnibus microarray datasets on COPD (GSE27597 and GSE106986). DEGs were identified by R software (Auckland, New Zealand) and subsequently analyzed using bioinformatic methods including Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichments and construction of protein-protein interaction (PPI) and weighted gene coexpression analysis (WGCNA) networks. We screened the DEGs for potential association with the development and progression of COPD. Our work may further the understanding of the potential molecular mechanisms of COPD.

Data preprocessing and screening for DEGs
The probe set IDs of two datasets were converted into gene symbols using the R software and annotation packages. The two datasets were merged into one array dataset and then batch normalized using R packages (sva and limma 3.40.6). The DEGs between COPD cases and control subjects were identified using the limma package in R 3.60. A P-value <0.05 after being adjusted by false discovery rate and |log2FC > 1, where FC represents fold change, were applied together as the cutoff for DEGs screening.

Construction of the PPI network
STRING database (https://string-db.org/) is frequently used for identifying the protein interactions [16,24].
STRING database contains huge amounts of experimental and text mining data [25]. CYTOSCAPE is an open source bioinformatic software platform used for integrating gene expression profiles and visualizing molecular interaction networks. CYTOSCAPE plugin cytoHubba provides multiple topological analysis methods on Hub genes, regulatory networks and drug targets for experimental biologists [26]. In this study, we used STRING database to identify the interactions between the identified DEGs. A confidence score >0.4 was used as the cutoff criterion. In addition, Hub DEGs were identified with CYTOSCAPE version 3.6.1 (1999 Free Software Foundation, Inc., Boston, MA, USA) with cytoHubba plugin according to the rank of connection degree (number) for each gene, which is represented by the different degrees of color (from red to yellow): the role of the gene is greater in the PPI network with the darker color of the gene [26].

WGCNA on COPD
WGCNA may be used for screening modules (clusters) of highly correlated genes and for calculating module membership (MM) measures, in which the module eigengene or an intramodular Hub gene is used to summarize such modules, and eigengene network methodology is used for relating modules to one another and to external sample traits [27]. In this study, the WGCNA package was used to identify coexpression modules for the merged and standardized array datasets (GSE27597 and GSE106986). In brief, first, a weighted adjacency matrix containing pairwise connection strengths was constructed based on the selected soft threshold power (b = 11) on the matrix of pairwise correlation coefficients. Then, the connectivity measure per gene was calculated by summing the connection strengths with other genes; modules were defined as branches of a hierarchical clustering tree by using a dissimilarity measure, and each module was assigned a color. Subsequently, module preservation R function was used to assess the module structure preservation. Finally, the module eigengene was used for summarizing the gene expression profiles of each module, and each module eigengene was regressed on case trait (COPD) and smoking status by using the linear model in the limma R package.

Identification of DEGs in COPD
After batch normalization on the integrated dataset from GSE27597 and GSE106986 by the sva and limma packages, 139 DEGs were identified using the limma package (corrected P < 0.05; |log2FC| > 1) (Tables 1  and 2). The cluster heatmap of the top 139 DEGs is shown in Fig. 2. Among them, 62 genes were up-regulated and 77 genes were down-regulated, which is shown in Fig. 3.

GO enrichment analysis of DEGs
GO analysis was done on the DEGs against BP, MF and CC terms. Biological annotation of the DEGs with COPD was identified using the DAVID online analysis tool. As shown in Fig. 4, GO functional enrichments of the DEGs with a P-value <0.05 were obtained. Significant results of the GO enrichment analysis of the DEGs associated with COPD were shown in Table 3. In the BP category, the DEGs were mainly involved in inflammatory response, immune response and response to lipopolysaccharide. In the MF category, the DEGs were mainly enriched in receptor activity and receptor for advanced glycation end products (RAGE) receptor binding. In the CC category, the DEGs were mainly involved in the extracellular region and space, the integral component of the plasma membrane, the plasma membrane and the external side of the plasma membrane (Fig. 5).

KEGG pathway analysis of DEGs
We analyzed the DEGs using the KOBAS 3.0 online analysis database. As shown in Table 4, the DEGs were enriched in 48 pathways, especially hematopoietic cell lineage and cytokine-cytokine receptor interaction. The genes and pathway nodes are represented by CY-TOSCAPE version 3.6.1 software (1999 Free Software Foundation, Inc., Boston, MA, USA) that was used to calculate the topological characteristics of the network and determine each node (Fig. 6).

WGCNA network construction in lung tissues
A WGCNA network was first constructed using lung tissue expression data from cohorts GSE27597 and GSE106986, independent of COPD status and smoking status (ever/current smoking versus nonsmoking). A total of 2942 DEGs were selected (a corrected P < 0.05) and subsequently used to identify modules of coexpressed genes using a hierarchical clustering procedure. The corresponding branches of the resulting dynamic clustering tree and module are shown as colored bands underneath the cluster tree. We then merged the highly similar dynamic clustering modules into the merged dynamic modules (cut height = 0.25) (Fig. 9). We identified nine modules ranging in size from 113 genes in the Purple module to 1081 in the Grey module. A module eigengene, a weighted average of the module gene expression profiles, was used to summarize the expression profiles of transcripts in a given module through their first principal component.

Coexpression modules associated with COPD
To pinpoint modules associated with COPD and smoking status, we analyzed the association of each of the nine module eigengenes with the two traits. As shown in Fig. 10 and Table 5, all nine modules were significantly correlated with COPD and smoking status. Four modules were negatively associated with COPD and smoking status, marked Tan, Brown, Blue and Cyan, indicating that genes in these modules were predominantly down-regulated in COPD cases and those who had a history of smoking. However, five modules, in Green yellow, Purple, Black, Red and Grey, were positively associated with COPD cases and smoking status, showing that genes in these modules are predominantly up-regulated with the traits. Four of these nine gene modules, in Cyan, Purple, Red and Grey, attracted our attention in that 14 Hub genes were identified as DEGs from the PPI analysis, including CX3CR1, PPBP, PTGS2, FPR1, FPR2, S100A12, ARG1, EGR1, CD163, VCAM1, FGG, ORM1, S100A8 and S100A9. We calculated gene significance (GS) versus each MM. We found that the 14 Hub genes were also either positively or negatively associated with COPD (Table 6). CX3CR1, PPBP, PTGS2, VCAM1, S100A12, ARG1, EGR1, CD163, S100A8 and S100A9 were significantly associated with each MM, whereas FPR1, FPR2 and ORM1 were correlated with each MM except Red MM, and FGG was correlated to each MM except the Purple MM. In addition, we found that the Purple (CX3CR1), Red (EGR1, VCAM1 and PTGS2) and Grey (ARG1, FGG, and PPBP) genes most significantly correlated with GS for COPD were also the important MM elements (Fig. 11).

Discussion
In this study, we performed an integrated analysis on the gene expression profiles from lung tissues with or without COPD, aiming to identify the DEGs and related key signaling pathways for the disease. We identified 139 DEGs, including 62 up-regulated genes and 77 down-regulated genes. In addition, GO analysis showed that the 139 DEGs associated with COPD were involved in 60 BPs, 16 MFs and 12 CCs. Among these categories, the most important BPs included inflammatory response, immune response and response to lipopolysaccharide; the most important MFs included receptor activity and RAGE receptor binding; and the most important CCs included extracellular region and space, integral component of plasma membrane, plasma membrane and external side of plasma   membrane. This finding accords with the knowledge that COPD is characterized by chronic inflammation in the lung and airways [28,29]; immune response mediates the development of COPD caused by the harmful stimuli [30][31][32]; lipopolysaccharide may lead to increased airway and systemic inflammation, and contribute to the progressive deterioration of lung function [33,34]; and RAGE is a 'driving force' for cigarette smoke (CS)-induced airway inflammation in COPD [35]. KEGG pathway analysis indicated that 48 pathways corresponded to these DEGs associated with COPD. Two pathways including hematopoietic cell lineage and cytokine-cytokine receptor interaction were most important. This finding is in line with the results from previous studies [18,32].
The PPI network of proteins encoded by DEGs identified 14 Hub DEGs associated with COPD, including CX3CR1, PPBP, PTGS2, FPR1, FPR2, VCAM1, S100A12, ARG1, EGR1, CD163, FGG, ORM1, S100A8 and S100A9. All of these Hub genes were involved in the most important two BPs, two MFs or five CCs revealed by GO analysis, and were mainly implicated in multiple pathways identified by KEGG analysis in this study. Those results indicate that these Hub DEGs are involved in the development and progression of COPD by playing important biological roles in multiple signaling pathways.
Using WGCNA on the merged expression profile from two cohorts of lung tissues with COPD and healthy controls, we identified a set of gene signatures based on the 14 Hub genes. The increased expression of CX3CR1, FGG, EGR1, VCAM1 and PTGS2 is positively associated with COPD, and the underexpression of PPBP, FPR1, FPR2, S100A12, ARG1, CD163, ORM1, S100A8 and S100A9 is negatively associated with COPD.
CX3CR1 plays an important role in the development of chronic inflammatory lung diseases, such as COPD and emphysema, by contributing to structural destruction and remodeling. Chemoattractant inflammatory cells releasing CX3CR1, such as CD8 À /CD4, dendritic cells, cd T lymphocytes, natural killer cells and monocytes/macrophages, may lead to mononuclear cell   9. Network construction identifies distinct modules of coexpressed genes. The network was constructed using the lung tissue expression dataset of GSE27597 and GSE106986. The cluster dendrogram was produced by average linkage hierarchical clustering of genes using 1 À topological overlap as dissimilarity measure. Modules (Dynamic Tree Cut) and similarly merged modules (Merged dynamic) of coexpressed genes were assigned colors corresponding to the branches indicated by the horizontal bar beneath the dendrogram (merged cut height = 0.25). accumulation in the parenchyma and lung vessel walls, release mediators to induce injury, stimulate proliferation and chemoattractant inflammatory cells [36]. In addition, CX3CR1 + mononuclear phagocytes may induce an innate immune response to CS via producing interleukin-6 and tumor necrosis factor-a, and contribute to emphysema [37].
PTGS2 (COX-2), an important mediator of inflammation, was shown to be involved in inflammation response and associated with COPD pathogenesis [38][39][40][41]. The decreased activity of PTGS2 may protect smokers against the development of COPD [40]. Furthermore, FPR1 and FPR2 were reported to be involved in recruitment and activation of inflammatory Fig. 10. WGCNA heatmap. Using the default parameter setting and all DEGs (n = 2942), we identified nine gene modules using WGCNA that were positively or negatively associated with COPD and smoking trait. Each row corresponds to a module eigengene and each column to a clinical trait (COPD and smoking status). Positive associations are red, and negative associations are green. HCS, history of smoking. cells induced by CS, and play important roles in lung inflammation, injury and the pathogenesis of COPD [42][43][44][45]. S100A8, S100A9, and S100A12 might induce neutrophil and monocyte chemotaxis, adhesion to fibrinogen and diapedesis, and neutrophil migration to inflammatory sites [46,47], and have been identified as key biomarkers in inflammatory diseases including COPD and neutrophil-dominated infections [35,48,49]. The mRNA levels of S100A8, S100A9 and S100A12 may be regulated by RAGE, which was shown to contribute to CS-induced airway inflammation in COPD [35]. This is consistent with our result in this study that the RAGE pathway including S100A8, S100A9 and S100A12 is important in the development of COPD. EGR1 is an autophagy regulator gene that plays important roles in cellular homeostasis, airway remodeling and control of inflammatory immune response; it is also a significant risk factor for susceptibility to COPD [50][51][52]. EGR1 may be induced by CS and involved in proinflammatory mechanisms that are likely associated with the development of COPD [51], whereas Egr-1 À/À mice were observed to resist CS-induced autophagy, apoptosis and emphysema [53]. These findings exhibit a critical role for EGR1 in CSinduced inflammatory immune response and COPD, and effective inhibition of EGR1 may attenuate airway remodeling and inflammation associated with the pathology of COPD.
CD163, a carefully regulated component of the innate immune response to infection and a macrophage receptor for bacteria, was shown to play important roles in functional pulmonary defense elements and the inflammatory immune response of the respiratory system [54,55]. Overexpression of CD163 on lung alveolar macrophages may be implicated in the pathogenesis of COPD and poor lung function [56].
ARG1 was shown to contribute to asthma pathogenesis by inhibiting nitric oxide production, modulating fibrosis, regulating arginine metabolism and inhibiting T cell proliferation, and it involves the initiation of adaptive T helper 2 cell-mediated allergic lung inflammation by regulating group 2 innate lymphoid cells [57][58][59][60], whereas ARG1 ablation in the lung may enhance peripheral lung function but have little effect on airway inflammation [61]. The role of ARG1 in COPD needs to be studied in the future.
ORM1 appears to function in regulating the activity of the immune system during the acute-phase reaction and has been identified as an acute exacerbation of COPD-specific immunomodulatory mediator [62]. PPBP serves as a potent neutrophil chemoattractant and activator, and its elevated expression in the bronchial mucosa might be involved in the pathogenesis of COPD [63,64]. In addition, VCAM1 was shown to express in endothelial cells of atopic asthma cases, but not COPD cases [65], and present an association with lung function [66]. FGG was found to be involved in blast lung injury resistance via promoting tissue-protective adenosine signaling [67]. In the lung tissues of COPD cases, its mRNA expression was reported to correlate with the burden of particulate matter in total lung and lung parenchyma [68].
In conclusion, we have identified 139 candidate DEGs associated with the progression of COPD. The results from bioinformatic analysis are in agreement with those from previous cell and animal models and human studies. Our results showed that nine Hub genes, CX3CR1, PTGS2, FPR1, FPR2, S100A12, EGR1, CD163, S100A8 and S100A9, potentially mediated inflammation and injury of the lung, and play critical roles in the pathogenesis of COPD. The roles of five Hub genes, including PPBP, ARG1, FGG, VCAM1 and ORM1, identified to be associated with COPD in this study need to be confirmed in the future. These findings could improve our understanding of the underlying molecular mechanisms of COPD and provide us with insights for drug target discovery for the disease.