Integrative analysis of hub genes and key pathway in two subtypes of diffuse large B‐cell lymphoma by bioinformatics and basic experiments

Abstract Background The germinal center B‐cell (GCB) and activated B‐cell (ABC) subtypes of diffuse large B‐cell lymphoma (DLBCL) have a significant difference in prognosis. This study aimed to identify potential hub genes, and key pathways involved in them. Methods Databases including Gene Expression Omnibus (GEO), Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and STRING were accessed to obtain potential crucial genes and key pathways associated with the GCB and ABC. Then qRT‐PCR and Western blot experiments were performed to verify the most clinically significant gene and pathway. Results Three cohort datasets from the GEO database were analyzed, including 195 GCB and 169 ABC samples. We identified 1113 differentially expressed genes (DEGs) between the GCB and ABC subtypes. The DEGs were mainly enriched in biological processes (BP). The KEGG analysis showed enrichment in cell cycle and Wnt signaling pathways. We selected the top 10 genes using the STRING database and Cytoscape software. We used 5 calculation methods of the cytoHubba plugin, and found 3 central genes (IL‐10, CD44, CCND2). CCND2 was significantly related to the prognosis of DLBCL patients. Besides, our experimental results demonstrated a significantly higher expression of CCND2 in the ABC‐type cell line than in the GCB‐type; it was proportional to the expression of key proteins in the Wnt signaling pathway. Conclusion CCND2 overexpression and Wnt pathway activation might be the main reasons for the poor prognosis of ABC‐DLBCL.


| INTRODUC TI ON
Diffuse large B-cell lymphoma (DLBCL) accounts for about 30% of all lymphoma cases; it is the most common malignant tumor of the lymphatic system. 1 According to the cell of origin classification, DLBCL has two major biologically distinct molecular subtypes, namely the GCB (germinal center B-cell) and ABC (activated B-cell). 2 ABC-DLBCL has worse clinical outcomes when treated with standard chemoimmunotherapy. 3,4 However, the biological mechanisms are still controversial. The molecular classification not only predicts prognosis but also relates to the personalized therapy of DLBCL patients, so novel therapeutic targets and effective treatment options need to be ascertained.
In previous studies, some focused on genomic damages and treatment resistance. Some indicated that ABC-DLBCL seems to freeze in the late state, showing apparent plasma cell differentiation arrest. 5,6 The results may have limitations due to tissue or sample heterogeneity. Since the prognosis of patients with two subtypes of one disease is unexpectedly different, is there any relationship with cell cycle or tumor stemness between them? With the rapid development of gene sequencing technologies, we could explore it via combining bioinformatics methods with expression profiling even basic experiments. 7 Regardless, it undoubtedly provides convenience for our scientific research. Here, we selected three microarray datasets to facilitate our analyses. After selecting intersecting genes to improve accuracy, we performed gene function and pathway enrichment analysis. A series of analytical tools and software were applied to obtain information such as hub genes, signal pathways, and survival periods. Most related studies focused on one cohort, but sequencing results are often limited and inconsistent due to samples' heterogeneity in independent studies. 6,8 Therefore, it is necessary to apply integrated bioinformatics methods on quality and merged expression profiles to increase statistical power in detecting more reliable genes. More importantly, the above results need to be verified further by experiments in vitro. However, no such study has been reported so far. Our analysis included this comparison, and it made us understand the mechanism better and reveal the difference in molecular expression between the two subtypes.
Due to the different pathogenesis and prognosis of survival, it is urgently vital to clarify the etiology and molecular mechanisms of these DLBCL subtypes and to determine molecular biomarkers for diagnosis and individualized treatment.

| Cell culture
In this study, we used two human diffuse large B-cell lymphoma cell lines, including one GCB subtype (SU-DHL-6) and one ABC subtype (SU-DHL-2) cell line. They were purchased from Saiku Company.

| Data source
The datasets we analyzed came from the GEO database (Gene Expression Omnibus) (http://www.ncbi.nlm.nih.gov/geo/), which were chosen after searching for keywords related to GCB and ABC subtype of DLBCL on the same platform (GPL570, [HG-U133 Plus 2] Affymetrix Human Genome U133 Plus 2.0 Array). We retrieved 232 DLBCL molecular subtype series based on the cell of origin algorithm from the GEO database. Three independent datasets (GSE19246, 9 GSE87371, 10 and GSE56313 11 ) were selected for the research. From these profiles, we retrieved information about 364 individual DLBCL samples.

| Data processing of DEGs
The GEO2R online analysis tool of the National Institute of Biomedical Research (https://www.ncbi.nlm.nih.gov/geo/geo2r/) was used to find the differentially expressed genes (DEGs) between the ABC and GCB subtypes. The genes that met the cut-off criteria (p < 0.05 and |logFC(fold change)| ≥ 1.0 after adjustment) were considered DEGs. Statistical analysis was performed on all datasets. The online VENN diagram tool was used to identify intersecting genes (Bioin forma tics.psb.ugent.be/webto ols/venn/).

| Functional enrichment analyses of DEGs
The GO (Gene Ontology) annotation and the KEGG (Kyoto

| PPI network integration and Hub gene identification
The DEGs data were uploaded to the Search Tool for the Retrieval of Interacting genes (STRING) database 14 to construct a network based on protein-protein interaction (PPI). Confidence score >0.4 was set, and the resulting output was used in the Cytoscape software (www.cytos cape.org/) to visualize the network and select hub proteins. 15 We utilized five calculation methods (Degree, EPC, Eccentricity, MCC, and MNC) of the CytoHubba application to select the top 10 genes. The central (intersecting) genes were considered core candidate genes, as nodes with a higher degree of connection are more significant and likely to influence biological function. Then, the top 10 genes from each method were uploaded to the online website (http://www.Ehbio.com/Image GP/index.php/Home/Index/ VennD iagram.html) to obtain the intersecting genes, representing key candidate genes with essential biological regulatory functions.

| Survival analysis
We performed survival analysis on the intersecting genes using the Survival and survminer R software packages (version 3.6.3; http:// bioco nductor). The normalized GSE87371 expression dataset was used for the survival analysis; the intersecting genes were divided into two subgroups based on their expression levels. Statistically significant OS and PFS were visualized with the R. Log-rank p < 0.05 was considered significant.

| Quantitative real-time polymerase chain reaction
We isolated total RNA from each cell line using TRIzol reagent  Table 1.

| Western blot analysis
Cells (2 × 10 6 ) were lysed in RIPA lysis buffer (Beyotime Biotechnology) and quantitated using a BCA Protein Assay Kit (Beyotime Biotechnology). The protein liquid was mixed with 5 × loading buffer in a volume ratio of 4 to 1, and placed in a boiling water bath for 10 min to denature. For Western blot analysis, equivalent amounts of protein per sample were electrophoretically resolved on 10% polyacrylamide gels and then transferred onto PVDF membranes (Millipore). Following this, the PVDF membranes with the protein were blocked with blocking buffer (Beyotime Biotechnology), then incubated with the corresponding primary antibodies. After repeated washes, the membranes were incubated with horseradish-peroxidase-conjugated anti-mouse or anti-rabbit secondary antibody (Cell Signaling, 1:1000 diluted) at room temperature for 1 h. An electro-chemiluminescence (ECL) system (Thermo Fisher Scientific) was used for the detection. Antiβ-actin (Epitomics) was used to check for equal loading of protein between wells. The Primary antibodies used for the Western blot are shown in Table 2.

| Statistical analysis
Statistical analyzes and graphing were performed with SPSS v19.0 (SPSS Inc.) and R statistical software, respectively. We utilized the log-rank test to compare OS and PFS among patients in different groups. Student's t test were used to compare two groups of celllevel experiments. Data were reported as mean ± SD (standard deviation). And p-value <0.05 was regarded as statistically significant.

| Identification of DEGs
As shown in Table 3 Subsequently, we performed a VENN analysis and got the intersecting DEGs. In total, 120 DEGs were obtained after the intersection of all three groups ( Figure 1A), of which 58 were upregulated genes ( Figure 1B) and 62 were downregulated genes ( Figure 1C).

| Enrichment analyses of DEGs
We evaluated the functions and pathway enrichment of the candidate DEGs at the DAVID website. As shown in Figure 2, in GO analysis, the DEGs were mainly enriched in BP terms, including organelle fission, mitotic nuclear division, and so on (Figure 2A). In the MF group, receptor ligand activity and protein binding were the major enrichments ( Figure 2B). Chromosome and mitotic spindle were mainly enriched in the CC terms ( Figure 2C). The KEGG analysis showed enrichments mostly in cell cycle and Wnt signaling pathways ( Figure 2D).

| PPI network integration and Hub gene identification
To better understand the interactions among the intersecting DEGs obtained from the three datasets, the STRING database was used to generate a PPI network consisting of 114 nodes and 62 edges, with parameters set to interaction score >0.4 and query proteins only being revealed ( Figure 3A). The Cytoscape software was used to analyze hub proteins after importing the data, and the top 10 genes were evaluated with five calculation methods in the CytoHubba application (Table 4).
Furthermore, we uploaded the genes from the five calculation methods to the Venn Diagram online website to generate intersecting genes to identify significant hub genes ( Figure 3B). We obtained 3 intersected hub genes (CCND2, CD44, and IL-10), indicating the common DEGs. As shown in Table 5, the above 3 hub gene expressions in each dataset were upregulated in the ABC subtype of DLBCL.

| CCND2 may be a key prognostic gene in DLBCL
To

| Experimental validation
In this study, 2 cell lines of DLBCL (GCB subtype: SU-DHL-6, and ABC subtype: SU-DHL-2) were used to experimentally validate the biological information on the discovered hub genes. Our qRT-PCR results indicated CCND2 was significantly upregulated in the ABC subtype ( Figure 4A). To further explore the mechanism, we carried out the Western blot experiment. According to the enrichment results of the KEGG pathway, we selected the Wnt signal pathway. We found that the expression level of CCND2 positively correlated with the expression level of star protein molecules in the Wnt signaling pathway ( Figure 4B).

| DISCUSS ION
Diffuse large B-cell lymphoma incidence is high and carries a significant disease burden. Within the DLBCL patients, two major DLBCL subtypes (GCB and ABC) have been defined by gene expression profiling. 16 Patients with ABC-DLBCL demonstrated worse outcomes than those with GCB-DLBCL, with 5-year OS rates ranging from 30% to 56% and 60% to 78%, respectively. 17,18 To our knowledge, this is the first report to analyze hub genes and key pathways using bioin-  to obtain potential candidate hub genes and key pathways more reliably. Not only that, we accessed many online publicly available databases like GO, KEGG, string database, and Cytoscape software to perform various analyzes. 10 hub genes were selected after using five calculation methods of the cytoHubba's application. Each algorithm included 10 unique hub genes. As a result, we obtained 3 common hub genes (IL-10, CD44, CCND2) and performed a survival curve analysis on them. Furthermore, we found that high expression of CCND2 had a statistically significant effect on the prognosis of DLBCL patients, including OS and PFS. Our results were partly consistent with previous researches. 6,19 However, due to the difference in the source of the datasets, results from data on the same platform could still vary. 20 Interestingly, the DEGs were mainly associated with "cell cycle" and "Wnt signaling pathway" in KEGG analysis. To verify the above results, our in-depth experimental study focused on using and it associates with the extracellular matrix changes that influence cell growth, survival, and differentiation. 29 Higher CD44 expression has been noticed in ABC-DLBCL than GCB-DLBCL, but contradictory data have also been reported. 30  understanding of the potential causes of differences in the clinical prognosis of DLBCL. Moreover, it remained exciting to the assessment of the prognosis of patients with changes in these genes.
However, this study also had shortcomings. One limitation was that we only used a single data source (GEO), other source like TCGA (The Cancer Genome Atlas) may also be need to confirm our results. In addition, our research focused just on the alterations of downstream protein levels of the disease mechanism. The upstream mechanism of CCND2 and Wnt pathway are also worthy of further exploration, such as different genetic and epigenetic changes.
Fortunately, our findings may provide a direction for future in-depth

research.
Above all, we identified 120 DEGs between DLBCL GCB and ABC subtypes. Three common hub genes were found after applying five calculation methods of the CytoHubba's application, including IL-10, CD44, and CCND2. The high expression of CCND2 had a statistically significant effect on the prognosis of DLBCL patients.
The KEGG pathway analysis associated most of the DEGs with "cell cycle" and "Wnt signaling pathway". Experiments showed that the RNA and protein expression levels in CCND2 and Wnt pathway were different in ABC and GCB cell lines. These results might provide promising therapeutic targets or novel prognostic biomarkers of ABC and GCB subtypes of DLBCL.

ACK N OWLED G M ENTS
This study was supported by National Natural Science Foundation of China (81670179) and the Major Subject of Science and Technology of Anhui Province (201903a07020030).

CO N FLI C T O F I NTE R E S T
The authors declare that there is no conflict of interest regarding the publication of this paper. The data generated or analyzed during this study are available from the corresponding author upon reasonable request.