Systematic profiling identifies PDLIM2 as a novel prognostic predictor for oesophageal squamous cell carcinoma (ESCC)

Abstract Till now, no appropriate biomarkers for high‐risk population screening and prognosis prediction have been identified for patients with oesophageal squamous cell carcinoma (ESCC). In this study, by the combined use of data from the Gene Expression Omnibus (GEO) datasets and The Cancer Genome Atlas (TCGA)‐oesophageal carcinoma (ESCA), we aimed to screen dysregulated genes with prognostic value in ESCC and the genetic and epigenetic alterations underlying the dysregulation. About 222 genes that had at least fourfold change in ESCC compared with adjacent normal tissues were identified using the microarray data in GDS3838. Among these genes, only PDLIM2 was associated with nodal invasion and overall survival (OS) at the same time. The high PDLIM2 expression group had significantly longer OS and its expression was independently associated with better OS (HR: 0.64, 95% CI: 0.43‐0.95, P = 0.03), after adjustment for gender and pathologic stages. The expression of its exon 7/8/9/10 had the highest AUC value (0.724) and better prognostic value (HR: 0.43, 95% CI: 0.22‐0.83, P = 0.01) than total PDLIM2 expression. PDLIM2 DNA copy deletion was common in ESCC and was associated with decreased gene expression. The methylation status of two CpG sites (cg23696886 and cg20449614) in the proximal promoter region of PDLIM2 showed a moderate negative correlation with the gene expression in PDLIM2 copy neutral/amplification group. In conclusion, we infer that PDLIM2 expression might be a novel prognostic indicator for ESCC patients. Its exon 7/8/9/10 expression had the best prognostic value. Its down‐regulation might be associated with gene‐level copy deletion and promoter hypermethylation.

and the genetic and epigenetic alterations underlying the dysregulation. About 222 genes that had at least fourfold change in ESCC compared with adjacent normal tissues were identified using the microarray data in GDS3838. Among these genes, only PDLIM2 was associated with nodal invasion and overall survival (OS) at the same time. The high PDLIM2 expression group had significantly longer OS and its expression was independently associated with better OS (HR: 0.64, 95% CI: 0.43-0.95, P = 0.03), after adjustment for gender and pathologic stages. The expression of its exon 7/8/9/10 had the highest AUC value (0.724) and better prognostic value (HR: 0.43, 95% CI: 0.22-0.83, P = 0.01) than total PDLIM2 expression. PDLIM2 DNA copy deletion was common in ESCC and was associated with decreased gene expression.
The methylation status of two CpG sites (cg23696886 and cg20449614) in the proximal promoter region of PDLIM2 showed a moderate negative correlation with the gene expression in PDLIM2 copy neutral/amplification group. In conclusion, we infer that PDLIM2 expression might be a novel prognostic indicator for ESCC patients. Its exon 7/8/9/10 expression had the best prognostic value. Its down-regulation might be associated with gene-level copy deletion and promoter hypermethylation.

| INTRODUC TI ON
Oesophageal squamous cell carcinoma (ESCC) is the dominant histological subtype of oesophageal carcinoma (ESCA) and is a highly aggressive malignancy because of the strong potential of invasion and metastasis. 1 Therefore, the overall 5-year survival rate after surgery and chemotherapy is still low, ranging from 15% to 25%. 1,2 This disease has remarkable geographic distribution, showing particularly high incidence and mortality rates in China and some other Asian countries. 2 In the past decades, a series of studies have been performed to understand the molecular mechanisms underlying the pathological development of ESCC. [3][4][5] However, no appropriate biomarkers for high-risk population screening and prognosis prediction have been identified. TNM stage system is still the most reliable tool to stratify patients for making treatment plan and to predict prognosis. In clinical practice, ESCC patients with the same TNM stage and receiving the same therapy might have significantly different survival outcome. [6][7][8] To improve prognosis, it is imperative to look for reliable biomarkers for identifying high-risk patients who require intensive follow-up and therapeutic support.
Over the last decade, The Cancer Genome Atlas (TCGA) project provided comprehensive clinical, genetic, epigenetic and pathological data to illuminate the landscapes of 33 types of primary tumours based on cancerous and normal tissues from over 10,000 patients. 9,10 TCGA-ESCA is a part of the TCGA project that contains data from around 200 ESCA (100 adenocarcinomas and 100 ESCC respectively) patients, which enables us to explore the association between cancer phenotypes and genotypes. The long-term survival data also provided a reliable platform to explore prognostic biomarkers. 11 However, the number of normal samples in this dataset is relatively small (N = 10 for adenocarcinoma and N = 3 for ESCC), which makes it difficult to analyse the dysregulated genes (DEGs).
The Gene Expression Omnibus (GEO) DataSets (https ://www.ncbi. nlm.nih.gov/geo/info/datas ets.html) store a large number of gene expression data from original submitter-supplied series, samples and platforms, thus providing an optimal data reservoir to explore gene expression profiles.
In this study, by the combined use of data from GEO datasets and TCGA-ESCA, we aimed to screen the significantly dysregulated genes and to explore their prognostic value in ESCC. In addition, the potential genetic and epigenetic alterations associated with the candidate prognostic markers were also investigated.

| Data mining in GEO datasets
The normalized data of one previous Affymetrix HG-U133A 2.0 gene expression array that compared gene expression in 17 micro-dissected ESCC tumours and matched adjacent normal tissue pairs were downloaded from: (https ://www.ncbi.nlm.nih.gov/ sites/ GDSbr owser ?acc=GDS3838). 12 The data were loaded into NetworkAnalyst for re-analysis, which is a visual analytics platform for comprehensive gene expression profiling. 13 Only the genes had log2 fold change ≥2 and adjusted P < 0.05 were identified as the candidate DEGs.

| Data mining in TCGA-ESCA
The level-3 data in TCGA-ESCA were acquired with the use of the UCSC Xena browser (https ://xenab rowser.net/). 14 Only the ESCC subset (TCGA-ESCC) that with RNA-seq data of gene expression was extracted and used in this study. None of the patients in this subset received neoadjuvant treatment. After screening with these criteria, 95 cases of primary ESCC were identified. The clinicopathological data, including age at initial diagnosis, gender, histological grade, the history of oesophageal cancer, pathological nodal (pN) status, pathological stages, reflux history, smoking history, post-operative drug therapy, radiation therapy, alcohol consumption per week (defined as frequency of alcohol consumption/week * amount of alcohol consumption/day) and OS survival data were downloaded for re-analysis.
The genomic data of the ESCC cases, including the RNA-seq data of total gene expression, the RNA-seq data of exon expression of a specific gene, the methylation status, DNA copy number alterations (CNAs) and somatic mutations of targeting genes were also extracted. Briefly, the methylation status of target genes was exam-

| In silico analysis of the alternative transcription of PDLIM2
The alternative transcripts of PDLIM2 in oesophageal cancer were analysed using ISOexpresso (http://wiki.tgilab.org/ISOex press o/), which is a web-based platform for isoform-level expression analysis among samples in TCGA. 16 The genes were matched with HGNC gene, Ensembl release 74 and UCSC refGene, while the isoform information of the genes is collected from UniProt ID mapping data and UCSC knownGene tables 16. The relative expression level of each transcript was quantified using transcripts per kilobase million.

| Statistical analysis
Data integration and analysis were conducted using spss 25.0 software package (SPSS Inc, Chicago, IL), together with GraphPad Prism 7.04 (GraphPad Inc, La Jolla, CA). The difference in gene expression between groups was compared using unpair t test with Welch's correction. Receiver operating characteristic (ROC) curve was generated and area under the curve (AUC) was calculated to assess the predictive value of PDLIM2 and its exon expression for OS. Log-rank test was performed to examine the significance of the difference between the Kaplan-Meier OS curves. Stepwise regression was conducted to compare the predictive value of total PDLIM2 expression and its exon 7/8/9/10 expression in terms of OS. The prognostic value of PDLIM2 and its exon expression were assessed using the univariate and multivariate Cox regression models. Linear regression analysis was performed and the Pearson's correlation coefficient was calculated to assess the correlation between total PDLIM2 expression and the expression of its exons or the methylation status of the CpG sites in its DNA locus. P < 0.05 was considered statistically significant.

| Screening of significantly dysregulated genes in ESCC
Using the microarray data from GDS3838, we screened the significant DEGs between the 17 cases of ESCC tumour tissues and matched normal tissues, with the following criteria: log2 fold change ≥2 and adjusted P < 0.05 ( Figure 1A). Results showed that there were 59 up-regulated and 163 down-regulated genes in ESCC compared to adjacent normal tissues ( Figure 1B,C). The details of the 222 DEGs were provided in Table S1.

| Screening of OS-related DEGs in ESCC
In the ESCC subset of TCGA-ESCA, we found that 220 of the 222 DEGs had RNA-seq data of gene expression. Lymph nodal invasion is a critical hallmark of the progression of ESCC and direct increases the risk of unfavourable survival. 17  We also generated ROC curves and calculated the AUCs of the 18 OS-related genes in Figure 2B, by setting living as the status variable. The AUCs of PDLIM2 (0.684) ranked third among the genes, while ATAD2 and COL10A1 showed higher AUCs (0.692 and 0.691 respectively) ( Figure S1). However, we then checked their expression profile and found that these two genes were significantly upregulated in ESCC compared with adjacent normal tissues (Table S1) and thus were abandoned for analysis.
The clinicopathological and survival data extracted from TCGA-ESCC were provided in Table S2. Ninety-five ESCC patients were divided into two groups according to median PDLIM2 expression. Their clinicopathological parameters were compared in Table 1. Results showed that these two groups had similar clinicopathological profiles, expect the ratio of survival. The high PDLIM2 expression had a significantly lower ratio of death (9/48 vs 23/47, P < 0.01) ( Table 1).
By generating Kaplan-Meier survival curves according to the median expression of PDLIM2, we found that the high PDLIM2 expression group had significantly longer OS (P = 0.01) ( Figure 3A). Univariate analysis showed that male patients, advanced pathological stages and decreased PDLIM2 expression were the risk factors of unfavourable survival ( Table 2). The following multivariate analysis confirmed that total PDLIM2 expression was independently associated with better OS in ESCC patients (HR: 0.64, 95% CI: 0.43-0.95, P = 0.03) (

| In silico analysis of the potential genetic and epigenetic alterations associated with PDLIM2 dysregulation
In  Figure 4A). Then, we examined the correlation between PDLIM2 expression and its DNA CNAs. PDLIM2 amplification did not necessarily result in up-regulated transcription, compared to copy neutral cases (P = 0.82, Figure 4B). In comparison, its DNA deletion was associated with significantly lower gene expression, compared with the copy neutral cases (P < 0.01, Figure 4B). Kaplan-Meier survival curves showed that the group with PDLIM2 DNA deletion had significantly inferior OS, compared to the amplification/copy neutral group (P < 0.01, Figure 4C). We also checked the mutation status of PDLIM2 in 94 ESCC cases with mutation data available ( Figure S2). Among 94 ESCC patients, two had missense mutations ( Figure S2A). However, no survival difference was observed between the mutation and no mutation group ( Figure S2B).
We then checked methylation 450 k data of 95 ESCC cases in TCGA. The methylation status of 19 CpG sites in PDLIM2 DNA locus was measured ( Figure 5A). The correlation between the methylation level of PDLIM2 CpG sites and total PDLIM2 expression was calculated ( Figure 5B). Results showed that the methylation level of two CpG site (cg23696886 and cg20449614) had a moderately negative correlation with PDLIM2 expression (Pearson's r = −0.47 and r = −0.46 respectively; Figure 5B). cg23696886 is located at the intron before exon 2, while cg20449614 is in exon 2 (red-dotted frame, Figure 5A), where the PDLIM2 promoter maps to Ref. 19 .
Then, we analysed the correlation between PDLIM2 CNAs and the methylation of the two CpG sites. Results showed that the copy loss group (−1/−2) had significantly higher methylation at these two CpG sites (P = 0.02 and P = < 0.01 respectively, Figure 5C,D). To reduce the potential mixing effect of CNAs and methylation on gene expression, we analysed the correlation between PDLIM2 expression and the methylation of the two CpG sites in copy neutral/amplification group and copy loss group respectively ( Figure 5E,F). Results showed that, in copy neutral/ amplification group, there were still moderately negative correlations ( Figure 5E). However, the correlation coefficients were much smaller in the DNA −1/−2 groups ( Figure 5F). These findings imply that DNA hypermethylation might be a potential mechanism leading to PDLIM2 down-regulation in copy neutral/amplification group. In comparison, PDLIM2 copy loss might be the dominant cause of reduced gene expression in the copy loss group. Hodgkin and anaplastic large cell lymphoma. 29 It acts as an essential terminator of NF-κB activation in both colorectal cancer and breast cancer cells. 26,27 Its expression results in retarded anchorage-independent growth in vitro and decreased tumourigenicities of the cancer cells in vivo. 26 Deletion of this domain results in increased accumulation of PDLIM2 in the cytoplasm and nucleoplasm, and reduced protein level in the nuclear matrix. 33 These findings suggest that the expression of exon  37,38 This triggered our interest to explore the association between the CNA status of PDLIM2 and its expression in ESCC. Using data in TCGA-ESCC, we found that PDLIM2 gene-level deletion is frequent and is associated with decreased gene expression, suggesting that DNA copy loss might be a mechanism of its down-regulation. Some previous studies reported that PDLIM2 expression was also regulated by the promoter methylation that blocks its transcription in colon cancer, 26 ovarian cancer 28 and classical Hodgkin and anaplastic large cell lymphoma. 29 However, whether its expression is related to the methylation status of its promoter in ESCC is still not clear.
In this study, we found the methylation status of two CpG sites (cg23696886 and cg20449614) in the proximal promoter region of PDLIM2 showed a moderately negative correlation with the gene expression, especially in PDLIM2 copy neutral/amplification group. Therefore, we infer that promoter hypermethylation might also be an important epigenetic alteration leading to suppressed PDLIM2 expression in ESCC.
This study also has some limitations. This is an in silico analysis based on a relatively small sample size. Additional validation with a bigger dataset could be considered in the future. Besides, the functional role of PDLIM2 in ESCC was not explored in the current study. Molecular studies are required to explore the potential suppressive effect of PDLIM2 on the malignant phenotype of ESCC and the underlying mechanisms.

| CON CLUS ION
In conclusion, by the systemic screening in this study, we found that PDLIM2 is one of the most significant down-regulated genes in ESCC and might be a novel prognostic predictor in terms of OS. The expression of PDLIM2 exon 7/8/9/10 had the best prognostic value.
PDLIM2 down-regulation is associated with gene-level copy deletion and promoter hypermethylation.

DATA AVA I L A B I L I T Y S TAT E M E N T
All data used in this study were included in the manuscript and supplementary materials.