Aberrant CpG‐methylation affects genes expression predicting survival in lung adenocarcinoma

Abstract Lung adenocarcinoma (LUAD) is a common diagnosed disease with high‐mortality rate, and its prognostic implications are under discovered. DNA methylation aberrations are not only an important event for dysregulation of gene expression during tumorigenesis but also a revolution in epigenetics by identifying key prognostic biomarkers for multiple cancers. In this study, we analyzed methylation status of 485 578 CpG sites and RNA‐seq transcriptomes of 20 532 genes for 1095 LUAD samples in TCGA database. The association between DNA methylation and the prognostic value of the corresponding gene expression was identified as well. In total, ten aberrantly methylated and dysregulated genes (AURKA, BLK, CNTN2, HMGA1, PTTG1, TNS4, DAPK2, MFSD2A, THSD1, and WNT7A) were highlighted which were significantly correlated with overall survival of 492 LUAD patients, which were all reported as tumor‐associated genes in other various cancers and worthy of further investigated and might be used as therapeutic targets for LUAD. Together, methylation aberrances regulate gene expression level during tumorigenesis and influence prognosis of LUAD patients. Integrating knowledge of epigenetics and expression of genes can be useful for an in‐depth understanding of cancer mechanism and for the eventual purpose of precisely prognostic and therapeutic target verification.


| INTRODUCTION
Lung cancer (LC) is a common diagnosed disease in both women and men around the world, which has high-mortality rate and is responsible for approximately 1.5 million deaths every year. 1 According to the histological heterogeneity, LC is divided into two primary subtypes, non-small cell lung cancer (NSCLC) and small cell lung cancer (SCLC), representing for 85% and 15% of all LC, respectively. 2 NSCLC can be further subdivided into lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC) and lung large cell carcinomas (LULC). Among them, LUAD has increased up | 5717 HE Et al. to 50% and becomes the biggest subgroup of LC since early 2010s. 3 Up to date, the overall 5-year survival rate of LUAD patients is approximately 20%; however, it rises to 55% in the cases diagnosed with localized lung cancer. With the rapidly increasing morbidity and severe metastasis-associated mortality, it is crucial to clarify the molecular mechanisms and oncogenomic aberrations, which characterize the occurrence and metastatic process of LUAD.
It is well-established that genetic dysregulation is the key part of cancer etiology. 4 In addition, new emerging evidences have demonstrated the combined effect of both genetic and concomitant epigenetic change must be considered during oncogenesis. 5 Oncogenomic aberrations are no longer to be taken not only as a genetic disorder exclusively, but also as epigenetic alterations. DNA methylation, an important form of epigenetic modification, has significant functions on gene expression, genomic stability, and modification. 6 Hypermethylation or hypomethylation of DNA was observed in variety of tumors but not in various normal tissues, 7 which indicated that methylation aberration might be treated as a hallmark of a wide variety of cancers.
The advent of deep RNA-Seq approach and wide DNA methylation arrays has significantly contributed to explore the interactive relationship between gene expression and methylation during tissue development and carcinogenesis. Xie et al 8 investigated the significant importance of DNA methylation on modulating gene expression monitored by RNA-Seq analysis during human heart, kidney and liver development. An integrative analysis of DNA methylation and mRNA expression performed by Kim et al 9 pointed out the key function of epigenetic alteration on human malignant mesothelioma cell heterogeneity. However, the methylation state of LUAD-specific-associated genes is still under investigation. In the present study, we analyzed large-scale DNA methylation level and RNA-seq transcriptomes of LUAD samples from 1095 cases in TCGA database. Together with the survival analysis of 492 LUAD patients, 10 potential diagnostic and prognostic biomarkers of LUAD were pointed out which were worthy of further investigated and might be used as therapeutic targets for LUAD.

| Patients and data processing
In this study, 1095 LUAD cases in total were downloaded from TCGA data portal (https://cancergenome.nih.gov/, level 3, normalized gene expression data [RSEM] and HumanMethylation450 data) accessed on 20171206. Of them, 636 LUAD patients had whole genomic DNA methylation data of 485 578 CpG sites, which was profiled by using F I G U R E 1 Flowchart representing the design of study. First, the methylation status of CpG sites and the mRNA expression level of transcripts in 18 paired LUAD and adjacent non-LUAD tissues from 1095 cases of LUAD were compared. Then, 32 candidate genes were narrowed down by intersection of aberrant methylated genes and abnormal expressed transcripts, as well as tumor-associated genes. Finally, together with survival rate analysis of 492 LUAD patients, we found out 10 prognostic-related genes which were differentially expressed in LUAD tissues due to their aberrant methylation of CpG sites 5718 | HE Et al.
Illumina Infinium HumanMethylation450 BeadChips assay. And 576 LUAD sufferers had transcriptomic data of 20 532 genes, which was analyzed using Illumina HiSeq_RNASeq V2 platform. A total of 492 of these 576 LUAD suffers had recorded clinical annotation data and involved in further Kaplan-Meier survival analysis. To precisely discover the differential methylation CpG sites and transcripts, we only selected the cases which had data for both LUAD tumor and adjacent non-LUAD normal tissues. Finally, 18-paired cases were co-existed in 29 coupled (tumor and adjacent tissue) methylation data and 57 coupled RNA-seq data, which was used in the followed methylation and expression analysis ( Figure 1).

| DNA methylation analysis
We computed the difference at the probe level between the tumor and normal groups in LUAD by using R Bioconductor minfi package with version 1.24.0. 10 According to the annotations provided by Illumina for the HumanMethylation450 platform (IlluminaHumanMethylation450kanno.ilmn12. hg19), only probes mapped uniquely to the human reference genome (hg19) were kept for analysis in this study. The methylated genes, which have significant differences termed biologically meaningful for an FDR q-value below 0.01, were further categorized into hypermethylation and hypomethylation subgroups, according to their mean value for the 18 LUAD patients higher than 0.7 or less than 0.3, respectively. The Methylation_450 value of hypermethylated and hypomethylated genes was used for heatmap figure with one additional scale normalization step, which subtracted the mean value from Methylation_450 then dividing by the standard deviation of Methylation_450 value.

| Differential expression analysis of transcripts
All the normalized gene expression RSEM data was transformed into RNA-seq read counts by tximport method. 11 Next, an existing method called DESeq2 was used to determine the transcripts that were differentially expressed between LUAD and adjacent normal tissues. We prepared an input matrix of RNA-seq read counts where rows were transcripts and columns were paired tumor-normal samples. The transcripts with absolute log fold changes (logFC) greater than 2 and FDR-corrected values less than 0.01 were termed to be differentially expressed. Similar to making the heatmap of methylation analysis, the heatmap of differentially expressed genes also utilized normalized scale value, which removing the mean value from normTransform (log2) DESeq2 followed with dividing by the standard deviation of normTransform (log2) DESeq2 value. 12

| Estimation of survival value
Kaplan-Meier curves, with P-values calculated via log-rank test was used to represent the survival distributions between "high" and "low" expression groups (defined by median value of each gene expression). Two-sided P values, which calculated by R survival package 13,14 lower or equal than 0.05 were considered statistically significant.

| Identifying CpG based on genomewide profiling
Given that methylation of CpG dinucleotides represents more than 98% of DNA methylation in mammalian somatic cells, 15 we focused on comparison of CpG sites methylation across the genome of LUAD tumors and normal tissue. From 636 patients in TCGA, a total of 18 pairs of LUAD tissues and matched healthy tissues were picked up and involved in present DNA methylation study (Figure 1). The methylation distribution for all paired patients abides by a bimodal distribution, with peaks around 0 (un-methylated) and 1 (methylated). The heat maps showed that hypermethylation of 2087 genes ( Figure 2A) and hypomethylation of 5416 genes ( Figure 2B) were identified for 18 pairs of tumor and adjacent normal tissue, based on the genomewide analysis of CpG methylation (P-value <0.01).

| Differentially expressed mRNAs in LUAD
RNA-seq data of the same 18 matched LUAD tumors and adjacent normal tissues were involved in the differential expression analysis. According to the cutoff criteria (|logFC|≥2, Padj <0.01), 1829 mRNAs were differentially expressed between LUAD and matched healthy tissues (Figures 1 and 3A). Among them, 696 genes were down-regulated and 1133 genes were upregulated. The results of expression analysis were presented as heat maps to demonstrate that the down-regulated ( Figure 3B) and up-regulated ( Figure 3C) genes in all 18 pair of patients.

| Selection of candidate genes for LUAD prognostic biomarkers
Up to date, 803 oncogenes and 1217 tumor suppressor genes were currently defined by human cancer studies (https://ongene.bioinfo-minzhao.org/ and https://bioinfo. uth.edu/TSGene). Together with aberrant methylation sites and differential expression genes as well as the known oncogenes or tumor suppressor genes, 32 genes were identified and classified into two groups, 20/32 genes in group I (hypomethylation-up-regulated-oncogenes), and 12/32 genes in group II (hypermethylation-down-regulatedtumor suppressor genes) (Figures 1 and 4A). The heatmaps of all 32 genes in paired tumor and normal tissues indicated that the negative association between aberrant situation of DNA methylation and mRNA expression level ( Figure 4B and C).
Furthermore, Kaplan-Meier plotter analysis was performed on these 32 genes for verifying the genes, which correlated with prognosis of LUAD. A total of 492 LUAD patients were stratify into two groups according to the median expression level were involved in this study. A univariate Cox proportional hazards regression analysis showed six genes (AURKA, BLK, CNTN2, HMGA1, PTTG1, and TNS4, Figure 5) from group I, four genes (DAPK2, MFSD2A, THSD1 and WNT7A, Figure 6) from group II were significantly associated with the survival of 492 LUAD patients. Surprisingly, among these genes, BLK, CNTN2 ( Figure 5E and F) and WNT7A ( Figure 6D) displayed reversed relationship between their expression level and survival rate of LUAD patients. Nevertheless, all these 10 candidate genes should be considered as notable prognostic biomarkers and promising therapeutic targets in further experimental research.

| DISCUSSION
With the high-mortality rate, LUAD is responsible for the majority of tumor-related deaths. However, increasing numbers of somatic mutations and genomic dysregulations have been discovered in LUAD, which makes the identification of the key driver gene alterations challenging. A number of studies have shown that DNA methylation is the second "motivation" of carcinogenesis after gene mutation and has become an important marker for early tumor diagnosis. 16,17 Compared to gene mutations, abnormalities of DNA methylation are more common in tumor genomes and reversible according to various factors such as genetic background, age, environment, diet, and behavior. In addition, it can dynamically influence the gene status and eventually lead to tumorigenesis. 18 With the advantages of new-generation sequencing technologies, the methylation status has been detected on the whole genome level. In this study, we identified 2087 hypermethylated genes and 5416 hypomethylated genes from 18 pairs of LUAD and control tissues.
The occurrence of tumors is a complex process regulated by genetic, environmental, and epigenetic factors, which results in significant individual differences of cancer patients. In clinical practice, individualized diagnosis is an important prerequisite for appropriate treatment. Epigenetic markers based on DNA methylation and mRNA expression is indispensable. Toyooka et al 19 found DNA methylation was ubiquitous in all stages of lung cancer development and negatively regulated the expression of oncogenes and tumor suppressor genes. Therefore, the combination of aberrant DNA methylation and abnormal mRNA expression as well as cancer-associated gene expression is of great significance for selection of diagnostic and prognostic molecular marker of LUAD. As the results, we found 20 oncogenes and 12 tumor suppressor genes were differentially expressed caused by abnormal DNA methylation in LUAD.
To further explore the correlation between expression of these 32 genes and survival rate of LUAD patients, we evaluated the prognostic values by univariable Cox regression analysis. Our study verified 10 significant prognostic genes at the epigenetic and transcriptomic levels. Among these 10 candidates, six have been reported as Lung cancer-related genes. It has been discovered AURKA was highly expressed in LUAD and played important roles in the cell cycle and apoptosis of human LUAD cells. 20 Chen et al 21 unraveled HMGA1, a NF-κB signaling related factor, can be regulated by miR-26 and associated with prognosis of LUAD patients. PTTG1, which regulates TGFβ1/SMAD3 signaling pathway, has been described a potential immunotherapeutic target for development and metastasis of LUAD. 22 As lung tumor suppressor genes, MFSD2A could inhibit cell cycle and matrix attachment of lung cancer cells, 23 DAPK2 was found to induce oxidative stress in A549 cells by regulation of mitochondrial function, 24 and overexpression of THSD1 could significantly reduce the colony-forming ability of A549 cells. 25 However, none of these six genes was studied on their DNA methylation level in LUAD. The aberrant DNA methylation, which was indicated in present study, may explain their abnormal expression in LUAD and indicate a novel strategy for renovation their expression to normal level.
TNS4 is was speculated as an oncogene in digestive tract cancers via direct interaction with phosphorylated MET, 26 but the potential oncogenic activity of TNS4 in LUAD was firstly suggested in present study. In addition, there were three genes, BLK, CNTN2, and WNT7A displayed contradictory association of their expression and survival rate of 492 patients in our study. A variety of genes were expressed conversely in various tumors, even not unaltered during the development of individual cancer. Yan et al 27 revealed the protein level of CNTN2 was higher in high-grade glioma cells and tissues and lower in low-grade glioma cells and tissues. Also, highly expressed genes in tumor tissue may play anti-tumor functions through activation immune cells or pathways. Very recently, one study discovered overexpressed SPON2 in hepatocellular carcinoma could promote macrophages recruitment and prevent metastasis of hepatocellular carcinoma cells. 28 Strikingly, WNT7A was down regulated, via promoter methylation, in many NSCLC cell lines and tissues, 29 which in accordance with the results of DNA methylation and mRNA expression analysis. However, higher expression level of WNT7A associated with worse prognosis was unexpected and needed to verify by further prognosis analysis or biological experiments, which may discover a new mechanism of how WNT7A affect the progression and prognosis of LUAD.
In present study, the DNA methylation and gene expression status between LUAD and normal tissues was identified for the first time. Together with the survival rate analysis, 10 candidates were highlighted. Although the exact functions of these genes are still unknown. The majority of them could be treated as useful and practical biomarkers to improve prognostic value and survival prediction of LUAD, as well as novel applications for appropriate clinical adjuvant testing.
However, there are some limitations may cause few potential genes undiscovered in 803 known oncogenes. In the Tables S1 and S2 listed additional 13 genes' expression were nicely correlated with the prognosis of 492 LUAD patients (P < 0.001). And most of them had already reported as potential oncogenes, which may contribute the development of LUAD or LUSC. The first limitation is the research literatures about some potential oncogenes were published later than 25th December 2015, which is the deadline of 803 known oncogene collected from the systematic search in PubMed. For example, ARNTL2 could enable LUAD self-sufficient