DNA methylation profiling to predict recurrence risk in stage I lung adenocarcinoma: Development and validation of a nomogram to clinical management

Abstract Increasing evidence suggests that DNA methylation may serve as a prognostic biomarker; however, few DNA methylation signatures have been established for predicting lung cancer prognosis. We aimed to develop a DNA methylation signature to improve prognosis prediction in stage I lung adenocarcinoma (LUAD). A total of 268 stage I LUAD patients from The Cancer Genome Atlas (TCGA) database were included and separated into training and internal validation datasets. GSE39279 was used as an external validation set. Univariate Cox proportional hazard analysis, least absolute shrinkage and selection operator (LASSO) Cox regression analysis and multivariate Cox proportional hazard analysis in the training dataset identified a 13‐DNA methylation signature that was closely related to the relapse‐free survival (RFS) of patients with stage I LUAD. Kaplan‐Meier analysis indicated that the 13‐DNA methylation signature significantly distinguished high‐ and low‐risk patients in the entire TCGA dataset and in the internal and external validation datasets. Receiver operating characteristic (ROC) analysis further verified that the 13‐DNA methylation signature performed well in predicting the RFS of stage I LUAD patients in the internal validation, external validation and entire TCGA datasets. In addition, a nomogram combining methylomic risk scores with other clinicopathological factors was constructed, and the results suggested that it had good predictive value. In conclusion, we built a DNA methylation‐associated nomogram that enables prediction of the RFS of patients with stage I LUAD.


| INTRODUCTION
Lung cancer is one of the leading causes of cancer-related death worldwide. 1 The majority of lung cancers are non-small cell lung cancer (NSCLC), which is divided into three main subtypes: lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LSCC) and large-cell carcinoma. 2 At present, LUAD is the most common histological subtype of lung cancer. 3 The prognosis of patients with lung cancer is significantly associated with TNM clinical stage. Early-stage (IA-IIB) NSCLC accounts for only 25%-30% of all lung cancers. 4 Surgery remains the primary treatment for operable and resectable stage I LUAD.
However, about 20% of patients with stage I LUAD develop cancer recurrence after surgery. 5 Therefore, effective prognostic biomarkers for stage I LUAD are urgently needed.
It has been shown that genes controlled by DNA methylation are relevant to tumour development. 6,7 Numerous studies have reported that DNA methylation may serve as a prognostic biomarker. For example, Guo et al reported that a five-DNA methylation signature served as a novel prognostic biomarker in patients with ovarian serous cystadenocarcinoma. 8 Sailer et al suggested that intragenic DNA methylation of PITX1 and the adjacent long non-coding RNA C5orf66-AS1 functioned as prognostic biomarkers in patients with head and neck squamous cell carcinomas. 9 Sailer et al also revealed that PITX2 DNA methylation may serve as a prognostic biomarker in patients with head and neck squamous cell carcinoma. 10 Uhl et al indicated that DNA methylation of PITX2 and PANCR was prognostic for overall survival in patients with resected adenocarcinomas of the biliary tract. 11 DNA methylation contributes to carcinogenesis by inhibiting the expression of tumour suppressor genes and enhancing the expression of oncogenes. [12][13][14][15] Thus, cancer tissues show more distinctive DNA methylation patterns than normal tissues. In addition, DNA methylation changes are inherently reversible and thus may be potential targets for drug therapy. 16 Therefore, investigations of DNA methylation are promising for identifying predictive biomarkers for treatment and may help to offer individualized treatments and prolong patients' survival time.
However, the utility of genome-wide methylation analysis in clinical practice is restricted by the large number of DNA methylation sites measured and the complexity of the statistical analyses. In addition, the stability of the prognostic methylation markers identified is limited by differences between samples and the lack of adjustment for major confounding factors. 17 Therefore, in this study, whole-genome methylation profiles of tumour tissues from patients with stage I LUAD were obtained from the TCGA and GEO databases, and a predictive risk model for RFS based on DNA methylation was established and examined via a bioinformatics approach.

| Data processing, normalization and identification of differentially expressed methylation sites
Pre-processing the data before constructing the prediction model was essential. Methylation sites with a missing (NA) beta value in any specimen were excluded from our study. Then, we normalized the data with the 'betaqn' function from the wateRmelon package. 21 Furthermore, all patient specimens were separated into a recurrence group and a no-recurrence group based on recurrence status.
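The filtering and grouping steps described above can be illustrated with a minimal sketch. The study itself used R packages; the Python below is only a stand-in, and the function names, probe identifiers and toy values are hypothetical. Normalization (the 'betaqn' step) is not reproduced here.

```python
from typing import Dict, List, Optional, Tuple

Beta = Optional[float]  # None stands in for an NA beta value


def drop_incomplete_sites(betas: Dict[str, List[Beta]]) -> Dict[str, List[float]]:
    """Keep only sites whose beta value is present in every specimen."""
    return {
        site: [float(v) for v in values]
        for site, values in betas.items()
        if all(v is not None for v in values)
    }


def split_by_recurrence(
    sample_ids: List[str], recurred: Dict[str, bool]
) -> Tuple[List[str], List[str]]:
    """Return (recurrence, no-recurrence) lists of sample ids."""
    rec = [s for s in sample_ids if recurred[s]]
    non = [s for s in sample_ids if not recurred[s]]
    return rec, non


# Hypothetical toy data: three probes measured in three specimens.
samples = ["P1", "P2", "P3"]
betas = {
    "cg_a": [0.10, 0.80, 0.55],
    "cg_b": [0.30, None, 0.20],  # NA in one specimen, so the site is dropped
    "cg_c": [0.95, 0.90, 0.15],
}
complete = drop_incomplete_sites(betas)
recurrence, no_recurrence = split_by_recurrence(
    samples, {"P1": True, "P2": False, "P3": False}
)
```

In this toy example only the two fully observed probes remain, and the specimens are split into one recurrent and two non-recurrent cases.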

| Statistical analyses
Relapse-free survival was defined as the time from the beginning of treatment to the earliest of local recurrence, distant metastasis or death.
Univariate Cox proportional hazard analysis was performed in the training dataset to determine methylation sites significantly (P < .01) associated with patient RFS as potential indicators. These potential indicators were then entered into LASSO Cox regression analysis to further identify candidate factors influencing the RFS of patients. Subsequently, the identified candidate markers were used as covariates to establish a multivariate Cox proportional hazard model. Eventually, a 13-DNA methylation signature was identified for predicting the prognosis of stage I LUAD. The area under the ROC curve (AUC) was then used to assess model performance with the 'survivalROC' package. A formula was constructed on the basis of the model to calculate an RFS risk score for every patient. Patients with stage I LUAD were separated into high- and low-risk groups with the median score as the cut-off. Kaplan-Meier survival analysis was used to compare RFS between the two groups, and Kaplan-Meier curves were drawn via the 'survival' package. 23
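As an illustration of the scoring and grouping steps, the sketch below assumes that the risk score is the usual Cox linear predictor (the sum of each site's regression coefficient multiplied by its beta value); the exact formula is not given in this section. The coefficients, probe names and follow-up times are hypothetical, patients whose score equals the median are assigned to the low-risk group as an arbitrary choice, and the Kaplan-Meier estimator is a plain-Python stand-in for the 'survival' R package.

```python
from statistics import median
from typing import Dict, List, Sequence, Tuple


def risk_score(betas: Dict[str, float], coefs: Dict[str, float]) -> float:
    """Linear predictor: sum of coefficient * beta over the signature sites."""
    return sum(coefs[site] * betas[site] for site in coefs)


def split_by_median(scores: Dict[str, float]) -> Dict[str, str]:
    """Label a patient 'High' if the score exceeds the cohort median."""
    cut = median(scores.values())
    return {pid: ("High" if s > cut else "Low") for pid, s in scores.items()}


def kaplan_meier(
    times: Sequence[float], events: Sequence[bool]
) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time."""
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(set(times)):
        n_t = sum(1 for ti in times if ti == t)  # subjects leaving at t
        d = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        if d:
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= n_t
    return curve


# Hypothetical two-site signature and four-patient cohort.
coefs = {"cg_a": 1.5, "cg_b": -2.0}
cohort = {
    "P1": {"cg_a": 0.8, "cg_b": 0.2},
    "P2": {"cg_a": 0.2, "cg_b": 0.6},
    "P3": {"cg_a": 0.6, "cg_b": 0.1},
    "P4": {"cg_a": 0.3, "cg_b": 0.5},
}
scores = {pid: risk_score(b, coefs) for pid, b in cohort.items()}
groups = split_by_median(scores)

# Toy follow-up: times in arbitrary units, True marks an observed relapse.
curve = kaplan_meier([5, 8, 8, 12], [True, False, True, False])
```

In practice one curve would be estimated per risk group and the two compared with a log-rank test, as done for the figures in this study.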

| Construction of the nomogram
To provide a quantitative prediction tool, we developed a nomogram using the 'rms' R package. 24 Univariate and multivariate Cox proportional hazard analyses were performed on the methylation risk score and other clinicopathological factors. The factors with P ≤ 0.05 in the multivariate Cox proportional hazard analysis were used to construct the nomogram. Hazard ratios (HR) and corresponding 95% confidence intervals (CI) were estimated from the Cox proportional hazard models. The prognostic ability of the nomogram was assessed by the concordance index (C-index), ROC curves and calibration plots.
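For readers unfamiliar with the C-index, the following sketch computes Harrell's concordance index for right-censored data in plain Python. It is an illustrative stand-in rather than the routine used in the study, and the function name and toy data are hypothetical. A pair of patients is treated as comparable when the one with the shorter follow-up had an observed event; tied scores count as half concordant.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float], events: Sequence[bool], scores: Sequence[float]
) -> float:
    """Harrell's C: share of comparable pairs in which the patient who
    relapsed earlier also has the higher predicted risk score."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Comparable pair: i had an observed event before j's time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: the third patient is censored at time 6.
c_index = concordance_index(
    times=[2, 4, 6, 8],
    events=[True, True, False, True],
    scores=[0.9, 0.4, 0.5, 0.1],
)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of relapse times by risk score.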

| Clinical characteristics of the study populations
The study was performed on 268 TCGA patients and 118 GEO patients who were clinically and pathologically diagnosed with stage I LUAD.
Of these TCGA patients, 111 (41.42%) were male and 157 (58.58%) were female. The median age at diagnosis was 66 years (range, 33-88), and the median RFS was 595. Smoking history was recorded as smoking, no smoking or not available; patients with a smoking history formed the largest group (173, 64.6%). The demographic characteristics of stage I LUAD patients in the TCGA and GEO datasets are summarized in Table 1, and the overall design and flowchart of this study are shown in Figure 1.

| Identification of the 13-methylation-site signature
In total, 2372 differentially expressed methylation sites were identified between the recurrence and no-recurrence groups and entered into the univariate Cox proportional hazard regression model; 530 DNA methylation sites were significantly associated with the RFS of stage I LUAD patients (P < 0.01) (Table S1). LASSO Cox regression was then applied to these 530 DNA methylation sites, and 25 methylation sites were identified as candidate prognostic indicators for predicting the RFS of stage I LUAD patients (Figure 2A,B). Patients with stage I LUAD were separated into high- and low-risk groups with the median risk score as the cut-off, patients were ranked on the basis of their risk scores (Figure 2C), and a dot plot was drawn according to their recurrence status (Figure 2D). The results showed that the low-risk group had a longer RFS than the high-risk group. A heatmap of the 13 methylation sites classified by risk score is shown in Figure 2E, which was consistent with the boxplot (Figure S2).

| Correlation between 13-DNA methylation signature and patients' RFS in the internal validation and external validation datasets as well as entire TCGA dataset
To measure the differences in RFS between the two groups. The

| Identification of the 13-DNA methylation signature-associated biological pathways
Single-sample gene set enrichment analysis (ssGSEA) was conducted on the TCGA LUAD mRNA dataset using the GSVA package 25 to determine the signalling pathways associated with the 13-DNA methylation signature. The patients were divided into low- and high-risk cohorts based on the median methylation score. Several of the top 20 pathways, including vantveer breast cancer poor prognosis, Xu hgf signature not via AKT1 48HR and vantveer breast cancer metastasis, were markedly more activated in high-risk patients than in low-risk patients (Figure 5A). The trend of the pathways was consistent with the risk score. The relationship between the risk score and the pathways was further evaluated through correlation analysis, which demonstrated a robust correlation between them (Figure 5B).
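The correlation step can be sketched as follows. The type of correlation coefficient is not stated in this section, so Pearson's r is used here purely as an assumption; the function name and the toy risk and enrichment scores are hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


# Hypothetical per-patient risk scores and ssGSEA scores for one pathway.
risk = [0.1, 0.4, 0.5, 0.9]
pathway = [0.2, 0.8, 1.0, 1.8]
r = pearson_r(risk, pathway)
```

A rank-based coefficient such as Spearman's rho would be a reasonable alternative if the enrichment scores are not approximately linear in the risk score.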

| Comparison with other known gene signatures
A comparison of our nomogram and signature with other known prognostic hallmarks was performed to assess their robustness. Their AUCs were distinctly higher than those of the other biomarkers. [26][27][28][29][30][31][32] The larger the AUC value of a biomarker, the better its predictive ability, which made it clear that our nomogram as well as our methylation signature outperformed other signatures in predicting stage I LUAD patients' prognosis.

FIGURE 4
Kaplan-Meier and ROC analysis of patients with stage I LUAD in internal validation and external validation datasets as well as entire TCGA dataset. A, C and E, Kaplan-Meier analysis with two-sided log-rank test was performed to estimate the differences in RFS between the low-risk and high-risk group patients. B, D and F, 1-, 3- and 5-year ROC curves of the 13-DNA methylation signature were used to demonstrate the sensitivity and specificity in predicting the RFS of stage I LUAD patients. 'High' and 'Low' represent the high-risk score group and low-risk score group, respectively. The median risk score was taken as a cut-off

for predicting a poor prognosis of hepatocellular cancer. 35 The methylation of DFNA5 yielded strong potential as a prognostic hallmark for breast carcinoma. 36 In the present study, we analysed the   40 Liu et al reported that the expression level of NUAK1 played a significant role in the prognosis of human nasopharyngeal carcinoma. 41 The result demonstrated that the 6 genes associated with these 13 sites played important roles in cancer progression.

| DISCUSSION
To further explore the predictive ability of our nomogram, a comparison was performed among several significant molecular signatures which were employed for predicting prognosis in stage I LUAD.
As few studies have reported signatures for predicting the RFS of stage I LUAD, studies addressing the prognosis of LUAD patients at all stages or at early stages were also included in our comparison. The AUCs of the nomogram and the signature in our study were remarkably larger than those of other molecular signatures, indicating that our markers outperformed other hallmarks. In particular, the AUC of the nomogram was greater than that of the signature, suggesting that combining the risk score with clinical factors is more promising than the methylation signature alone for predicting the RFS of stage I LUAD patients.
and multi-platform studies should be conducted to confirm these findings before the application of our nomogram for RFS prediction of stage I LUAD.

ACKNOWLEDGEMENTS
This study was supported by the National Natural Science Foundation of China (Grant number: 81874184 and 81402357).
FIGURE 7 ROC curves show the sensitivity and specificity of the methylation-associated nomogram and other known biomarkers in predicting the prognosis of stage I LUAD patients