Epigenetic–smoking interaction reveals histologically heterogeneous effects of TRIM27 DNA methylation on overall survival among early‐stage NSCLC patients

Among early‐stage non‐small‐cell lung cancer (NSCLC) patients, cg05293407 TRIM27 was significantly and exclusively associated with survival of lung squamous cell carcinoma patients, who had higher smoking intensity than lung adenocarcinoma patients. Overall, the significant association between cg05293407 TRIM27 and survival remained only in NSCLC patients with medium‐to‐high pack‐year of smoking. The cg05293407 TRIM27‐smoking synergistic interaction might account for histologically heterogeneous effects of TRIM27 DNA methylation on NSCLC survival.

Tripartite motif containing 27 (TRIM27) is highly expressed in lung cancer, including non-small-cell lung cancer (NSCLC). Here, we profiled DNA methylation of lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) tumours from 613 early-stage NSCLC patients and evaluated associations between CpG methylation of TRIM27 and overall survival. Significant CpG probes were confirmed in 617 samples from The Cancer Genome Atlas. The methylation of the CpG probe cg05293407 TRIM27 was significantly associated with overall survival in patients with LUSC (HR = 1.65, 95% CI: 1.30-2.09, P = 4.52 × 10⁻⁵), but not in patients with LUAD (HR = 1.08, 95% CI: 0.87-1.33, P = 0.493). As the incidence of LUSC is associated with higher smoking intensity compared to LUAD, we investigated whether smoking intensity modified the prognostic effect of cg05293407 TRIM27 methylation in NSCLC. LUSC patients had a higher average pack-year of smoking

Introduction
Lung cancer is the most commonly diagnosed cancer, accounting for 11.6% of total cases in 2018 [1]. More than 85% of lung cancer cases are non-small-cell lung cancer (NSCLC), of which lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) are the most common subtypes [2,3]. Compared to late-stage patients, NSCLC patients diagnosed at an early stage have a better prognosis [4]. However, wide heterogeneity in overall survival has been observed even within the same stage of cancer, indicating the possible existence of prognosis-influencing molecular mechanisms [5]. Epigenetic alterations such as DNA methylation are considered important representatives of these molecular mechanisms [6].
Tripartite motif containing 27 (TRIM27), which encodes a member of the tripartite motif (TRIM) family, is highly expressed in lung cancer and plays an important role in cancer prognosis [7,8]. TRIM family proteins play crucial roles in a wide range of processes, including cell growth, apoptosis and stem cell differentiation [9]. TRIM27 acts as an oncogene in various tumour types, including colitis-associated cancer, salivary gland intraductal carcinoma, colon cancer, uterus cancer and prostate cancer [7,10,11]. Further, because it was originally identified through its involvement in oncogenic rearrangements with the rearranged during transfection (RET) proto-oncogene, TRIM27 is also known as RFP (RET finger protein) [12]. RET rearrangements have been implicated in NSCLC [13].
Furthermore, DNA methylation changes have been linked to various environmental exposures (e.g., cigarette smoking) and may explain part of the association between smoking and cancer recurrence and mortality [15,27,28]. Moreover, LUAD is more common in nonsmokers and long-term former smokers, while most NSCLC patients among current smokers have LUSC [29,30], indicating substantially different pathology and oncology. However, few studies have focused on the heterogeneous effects of DNA methylation between LUAD and LUSC.
Therefore, we used a two-stage design to identify epigenetic biomarkers in TRIM27 associated with NSCLC prognosis, and further explored the potential reason for the heterogeneous effects of these biomarkers across histology by performing an epigenetic-smoking interaction analysis. In addition, we investigated the gene expression alterations associated with the robustly significant biomarkers and evaluated the effects of those genes on lung cancer survival.

Study populations
We collected data from early-stage (stage I and II) NSCLC patients from five international study centres. Cases from the Harvard, Spain, Norway and Sweden cohorts were assigned to the discovery phase [31-34], while cases from The Cancer Genome Atlas (TCGA) were assigned to the validation phase. All patients provided written informed consent. The study methodologies conformed to the standards set by the Declaration of Helsinki and were approved by the local ethics committee.

Harvard
The Harvard Lung Cancer Study cohort was described previously [31]. Patients have been recruited at Massachusetts General Hospital (MGH) since 1992. All were newly diagnosed and histologically confirmed as primary NSCLC at the time of recruitment. Snap-frozen tumour samples were taken from patients during complete resection. The 151 early-stage patients selected in this study had complete survival information. Tumour DNA was extracted from 5-μm-thick histopathologic sections. Each specimen was evaluated by an MGH pathologist for amount (tumour cellularity > 70%) and quality of tumour cells. All specimens were histologically classified using World Health Organization criteria.

Spain
The study population was described previously [32]. Tumours were collected by surgical resection from 226 patients. DNA extraction was performed on tumour specimens (10 μm thick, tumour cellularity > 50%). The study was approved by the Bellvitge Biomedical Research Institute Institutional Review Board.

Norway
Participants were 133 LUAD patients with operable lung cancer tumours seen at Oslo University Hospital, Rikshospitalet, Norway, in 2006-2011 [33]. Tumour tissues were collected during surgery, snap-frozen in liquid nitrogen, and stored at −80 °C until DNA isolation. None of the early-stage patients received chemotherapy or radiotherapy before surgery. The project was approved by the Oslo University Institutional Review Board and the Regional Ethics Committee (S-05307).

Sweden
Tumour tissue samples were collected from 103 patients with early-stage NSCLC who underwent operation at Skane University Hospital, Lund, Sweden [34]. The study was approved by the

Quality control procedures for DNA methylation data
For each patient, DNA methylation was assessed using Infinium HumanMethylation450 BeadChips (Illumina Inc., San Diego, CA, USA). All centres followed the same quality control (QC) procedures before conducting the association study. GenomeStudio Methylation Module V1.8 (Illumina Inc.) was used to convert raw image data into beta values (continuous numbers ranging from 0 to 1) with background subtraction and control normalization. Unqualified probes meeting any one of the following criteria were excluded: (a) failed detection (P > 0.05) in > 5% of samples; (b) coefficient of variation of < 5%; (c) all samples methylated or unmethylated; (d) common single nucleotide polymorphisms located in the probe sequence or 10-bp flanking regions; (e) cross-reactive probes or cross-hybridizing probes [35]; or (f) did not pass QC in all centres. Samples with > 5% undetectable probes were excluded. Methylation signals were further processed by quantile normalization (betaqn function in R package minfi) as well as type I and II probe correction (BMIQ function in R package lumi). Data were adjusted for batch effects (ComBat function in R package sva) according to the best pipeline identified by a comparative study [36]. Details of QC processes are described in Fig. S1.
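As an illustration of probe-level criteria (a) and (b) above, the following minimal Python sketch flags a single probe for exclusion. It is not the authors' pipeline, which was run in R; the function name `keep_probe`, its parameters, and the use of the population standard deviation for the coefficient of variation are assumptions made here for illustration.

```python
from statistics import mean, pstdev


def keep_probe(betas, detection_p, p_cutoff=0.05, max_fail_frac=0.05, min_cv=0.05):
    """Return True if a probe passes criteria (a) and (b) from the QC list.

    betas        -- beta values (0-1) of one probe across samples
    detection_p  -- detection P values of the same probe across samples
    All names and defaults are hypothetical; only the 0.05 and 5% thresholds
    come from the text.
    """
    n = len(betas)
    # (a) exclude if detection failed (P > 0.05) in more than 5% of samples
    failed = sum(1 for p in detection_p if p > p_cutoff)
    if failed / n > max_fail_frac:
        return False
    # (b) exclude if the coefficient of variation (SD / mean) is below 5%
    mu = mean(betas)
    if mu == 0:
        return False  # CV undefined; treated here as uninformative (assumption)
    cv = pstdev(betas) / mu
    return cv >= min_cv
```

Criteria (c) to (f) depend on annotation files and cross-centre results, so they are left out of this sketch.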

Gene expression data
In the TCGA cohort, all of the 281 LUAD and 277 LUSC cases had complete mRNA sequencing data. Gene expression was measured by RNA sequencing. Data processing and QC were done by the TCGA workgroup. Raw counts were normalized by RNA-Seq by Expectation Maximization (RSEM). Level-3 gene quantification data were downloaded from TCGA and were further checked for quality. Expression of TRIM27 was extracted and log2-transformed before analysis.

Statistical analysis
The study design is shown in Fig. 1. To investigate the association between DNA methylation of TRIM27 and overall survival, we applied a Cox proportional hazards model adjusted for age, sex, smoking status, clinical stage and study centre, separately for LUAD and LUSC patients. The proportional hazards assumption was also tested for each CpG probe. Hazard ratios (HRs) and 95% confidence intervals (CIs) were expressed per 1% increment in methylation level. Multiple comparisons were adjusted using the false discovery rate method (FDR; measured by FDR-q value) [37] to control the overall false-positive rate at the 5% level. CpG probes with FDR-q ≤ 0.05 in the discovery phase were further tested in the validation phase. CpG probes were retained as robustly significant if they met the following criteria: (a) P ≤ 0.05 in the validation phase and (b) consistent effect direction across the two phases. For robustly significant CpG probes, Kaplan-Meier curves were used to compare survival between patients with different methylation levels.
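The two-stage selection rule can be sketched as follows. This is a minimal Python illustration, assuming the FDR method of [37] is the Benjamini-Hochberg step-up procedure (the text does not name it); the function names and the input layout (probe name mapped to a log-HR and a P value) are hypothetical, and the authors' own analysis was done in R.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted P values (q values), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotone q values
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def robust_probes(discovery, validation, q_cutoff=0.05, p_cutoff=0.05):
    """Apply the two-stage rule: FDR-q <= 0.05 in discovery, then
    P <= 0.05 and the same effect direction in validation.

    discovery, validation -- dict: probe name -> (log_hr, p_value)
    """
    names = list(discovery)
    qvals = bh_qvalues([discovery[name][1] for name in names])
    kept = []
    for name, q in zip(names, qvals):
        if q > q_cutoff or name not in validation:
            continue
        log_hr_disc = discovery[name][0]
        log_hr_val, p_val = validation[name]
        # Same sign of log-HR means a consistent effect direction
        if p_val <= p_cutoff and log_hr_disc * log_hr_val > 0:
            kept.append(name)
    return kept
```

Comparing the signs of the log-HRs is one simple way to encode "consistent effect direction"; an HR above 1 in both phases, or below 1 in both, passes.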

Methylation-smoking interaction analysis
We observed a significantly heterogeneous effect of cg05293407 TRIM27 across histology, although the distributions of cg05293407 TRIM27 methylation in LUAD and LUSC patients were comparable. Meanwhile, both the prior literature and our results indicated that heavy smoking was related to LUSC. Therefore, we hypothesized that this heterogeneity might be explained by a methylation-smoking interaction, which was tested as a product term (methylation × pack-year of smoking) in a Cox proportional hazards model adjusted for the same covariates as described above.
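Written out, a Cox model with such a product term takes roughly the following form. The notation is illustrative and not taken from the original text: M denotes methylation level (%), S pack-year of smoking, Z the vector of adjustment covariates, and h_0(t) the baseline hazard.

```latex
h(t \mid M, S, \mathbf{Z}) = h_0(t)\,\exp\!\left(\beta_M M + \beta_S S + \beta_{MS}\, M S + \boldsymbol{\gamma}^{\top}\mathbf{Z}\right)

\mathrm{HR}_{M}(S) = \exp\!\left(\beta_M + \beta_{MS}\, S\right), \qquad \mathrm{HR}_{\mathrm{interaction}} = \exp\!\left(\beta_{MS}\right)
```

Under this parameterization, the hazard ratio per 1% methylation increment is no longer a single number but depends on S, and a test of the interaction is a test of whether the coefficient of the product term is zero.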

Genome-wide methylation-transcription analysis
Based on the hypothesis of the omnigenic model [38], for the identified prognostic CpG probes, we used a linear regression model adjusted for the aforementioned covariates to test the association between DNA methylation and gene expression using transcriptomic data from TCGA. Significant genes were identified as those with FDR-q ≤ 0.05 and were presented in a Circos plot. Then, the association between gene expression and overall survival was further evaluated using Cox models adjusted for the same covariates. Genes significantly associated with both methylation and NSCLC survival were selected. Continuous variables were expressed as mean ± standard deviation (SD), and categorical variables were expressed as frequency (n) and proportion (%). Statistical analysis was performed using R version 3.5.2 (The R Foundation of Statistical Computing, Tsinghua University, Beijing, China).
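The methylation-expression step amounts to an ordinary least squares fit of expression on methylation plus covariates. The sketch below shows that fit in plain Python, solving the normal equations by Gauss-Jordan elimination. It is an illustration only: the function name `ols` and the design-matrix layout are assumptions, it returns coefficients without standard errors or P values, and the authors' analysis was carried out in R.

```python
def ols(X, y):
    """Least-squares coefficients for y ~ X.

    X -- list of rows; each row lists the predictors for one sample and
         should start with 1.0 if an intercept is wanted
    y -- list of responses (e.g. log2 expression of one gene)
    Raises ZeroDivisionError if the predictors are collinear.
    """
    k = len(X[0])
    # Augmented normal equations [X'X | X'y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]
```

With rows of the form `[1.0, methylation, age, ...]`, the second returned coefficient is the covariate-adjusted change in expression per unit of methylation.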
However, the distribution of cg05293407 TRIM27 methylation in LUAD and LUSC patients was similar (Fig. 4A) and comparable (P = 0.518) by Wilcoxon rank-sum test (Fig. 4B). Nonsmokers and long-term former smokers are more common among LUAD patients, while the majority of lung cancer patients who are current smokers have LUSC [29]. Therefore, we hypothesized that a methylation-smoking interaction might account for the heterogeneous effect of cg05293407 TRIM27 on NSCLC survival across histology. The smoking-related variables were compared between the LUAD and LUSC patients (Table S6). Compared with LUAD patients, LUSC patients had a higher average pack-year of smoking (37.49 in LUAD vs 54.79 in LUSC, P = 1.03 × 10⁻¹⁹) (Fig. 4C,D) and a higher proportion of current smokers (28.24% in LUAD vs 34.09% in LUSC, P = 0.037) (Table S6).
We identified a significant interaction between cg05293407 TRIM27 and pack-year of smoking in all NSCLC patients (HR interaction = 1.01, 95% CI: 1.00-1.02, P = 0.034). As pack-year of smoking increased, the risk associated with high methylation of cg05293407 TRIM27 for NSCLC survival also increased (Fig. 5). Therefore, pack-year of smoking was a modifier of the association between cg05293407 TRIM27 and NSCLC survival.
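To give a sense of scale, the reported interaction HR can be compounded over pack-years. The Python sketch below assumes the product term was scaled per 1 pack-year and per 1% methylation, which the text does not state explicitly, and it uses the rounded value 1.01, so the results are only indicative.

```python
# Reported interaction HR; the per-pack-year, per-1%-methylation scaling
# is an assumption made for this illustration.
HR_INTERACTION = 1.01


def interaction_multiplier(pack_years, hr_interaction=HR_INTERACTION):
    """Factor by which the per-1% methylation HR is multiplied at a given
    pack-year value, relative to zero pack-years."""
    return hr_interaction ** pack_years


for pack_years in (0, 39, 54):
    print(pack_years, round(interaction_multiplier(pack_years), 2))
```

Under these assumptions the methylation HR at 39 and 54 pack-years is roughly 1.5 and 1.7 times its value at zero pack-years, which is consistent in direction with the pattern in Fig. 5.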
We also evaluated the joint effect of cg05293407 TRIM27 methylation level and pack-year of smoking on NSCLC survival (Table 2). Patients were categorized into three groups (low, medium and high) by tertiles of cg05293407 TRIM27 methylation level (1.33% and 1.78%) and into three groups (low, medium and high) by cut-off values of pack-year of smoking (39 and 54). cg05293407 TRIM27 was a significant risk factor only in patients with > 39 pack-year of smoking (Fig. 5); therefore, 39 was defined as the cut-off between the low and medium-high levels. Further, the median value (54) of pack-year of smoking among LUAD patients with > 39 pack-year of smoking was used to distinguish the medium and high levels. We used the best prognosis group (low-medium methylation of cg05293407 TRIM27 and low-medium pack-year of smoking) as the reference to evaluate the effects of high methylation level and high pack-year of smoking, their joint effect and their interaction.
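The grouping described above can be sketched as a pair of small Python functions. The cut-off values (1.33%, 1.78%, 39 and 54) come from the text; the function names and the handling of values exactly equal to a cut-off are assumptions, apart from the "> 39" boundary, which the text states.

```python
def methylation_group(methylation_pct, cutoffs=(1.33, 1.78)):
    """Tertile group of cg05293407 methylation (in %)."""
    low_cut, high_cut = cutoffs
    if methylation_pct <= low_cut:
        return "low"
    if methylation_pct <= high_cut:
        return "medium"
    return "high"


def smoking_group(pack_years, cutoffs=(39, 54)):
    """Pack-year group; values > 39 fall in the medium-high range."""
    low_cut, high_cut = cutoffs
    if pack_years <= low_cut:
        return "low"
    if pack_years <= high_cut:
        return "medium"
    return "high"


def is_reference(methylation_pct, pack_years):
    """True for the reference group: low-medium methylation combined with
    low-medium pack-year of smoking."""
    return (methylation_group(methylation_pct) != "high"
            and smoking_group(pack_years) != "high")
```

For example, a patient with 1.50% methylation and 45 pack-years falls in the medium group on both axes and therefore belongs to the reference group for the joint-effect analysis.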
To illustrate the effect modification by pack-year of smoking, the effect of cg05293407 TRIM27 on NSCLC survival was evaluated in patients with low, medium and high levels of pack-year of smoking. The effect of cg05293407 TRIM27 varied across patients with different pack-year of smoking. Among LUAD patients with a high level of pack-year of smoking, those with high methylation of cg05293407 TRIM27 had significantly worse survival (HR High vs Low = 1.88, 95% CI: 1.07-3.32, P = 0.029) (Fig. 6A,B). In LUSC and overall NSCLC patients, we observed similar significant results in patients with both medium (HR High vs Low = 2.51, 95% CI: 1.10-5.73, P = 0.029 in LUSC patients; HR High vs Low = 1.89, 95% CI: 1.17-3.06, P = 9.35 × 10⁻³ in overall patients) and high (HR High vs Low = 2.55, 95% CI: 1.43-4.55, P = 1.49 × 10⁻³ in LUSC patients; CI: 1.32-2.84, P = 7.51 × 10⁻⁴ in overall patients) levels of pack-year of smoking (Fig. 6C-F). Our results indicated that cg05293407 TRIM27 influenced NSCLC survival regardless of histology, but only among patients exposed to relatively heavy smoking. Since pack-year of smoking might be bimodally distributed owing to the many zero values from never smokers, we also performed a sensitivity analysis by testing the methylation-smoking interaction in NSCLC patients excluding never smokers, and still observed a significant interaction (Fig. S2) and the same pattern (Fig. S3). Another sensitivity analysis based on smoking status also indicated an upward trend (P Trend = 0.022) in the effect size of cg05293407 TRIM27 from never smokers (HR = 0.89) through former smokers (HR = 1.23) to current smokers (HR = 1.88) in the overall population, even without taking pack-year of smoking into account (Fig. S4).

Discussion
We performed a two-stage study and integrative analysis of DNA methylation of TRIM27 and gene expression in early-stage NSCLC patients. The CpG probe, cg05293407 TRIM27, located at the 200 kb transcription start site (TSS) region of TRIM27, was identified as an exclusive biomarker of early-stage LUSC prognosis. Further, the heterogeneous effect of cg05293407 TRIM27 across histology may be explained by a methylation-smoking interaction.
As LUAD and LUSC differ in origin and histology, the mechanisms of occurrence and progression may differ at a molecular level [39,40]. For example, both mutated genes and recurrent somatic copy number alterations are largely distinct between the two NSCLC types [41]. We observed only one probe, cg05293407 TRIM27, exclusively associated with early-stage LUSC prognosis in the analysis stratified by histology, whereas no promising CpG probes were observed for LUAD, possibly due to underlying epigenetic heterogeneity between LUAD and LUSC. Further, LUSC is more strongly associated with smoking than LUAD, suggesting different causes for their induction as well [30]. In addition, a methylation-smoking interaction may help explain the heterogeneous effect of cg05293407 TRIM27.
The tumour-specific shift to transcriptional repression is associated with DNA methylation at TSSs in multiple tumour types [42]. Generally, hyper-methylation blocks transcription initiation and reduces gene expression [43]. However, a small proportion of methylation surrounding the TSS region upregulates gene expression, indicating that DNA methylation regulation may be more complex [44]. In our study, DNA methylation at cg05293407 in the 200 kb TSS region of TRIM27 upregulated gene expression in tumour tissues, which was consistent with previous reports [45,46]. This phenomenon may be mediated by affecting the binding activity of upstream transcription factors [47]. However, further functional studies are warranted to elaborate the possible mechanism.
In LUSC patients, the methylation level of cg05293407 TRIM27 ranged from 0.62% to 4.09% and its median value was 1.48%, indicating a narrow range and low average level. As shown in Fig. S7, the maximum values of all 311 891 CpG probes followed a bimodal distribution with the first peak around 5%. Furthermore, there were 9341 (2.99%) CpG probes with an even lower maximum value than that of cg05293407 TRIM27, indicating that its narrow range was reasonable. Meanwhile, many studies have shown that aberrant DNA methylation at such hypo-methylated CpG probes is also involved in diseases (e.g., cg07308824 HECA, associated with female panic disorder risk, and cg02257300 ERCC2, associated with paediatric medulloblastoma prognosis) [48,49].
TRIM27 belongs to the TRIM family, an extended family of proteins with a common denominator of a tripartite combinatorial motif encompassing RING finger, B-box, and coiled-coil domain homologies [50]. TRIM27 is an important positive regulator of signal transducer and activator of transcription 3 (STAT3) activation. TRIM27, located at retromer-positive structures, can recruit STAT3 after IL-6 stimulation and lead to improved STAT3 activation [11]. STAT3 activity plays important roles in the pathogenesis of many cancers, including breast, head and neck, prostate and brain cancers [51]. Further, STAT3 is overexpressed in NSCLC tumour samples, and sorafenib can inhibit STAT3 activation to produce anticancer effects in NSCLC [52]. Combined with our results, these data suggest that high methylation of cg05293407 TRIM27 might promote TRIM27 expression, further leading to STAT3 activation and poor prognosis (Fig. 8).
Smoking is associated with several genetic alterations in NSCLC [53] and is well established as a factor relevant to lung cancer risk as well as prognosis [15]. Cigarette smoke contains reactive oxygen species (ROS), which inhibit phosphatase and tensin homolog (PTEN) expression by phosphorylating the ROS-dependent Src/EGFR-p38MAPK pathway [54]. PTEN inhibits glycolysis in brain tumour cells by directly interacting with phosphoglycerate kinase 1 (PGK1) [55]. Further, PTEN inhibits cancer cells by moderating signalling through the PI3K pathway. PTEN expression is low in NSCLC tumour samples, and this low expression is more prevalent in LUSC [56]. Therefore, for patients with high pack-year of smoking, heavy exposure to cigarette smoke may strongly inhibit PTEN expression through ROS and relate to poor NSCLC prognosis (Fig. 8).
Moreover, PTEN is an essential modulator of STAT3-mediated pathways. Although STAT3 is a downstream target of PTEN, STAT3 also reversely inhibits PTEN expression by directly activating miR-21, which is part of the epigenetic switch linking inflammation to cancer [57]. Therefore, STAT3 activation can downregulate PTEN expression (Fig. 8). In terms of the cg05293407 TRIM27 and smoking interaction, high methylation was associated with poor prognosis in NSCLC patients with medium-high pack-year of smoking rather than low pack-year of smoking, possibly because high activation of STAT3 and low expression of PTEN may only occur in patients with medium-high methylation of cg05293407 TRIM27.
We observed three genes associated with cg05293407 TRIM27 : GJC3, NAALAD2 and USP26. GJC3 is one of the genes coding for connexin (CX) proteins and is reported to be associated with nonsyndromic hearing loss [58]. Further, patients with low GJC3 expression had a better prognosis in our study. NAALAD2 encodes human prostate-specific membrane antigen (PSM), which is a marker of prostatic carcinomas and is the first shown to possess NAALADase activity [59]. Similarly, LUSC patients with lower NAALAD2 expression had higher survival in our study. USP26 is associated with Sertoli cell-only syndrome and male infertility in both European and Chinese men [60,61]. Further, our study showed consistent results in LUSC patients. Although these three genes lack explicit evidence of association with LUSC, their relationship to cg05293407 TRIM27 and LUSC survival may inspire functional studies of these potential genes and further help elucidate the mechanistic pathway of cg05293407 TRIM27 on LUSC survival.
Our study has several strengths. First, to our knowledge, this is the first multicentre study of interaction between DNA methylation of TRIM27 and smoking, which attempted to interpret the effect of DNA methylation that varied by NSCLC histology. Second, besides the significant statistical interaction observed on a population level, we experimentally elaborated on a plausible functional interaction between two pathways based on literature evidence. Third, by controlling false positives, our two-stage study and the sensitivity analysis provided robustness to our results. Fourth, we performed integrative analysis of DNA methylation and gene expression and systematically evaluated associated genes of cg05293407 TRIM27 on genome-wide scale.
We also acknowledge some limitations. First, though three genes associated with cg05293407 TRIM27 further affected lung cancer prognosis in our study, there was no explicit evidence of their mechanisms. Therefore, these associations should be interpreted with caution. Second, with a high censoring rate in the TCGA cohort, the statistical power might be limited. Nevertheless, the association between cg05293407 TRIM27 and prognosis remained significant in TCGA, indicating that our results were conservative and robust. Third, the positive association between cg05293407 TRIM27 and TRIM27 expression has not yet been reported in the literature. Further functional experiments are warranted to confirm our results. Finally, as the majority of our population was Caucasian (89.19%),

Conclusion
In summary, our study identified cg05293407 TRIM27 as a potential biomarker for LUSC prognosis and laid out a case that the methylation-smoking interaction may account for heterogeneous effects of cg05293407 TRIM27 across histology. Our findings provide a potential dynamic and reversible therapeutic target for NSCLC patients.

Data accessibility
The DNA methylation image data of the Harvard, Spain, Norway and Sweden study cohorts can be requested from DCC, ME, AH and JS, respectively. Alternatively, they can be retrieved from the Gene Expression Omnibus database (GSE39279, GSE66836 and GSE56044). TCGA: https://tcga-data.nci.nih.gov; now hosted at GDC: https://portal.gdc.cancer.gov.

Supporting information
Additional supporting information may be found online in the Supporting Information section at the end of the article.
Fig. S1. Quality control processes for DNA methylation chip data.
Fig. S2. Methylation-smoking interaction on survival of LUSC patients excluding never smokers. and P value were derived from a Cox proportional hazards regression model adjusted for age, sex, smoking status, clinical stage, and study centre. P Heterogeneity was used to evaluate heterogeneity of HRs across groups.
Fig. S4. Kaplan-Meier overall survival (OS) curves of LUAD, LUSC and overall NSCLC patients. (A,B) LUAD patients, (C,D) LUSC patients and (E,F) overall patients. Hazard ratio (HR) and P value were derived from a Cox proportional hazards regression model adjusted for age, sex, clinical stage, pack-year of smoking and study centre. P Trend was used to evaluate trend of HRs across groups.
Fig. S5. Kaplan-Meier overall survival (OS) curves of TCGA cases by low or high TRIM27 expression. The gene expression was divided into low and high groups by median value (10.26). Hazard ratio (HR) and P value were derived from a Cox proportional hazards regression model adjusted for age, sex, smoking status, clinical stage, and study centre.
Table S1. Demographic and clinical characteristics of early-stage NSCLC patients with gene expression data derived from TCGA.
Table S3. Results of association analysis of 96 DNA methylation probes of TRIM27 in LUAD samples.
Table S4. Results of association analysis of 96 CpG probes of TRIM27 in LUSC samples.
Table S5. Results of proportional hazards test for 96 CpG probes of TRIM27 in LUSC samples.
Table S6. Comparison of smoking-related characteristics of former and current smokers between early-stage LUAD and LUSC.
Table S7. Results of genome-wide methylation transcription analysis of 29 genes significantly associated with cg05293407 in TCGA LUSC samples.