Analysis of the clinicopathological characteristics, genetic phenotypes, and prognostic of pure mucinous adenocarcinoma

Abstract Background Primary pure mucinous adenocarcinoma of the lung (PMA) is a rare subtype. However, correlations between clinicopathological features and genetic phenotypes with survival have not been described comprehensively. Methods Pure mucinous adenocarcinoma patient information collected from the Surveillance, Epidemiology, and End Results (SEER) database, the Department of Thoracic Surgery, Zhongshan Hospital, Fudan University (FDZSH), and the Cancer Genome Atlas (TCGA) were extracted, evaluated, and compared with other lung adenocarcinomas (LUAD) patient data. Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway analyses were performed to explore the functional importance of underlying molecular changes. Overall survival (OS) was evaluated with the Kaplan‐Meier method. Univariate and multivariate analysis through Cox proportional hazard regression identified risk factors that predicted OS, and the results were used to construct a nomogram to predict OS for PMA patients. Results Overall, 3622 patients, 41 patients, and 15 patients with PMA were identified from the SEER, FDZSH, and TCGA databases, respectively. There were 345 differentially expressed genes, 30 differentially mutated genes and 72 differentially methylated genes were identified between PMA and other LUAD samples. In the SEER database, PMA had a better prognosis compared to other LUAD. Compared with patients with other LUAD, patients with PMA exhibited unique clinicopathological features, including fewer grade III/IV tumors, less pleural invasion, more early‐stage cancer, and more lower lobe carcinomas. Multivariate analyses showed that age, race, T stage, N stage, surgery, and chemotherapy were independent risk factors. The nomogram had a calibration index of 0.724. Conclusions Our research identified unique clinicopathological characteristics and genetic phenotypes for PMA and other LUAD. The nomogram accurately predicted OS.


| INTRODUCTION
Lung cancer is the leading cause of cancer-related death worldwide and is responsible for over 1 700 000 cases every year. 1,2 Primary mucinous adenocarcinoma of the lung is a rare subtype that includes pure mucinous adenocarcinoma (PMA) and mixed mucinous adenocarcinoma. Pure mucinous adenocarcinoma is defined by the World Health Organization as lung tumor cells comprising goblet and/or columnar cells secreting abundant extracellular mucin and making up more than 50% of the tumor volume; it accounts for only 2%-5% of all lung adenocarcinomas. 3,4 Several previous reports have stated that the clinicopathological features and prognosis of PMA are unique from those of other histopathological types of lung adenocarcinoma. 4 However, most of the previous studies of PMA are limited.
In our study, we reviewed the clinicopathological features, genetic phenotypes, and survival of approximately 4000 PMA patients that we collected from the population-based the Surveillance, Epidemiology, and End Results (SEER) database, the Cancer Genome Atlas (TCGA) database, and cases treated in our own department. We aimed to better understand the correlations between the clinicopathological features and genetic phenotypes of PMA with survival.

| Ethical statement
Ethics approval was granted by the Ethics Committee of Zhongshan Hospital, Fudan University (Shanghai, China) (B2018-106).

| Patient selection
Previous studies have suggested that PMA may have clinical features and genomic characteristics distinct from those of other lung adenocarcinoma (LUAD). [4][5][6] Given that mixed mucinous adenocarcinoma is more complicated and has a lower incidence, mixed mucinous adenocarcinoma was not included in our study.

| Clinicopathological characteristics
We collected the following information for each patient from the databases: (a) the characteristics of patients (age at diagnosis, sex, and race); (b) features of carcinomas (primary location, tumor size, pathological grade, tumor node metastasis (TNM) stage, and histological subtype); (c) therapy details (record of surgery, radiotherapy, and chemotherapy); (d) follow-up records (cause of death, cancer-specific death, and number of months of survival). The TNM stages were classified according to the American Joint Committee on Cancer (AJCC) eighth edition criteria. 7

| Molecular characteristics
The genomic data for the lung adenocarcinoma patients were collected from TCGA and FDZSH. Messenger RNA expression profiles and DNA methylation data (combining level 3 data from Illumina GA and HTSeq platforms) were downloaded from TCGA. DNA variant data were downloaded from TCGA. The mutation status of EGFR, KRAS, ALK, RET, NRAS, ROS-1, and HER2 was obtained from the pathological reports of patients in Zhongshan Hospital. The status of target genes was obtained from pathologists' reports and was detected with a fluorescence real-time polymerase chain reaction kit. 5

| Survival statistical analysis
Survival statistical analyses were performed in IBM SPSS statistics software, version 22.0 (IBM, Inc) and R version 3.5.1 (R Foundation for Statistical Computing). Kaplan-Meier and log-rank tests were used to construct and compare survival curves. To confirm whether the upregulated genes or the increased mutational load genes in the PMA cohort were associated with poor survival, we split the patients into a high expression group (>median expression level across all samples) and a low expression group (≤median expression level across all samples); an increased mutational load group (>median mutational level across all samples) and a decreased mutational load group (≤median mutational level across all samples).
Additionally, patient variables whose P values were <.1 in univariate analyses were used in multivariate analysis performed with the Cox proportional hazard regression model. 6 The results of multivariate analysis were used to structure a nomogram and concordance index (c-index), and calibration plots were used to evaluate model performance. In addition, nomograms were used to compare the predicted survival with the observed survival. 8 Simultaneously, patients from FDZSH were used as a | 519 CHEN Et al.
validation cohort, and the total points for each patient in the validation cohort were calculated according to the established nomogram.

| Genome statistical analysis
Genome statistical analyses performed in R version 3.5.1 were as follows: (a) differentially expressed genes (DEGs) among all samples between the PMA and other LUAD patients were analyzed by using the DESeq2 package in R. The statistical threshold for significance was a false discovery rate <0.05 and fold change >2; (b) differentially mutated gene (DMG) analysis was performed between the PMA and other LUAD cohorts, and the frequency of gene mutation in all samples was determined by using the maftools package in R. The difference in the mutation ratio for each gene was identified with Fisher's exact test, and a P value < .05 was considered significant; (c) differentially methylated genes (DMeGs) were considered to be genes whose absolute differences in average methylation level between PMA and other LUAD cases were ≥0.15, with P < .05 in a Wilcoxon test. In our study, C-phosphate-G sites in which methylation was present in at least 80% of the samples were used to evaluate the methylation levels of expressed genes 9 ; (d) GO and KEGG analyses were performed with the clusterProfiler package in R to identify the main functions of the DEGs. A significant difference in Gene Ontology (GO) or KEGG pathways was defined as P < .05.

| Patient characteristics
Overall, 77 834 patients, 1196 patients, and 514 patients with primary lung adenocarcinoma were identified from population-based SEER, FDZSH, and TCGA databases; the numbers of PMA patients selected were 3622 (4.65%), 41 (3.43%), and 15 (2.92%), respectively. The baseline clinicopathological characteristics of all participants included in our study are summarized in Table 1.
As shown in Table 1, the PMA and other LUAD patients had similar ages; the median ages of the PMA and other LUAD cohorts in the SEER, FDZSH, and TCGA samples were 67, 67, and 61; and 60, 62, and 65 years, respectively. Female patients accounted for the majority of PMA cases. In the SEER cohort, in contrast with the other LUAD cohort, the PMA cohort had fewer grade III/IV tumors and less pleural invasion; consistently, PMA patients tended to undergo surgery (54.0%) and were less likely to have had radiotherapy (23.7%) or chemotherapy (41.4%). However, owing to the limited sample size, the percentage of PMA patients who had accepted chemotherapy or radiotherapy was not similar in FDZSH and TCGA.
Additionally, the most common primary site of PMA was the lower lobe, and the proportion of lower lobe tumors in PMA in the SEER, FDZSH, and TCGA cohorts was 45.7%, 53.3%, and 43.9%, respectively. Moreover, PMA patients tended to have less advanced cases: in the SEER cohort, stage I/II tumors accounted for the largest proportion of PMA (47.1%) patients, yet stage III/IV carcinomas were dominant in other LUAD (73.7%) samples. The FDZSH database showed similar results; however, in the TCGA cohort, the proportion of stage III/IV in PMA (33.7%) was higher than that in other LUAD (22%).

| Genetic analysis
In DEG analyses, a total of 514 patients (15 PMA and 499 other LUAD) were included in the TCGA database. The DEGs between the PMA and other LUAD cohorts were selected with strict criteria of fold change >2 and a false discovery rate <0.05. Ultimately, 345 DEGs were identified between PMA and other LUAD, and 24 upregulated genes and 321 downregulated genes in the PMA group were identified. Additionally, a heat map ( Figure 1A) was constructed to show the top 50 DEGs between PMA and other LUAD.
A total of 509 patients in TCGA (16 PMA and 493 other LUAD) who had complete mutation data were included in DMG analyses. Thirty DMGs between PMA and other LUAD were selected; the heat map of all DMGs between PMA and other LUAD is shown in Figure 1B. Furthermore, 1196 patients (41 PMA and 1155 other LUAD patients) in FDZSH were included in our study; the mutated genes in PMA and other LUAD are shown in Table 2.
We selected 465 patients (18 PMA and 447 other LUAD) with complete methylation data in TCGA and identified 72 DMeGs between the PMA and other LUAD samples. The heat map of all DMeGs is shown in Figure 1C.
In GO and KEGG analyses, there were 59 enriched functional categories for DEGs. The top five significantly enriched GO and KEGG terms of DEGs are shown in Figure 2
As shown in Figure 6, patients who had received surgery (P < .001; Figure 6A) had significantly better OS than those who did not. However, patients who had not accepted chemotherapy (P < .001; Figure 6B) or radiotherapy (P < .001; Figure 6C) tended to have a better OS than those who underwent chemotherapy or radiotherapy. Univariate analyses revealed that T stage (P < .001), M stage (P < .001), and N stage (P < .001) all significantly affected OS.
All risk factors with P < .1 in the univariate analyses were included in multivariate analyses, except for stage and pleural invasion (because both were associated with T, N, and M stages). Multivariate analyses showed that age (P = .021), race (P = .003), T stage (P = .029), N stage (P = .001), surgery (P = .002), and chemotherapy (P < .001) were able to predict survival of PMA patients. The details of the correlations between survival and factors described above are shown in Table 3.
Furthermore, as shown in Figure 7, a nomogram of PMA in the SEER cohort was constructed to compare the predicted survival with the observed survival. The c-index of this nomogram was 0.724 (95% CI: 0.721-0.727). The calibration plot of the SEER database and validation cohort is shown in Figure 8. In the validation cohort, the median follow-up time was 44.2 months (interquartile range, 48-60 months), and the C-index of the nomogram for predicting OS was 0.882 (95% CI, 0.812-0.866), which indicated a good prediction model.

| DISCUSSION
A total of 3622, 41, and 15 patients with PMA were selected from the SEER, FDZSH, and TCGA databases, respectively, to investigate the correlations between clinicopathological features and genetic phenotypes with survival.
Because the SEER cohort had the largest sample size, it was chosen to explore the prognostic indicators of PMA. In the SEER database, in contrast with the other LUAD cohort, PMA patients had less chemotherapy, and more early-stage tumors and lower lobe tumors; PMA patients tended to accept surgery and had a better prognosis than other LUAD cases. Age, race, T stage, N stage, surgery, and chemotherapy were found to be independently associated with survival in the PMA cohort. A nomogram based on these significant factors was constructed to predict the OS rates of PMA patients.  were identified to explore the molecular characteristics of the PMA group. We analyzed the differences in DEGs, DMGs, and DMeGs between PMA and other LUAD samples, then performed GO and KEGG pathway analyses to further determine the functions of these different genes. To the best of our knowledge, our study is the first to comprehensively describe the clinicopathological characteristics as well as the molecular characteristics of PMA.
In our study, the proportion of PMA ranged from 2.92% to 4.65%, a result consistent with findings from previous studies indicating that PMA composes 2%-10% of lung adenocarcinoma patients. 3,4 Female patients accounted for the majority of PMA cases and ranged from 53.6% to 58.6% in other studies, 10,11 thus indicating that PMA is predominant in females. Our research also showed that PMA patients had less pleural invasion and were less likely to be smokers; the most common site for PMA was the lower lobe, a finding consistent with the results of previous studies. 12,13 PMA is often defined as a malignant tumor, and PMA patients may have poorer prognosis than other LUAD patients. 14 A previous study has reported that the reason for this finding may because the pressure induced by mucus or the fluid produced by mucinous adenocarcinoma is taken up by the lymphatic system, which may lead to mucinous adenocarcinoma cells becoming more apt to diffuse and transfer and caused a poorer prognosis. 15 Moreover, Dacic S has suggested that PMA patients tend to not be surgically treated, thus possibly also resulting in a poorer prognosis in PMA patients . 16 However, in our study, we found that the PMA patients had a better OS compared to other LUAD patients in the SEER cohort, and the OS  [17][18][19] The reasons for these findings may partly be because PMA patients had more early-stage tumors in the population-based SEER database. Age, race, T stage, N stage, surgery, and chemotherapy were found to be independent risk factors after multivariate analysis, as shown with a nomogram based on these risk factors. Intriguingly, we also noticed that chemotherapy is negatively associated with survival in univariate analysis but positively associated in multivariate analysis. This may be explained by the fact that application of chemotherapy involves interactions with other factors, including age and stage. Mainly because, in most situations, patients at advanced stage or in older age groups are more likely to undergo chemotherapy. In our nomogram, chemotherapy has a slight but a significant impact on the prognosis in multivariate analysis, and patients who had chemotherapy had a better OS. This result was consistent with findings from a study previously reported by Liu et al 19 Meanwhile, many studies have reported that EGFR mutations were very rare in PMA and PMA patients may have a very weak response to targeting drugs such as EGFR kinase inhibitors. 20,21 A study by Liu has reported that other chemotherapy drugs such as pemetrexed may be  used in PMA patients and may relieve their symptoms. 19 As reported in previous research, guidelines for PMA therapy have not been clearly established, and additional research is needed to resolve this important question. Additionally, many studies had demonstrated that pleural invasion was significantly associated with an increased risk of survival. 6,22,23 However, we observed that pleural invasion is a favorable prognostic factor in univariate analysis in our study; it indicates that pleural invasion has an interaction with another variable, such as AJCC stage and chemotherapy. Therefore, more research is needed to study the role of pleural invasion in survival analyses of PMA in the future. Moreover, many studies have reported that KARS mutations are common in PMA, but the EGFR mutation rate is relatively low. 24,25 In our study, we found that EGFR and KRAS were not DMGs, and they were also not associated with OS in the TCGA cohort; however, in the FDZSH database, both EGFR and KRAS were DMGs. The sample size was larger, and the methods used to detect the target genes were more sensitive in the FDZSH cohort, thus potentially leading to the opposite results. Nevertheless, our research showed that only Pcgf6 was both differentially mutated and closely linked to decreased OS. Pcgf6 encodes the Pcgf6 protein, which interacts with some PcG proteins and serves as a transcription repressor. 26 Previous studies have shown that Pcgf6 is a polycomb group protein that contains a ring finger motif and is a defining component of different PCR1-type complexes. 27,28 Zdzieblo et al 29 have reported that Pcgf6 participates in maintaining embryonic stem cell pluripotency by regulating core pluripotency, mesodermal, and spermatogenesis genes. Lee JH et al 30 have shown that driver mutations in Pcgf6 enhance cancer cell migration, prompt metastasis, and may act by activating epithelial-to-mesenchymal transition. Additional experiments should be performed to explore the therapeutic value of Pcgf6.
Interestingly, we also found that high expression of CIDEC, ARL14, FGF19, HNF4A, and TTF-1 were related to poorer prognosis. TTF-1, which belongs to the NKX2-1 family, is often expressed in the lung, and its expression in lung adenocarcinoma is considered a specific marker of lung adenocarcinoma. 31,32 As mentioned above, PMA was less likely to harbor EGFR mutations but had a higher frequency rate of KRAS mutations. Consistent with these findings, many studies have reported that TTF-1 has a strong relationship with EGFRmutant lung adenocarcinoma and that TTF-1 is an oncogene in lung adenocarcinomas with EGFR mutation. 33, 34 Shanzhi et al found that the levels of TTF-1 expression may guide clinical therapy for LUAD with EGFR mutations. 35  T A B L E 3 (Continued) F I G U R E 7 Nomogram to predict 1-, 3-, and 5-year overall survival of patients with pure mucinous adenocarcinoma has been found to depend on TTF-1 expression for growth and to possibly be regulated by survivin protein. 36 More studies are needed to elucidate the role of TTF-1 in the process of PMA.
There are some limitations to our study. First, the SEER database did not provide detailed information on aspects such as smoking and alcohol history, and details of chemotherapy or radiation therapy; therefore, we could not analyze those variables in our study. Second, the patients were mostly selected from the USA, and the results might be different in Asians and Caucasians. Third, the sample size and gene detection method may have caused a lack of association; for example, the numbers of patients in the FDZSH and TCGA cohorts were small, and this might have introduced some bias in the results. Finally, our study was a retrospective study, and large randomized controlled trial studies may be required to confirm the results. Nonetheless, to our knowledge, this is the first report to systematically investigate the relationship between the clinicopathological features and genetic phenotypes of PMA with survival; moreover, patients in our hospital were used to validate the nomogram model and the predictive values of selected genes.

| CONCLUSION
Patients with PMA have unique clinicopathological features compared with other LUAD patients, including more tumors in the lower lobes, early-stage cancer, and less pleural invasion. In addition, PMA patients had better prognosis than other LUAD patients in the population-based SEER cohort. Age, race, T stage, N stage, surgery, and chemotherapy were independently associated with OS, and a nomogram model was used to predict the 3-and 5-year OS of PMA patients. Furthermore, our research also assessed the differences in DEGs, DMGs, and DMeGs between PMA and other LUAD; for example, Pcgf6 and TTF-1. Besides, GO and KEGG analyses were used to explore the associated biological characteristics. Additional validation studies are needed to determine our study's true efficacy.