Construction and validation of prognostic nomograms for elderly patients with metastatic non‐small cell lung cancer

Abstract Background Metastatic non‐small cell lung cancer (NSCLC) is mostly seen in older patients and is associated with poor prognosis. There is no reliable method to predict the prognosis of elderly patients (≥60 years old) with metastatic NSCLC. The aim of our study was to develop and validate nomograms that accurately predict survival in this group of patients. Methods NSCLC patients diagnosed between 2010 and 2015 were identified from the Surveillance, Epidemiology, and End Results (SEER) database. Nomograms were constructed from the clinicopathological variables that were significant (p < 0.05) in multivariate Cox regression analysis. Results A total of 9584 patients met the inclusion criteria and were randomly allocated to the training (n = 6712) and validation (n = 2872) cohorts. In the training cohort, Cox regression identified age, gender, race, grade, tumor site, pathology, T stage, N stage, radiotherapy, surgery, chemotherapy, and metastatic site as independent prognostic factors (p < 0.05) for lung cancer‐specific survival (LCSS) and overall survival (OS). Nomograms for predicting 1‐, 2‐, and 3‐year LCSS and OS were established and showed excellent predictive performance, with a higher C‐index than that of the 7th TNM staging system (LCSS: training cohort: 0.712 vs. 0.534; p < 0.001; validation cohort: 0.707 vs. 0.528; p < 0.001; OS: training cohort: 0.713 vs. 0.531; p < 0.001; validation cohort: 0.710 vs. 0.528; p < 0.001). The calibration plots showed good consistency between the predicted and actual survival probabilities in both the training and validation cohorts. Moreover, decision curve analysis (DCA) showed a better net clinical benefit for the nomograms than for the TNM staging models. Conclusions We established and validated novel nomograms for predicting LCSS and OS in elderly patients with metastatic NSCLC, with desirable discrimination and calibration ability. These nomograms could provide personalized risk assessment for these patients and assist in clinical decision‐making.

KEYWORDS elderly patients, metastasis, nomogram, non-small cell lung cancer (NSCLC), prognostic model, SEER database

| INTRODUCTION
Lung cancer is the most common type of cancer and the leading cause of cancer-related deaths worldwide. 1 Non-small cell lung cancer (NSCLC) accounts for about 85% of all lung cancer cases and mainly comprises the squamous cell carcinoma and adenocarcinoma subtypes. More than 1 million deaths are reported annually. 2,3 Approximately two-thirds of NSCLC patients have local or distant metastases at the time of diagnosis, which is associated with poor prognosis. Only about 15% of these patients survive more than 5 years after diagnosis. Local metastases are most commonly found in the lymph nodes (LNs) and contralateral lung, and distant metastases often occur in the liver, brain, and bone. 4 A growing number of patients with advanced NSCLC are over the age of 70 years, 5,6 and their proportion continues to rise. The two major known oncogenic drivers in NSCLC are epidermal growth factor receptor (EGFR) mutations and anaplastic lymphoma kinase (ALK) fusions. Nonetheless, few investigations have addressed the distribution of genetic mutations across age groups. Ueno et al. were the first to prospectively assess the role of age in EGFR mutations, in 1262 patients with lung cancer, and demonstrated that only 30% of patients carrying EGFR mutations were under 45 years of age, compared with 70% over 65 years of age. 7 Interestingly, ALK fusions were predominantly seen in younger patients with NSCLC. 8,9 Investigating the mechanisms behind age differences in the onset of different mutation types may help in screening the relevant populations. With the aging of the population, the incidence and social burden of this disease will grow markedly, posing unique challenges to treatment planning. Furthermore, elderly cancer patients, including those with lung cancer, are significantly underrepresented in clinical trials and may not receive adequate treatment. 5
Previous clinical studies have demonstrated that vinorelbine monotherapy prolongs overall survival (OS) in elderly patients with advanced NSCLC, suggesting that systemic chemotherapy may be useful in this population. 10 Recently, carboplatin plus pemetrexed followed by pemetrexed maintenance in patients aged ≥75 years with advanced non-squamous NSCLC was shown to be non-inferior to docetaxel monotherapy. 11 Comprehensive geriatric assessment (CGA) is a term coined by geriatricians to describe a comprehensive assessment of functional status, co-morbid medical conditions, cognition, psychological status, social support, nutritional status, possible geriatric syndromes, and pharmacological therapy in older individuals. 12 Prognostic assessment based on CGA in elderly cancer patients focuses on the impact of the patient's general status and health care on survival. A study by Corre et al. divided elderly patients with advanced NSCLC into three groups for treatment based on CGA, but the grouping failed to improve patient survival. 13 As such, there is still no effective method for predicting the survival of elderly patients with metastatic lung cancer.
Because research on the behavioral patterns of elderly patients with metastatic NSCLC is limited and relevant survival analyses are few, there is an urgent need for a simpler and more sensitive assessment model that individualizes prognosis prediction in this population. As a prognostic tool, the nomogram incorporates important clinical and pathological risk factors and visualizes the results by quantifying the impact of these variables on individual survival prediction. 14 This method has been applied to predict the prognosis of breast cancer, bladder cancer, and other cancers. 15-19 To our knowledge, nomograms have not yet been used to analyze the survival outcomes of elderly patients with metastatic NSCLC. Therefore, the aim of our research was to establish comprehensive nomograms for assessing the prognosis of these patients by extracting relevant information from the Surveillance, Epidemiology, and End Results (SEER) database, and to perform individualized survival prediction so as to provide an accurate basis for clinical decision-making.

| Study cohort
The data analyzed in this study were obtained from the SEER database, which covers almost 30% of the entire U.S. population. SEER*Stat 8.3.5 software (http://seer.cancer.gov/SEERSTAT/) was used to access the database. Because metastatic site codes have been available in the SEER database only since 2010, only patients diagnosed with NSCLC between 2010 and 2015 were enrolled in this research. The inclusion criteria were as follows: (1)

| Construction and validation of nomograms
The eligible patients were randomly assigned to the training cohort (n = 6712) and the validation cohort (n = 2872) in a 7:3 ratio using the 'createDataPartition' function in the 'caret' package in R. In the training cohort, factors with p < 0.05 in univariate analysis were further incorporated into the multivariate analysis. Next, prognostic factors with p < 0.05 in the multivariate Cox regression analysis were used to construct nomograms to predict survival outcomes (LCSS and OS).
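To illustrate how a nomogram turns multivariate Cox coefficients into a points scale, the following Python sketch may help. The authors worked in R, so this is not their implementation. The coefficient values, the baseline survival, and the function names are hypothetical and are not taken from this study. The sketch assumes the common convention in which the variable with the widest coefficient range spans 0 to 100 points.

```python
import math

# HYPOTHETICAL log hazard ratios per variable level (reference level = 0.0).
# These numbers are illustrative only and are NOT results from this study.
COEFS = {
    "chemotherapy": {"yes": 0.0, "no": 0.9},
    "sex": {"female": 0.0, "male": 0.2},
    "age": {"60-69": 0.0, "70-79": 0.15, "80+": 0.45},
}

# HYPOTHETICAL 1-year baseline survival for a patient at all reference levels.
BASELINE_SURVIVAL_1Y = 0.45


def points_table(coefs):
    """Scale each coefficient so the widest-ranging variable spans 0-100 points."""
    widest = max(max(lv.values()) - min(lv.values()) for lv in coefs.values())
    table = {}
    for var, levels in coefs.items():
        lowest = min(levels.values())
        table[var] = {
            lvl: 100.0 * (beta - lowest) / widest for lvl, beta in levels.items()
        }
    return table


def total_points(patient, table):
    """Sum the points of the levels observed for one patient."""
    return sum(table[var][lvl] for var, lvl in patient.items())


def predicted_survival(patient, coefs, baseline_survival):
    """Cox model prediction: S(t | x) = S0(t) ** exp(linear predictor)."""
    linear_predictor = sum(coefs[var][lvl] for var, lvl in patient.items())
    return baseline_survival ** math.exp(linear_predictor)


if __name__ == "__main__":
    patient = {"chemotherapy": "no", "sex": "male", "age": "80+"}
    table = points_table(COEFS)
    print(total_points(patient, table))
    print(predicted_survival(patient, COEFS, BASELINE_SURVIVAL_1Y))
```

Reading a printed nomogram amounts to the same two steps: add up the points for each variable, then map the total to a survival probability at the chosen time point.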
The training set (using a bootstrapping method with 1000 resamples) and the validation set were used to evaluate the predictive performance of the models. The discriminative ability of the models was assessed by calculating Harrell's concordance index (C-index) with a 95% confidence interval (CI). Calibration curves were used to compare the survival probabilities predicted by the nomograms with the actual survival. Finally, a decision curve analysis (DCA) was performed to evaluate the net benefit and potential clinical utility across threshold probabilities. The threshold probability was used to obtain the net benefit (defined as the proportion of true positives minus the proportion of false positives, weighted by the relative harm of false-positive and false-negative results).
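As a rough illustration of what Harrell's C-index measures, the sketch below counts, among all usable patient pairs, how often the patient with the higher predicted risk is the one who dies first. It is a simplified Python version written for this explanation, not the routine used in the study. Pairs with tied survival times are skipped here, which is an assumption made to keep the example short, and the bootstrap confidence interval is not shown. The function name and the toy data are hypothetical.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Simplified Harrell's C-index.

    times:       follow-up time for each patient
    events:      1 if death was observed, 0 if censored
    risk_scores: model output where a higher value means a worse prognosis
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient `a` has the shorter follow-up.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        if times[a] == times[b]:
            continue  # tied times are skipped in this simplified sketch
        if not events[a]:
            continue  # shorter follow-up is censored, so the order is unknown
        usable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk died first: concordant pair
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


if __name__ == "__main__":
    # Toy data, not from the study.
    months = [2, 5, 8, 11]
    died = [1, 1, 0, 1]
    risk = [0.9, 0.6, 0.4, 0.1]
    print(harrell_c_index(months, died, risk))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the nomograms and the TNM staging system are compared.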

| Comparison of nomograms
The predictive performance of the model based on the 7th edition TNM staging system and that of the nomograms established in our research were compared in the training and validation cohorts using the C-index and DCA.
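The net benefit used in DCA can be sketched in a few lines of Python. This is a minimal illustration under a simplifying assumption: every patient's outcome at the chosen time point is known (no censoring), whereas decision curves for survival data need an adjustment for censored follow-up. The function name and the toy data are hypothetical.

```python
def net_benefit(predicted_risk, died, threshold):
    """Net benefit of flagging patients whose predicted risk >= threshold.

    Net benefit = TP/n - FP/n * threshold / (1 - threshold),
    where the odds of the threshold weight false positives against true positives.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(died)
    true_pos = sum(
        1 for p, d in zip(predicted_risk, died) if p >= threshold and d
    )
    false_pos = sum(
        1 for p, d in zip(predicted_risk, died) if p >= threshold and not d
    )
    return true_pos / n - false_pos / n * threshold / (1.0 - threshold)


if __name__ == "__main__":
    # Toy data, not from the study: two hypothetical models on the same patients.
    died = [1, 1, 0, 0]
    model_a = [0.9, 0.7, 0.3, 0.2]  # separates deaths from survivors
    model_b = [0.9, 0.3, 0.7, 0.2]  # misranks one death and one survivor
    for name, risk in (("A", model_a), ("B", model_b)):
        print(name, net_benefit(risk, died, threshold=0.5))
```

Plotting this quantity over a range of thresholds for each model gives the decision curves; the model with the higher curve offers more net benefit at that threshold.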

| Statistical analyses
Differences between groups were assessed by the chi-square test. The Kaplan-Meier method was used for survival analysis, and differences between curves were tested by the log-rank test.
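For readers less familiar with the Kaplan-Meier method, the following Python sketch shows the product-limit calculation and how a median survival time is read from the resulting curve. It is an illustrative implementation with hypothetical function names and toy data, and it does not include the log-rank test.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times:  follow-up time for each patient
    events: 1 if death was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients leave the risk set
    return curve


def median_survival(curve):
    """First time at which the estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up


if __name__ == "__main__":
    # Toy data, not from the study.
    months = [1, 2, 2, 3, 4]
    died = [1, 1, 0, 1, 0]
    curve = kaplan_meier(months, died)
    print(curve)
    print(median_survival(curve))
```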

| Survival outcomes with different metastasis sites
Among the total population, the median survival time was 5 (IQR, 2-11) months. First, we conducted survival analysis on patients of different ages and found that poor prognosis was mainly concentrated in patients aged ≥80 years at diagnosis (Figure 2A,B

| Calibration and validation of the nomograms
Nomograms were developed based on independent prognostic factors identified by multivariate Cox regression analysis to predict 1-, 2-, and 3-year LCSS and OS (Figure 3). The results indicated that the two factors, Figure S1D-F). All had promising predictive value. Moreover, calibration curves showed excellent concordance between the actual outcomes and the survival rates predicted by the nomograms.

| Comparison between nomograms
In the training cohort, the C-index values of the TNM staging system for LCSS and OS were 0.534 (95% CI: 0.524-0.544) and 0.531 (95% CI: 0.523-0.539), respectively, which were considerably lower than those of the nomograms integrating all independent prognostic variables (0.712 and 0.713, respectively).
Similarly, in the validation cohort, the C-index of the nomograms was remarkably higher than that of the TNM staging system, which was 0.528 (95% CI: 0.514-0.562) for both LCSS and OS (Table 4). In addition, compared with the TNM staging model, the DCA curves showed excellent net benefit of the novel nomograms in predicting 1-

| DISCUSSION
We extracted clinical and survival information of 9584 elderly patients with metastatic NSCLC from the SEER database. Twelve risk factors for predicting 1-, 2-, and 3-year LCSS and OS were identified by univariate and multivariable Cox regression models and were used to establish prognostic nomograms. In this research, we were the first to use independent demographic and clinicopathologic prognostic factors to develop more comprehensive prognostic models that better predict the prognosis of elderly patients with metastatic NSCLC and help clinicians determine individualized treatment strategies. The population of aging adults in Canada is projected to more than double between 2005 and 2036. Notably, the nomogram shows great utility in predicting the probability of clinical events using individual variables and has become a common prognostic tool in oncology.
In this study, the nomograms incorporated 12 variables: age, gender, race, tumor site, grade, pathology, T stage, N stage, surgery, radiotherapy, chemotherapy, and metastatic site. Of these, chemotherapy and distant metastatic site were the two strongest prognostic predictors. In this study, patients were more prone to MOM. In other words, older patients were more likely to develop MOM once they experienced distant metastases. This may be because elderly patients have a tumor microenvironment that favors fibroblast-mediated angiogenesis and stromal remodelling. 21,22 In addition, the structure and function of the human immune system change with age. Patients of advanced age are prone to immune senescence, which allows tumors to evade immune surveillance. 23,24 Owonikoko et al. investigated NSCLC patients aged ≥70 years based on the SEER database and found that the patients were predominantly white and male, 3 which was similar to our research. Additionally, the study also observed that patients with stage T4 and grade III disease accounted for the largest proportions within their respective variables. This was because the included patients had advanced-stage tumors and therefore tended to have a larger tumor volume and a worse grade. This was consistent with the results of Liang's study. 25 Previous studies reported that a diagnosis of NSCLC at an age over 80 years was associated with worse LCSS and OS, 26,27 which was consistent with our 34 Our findings also supported this conclusion. One hypothesis for the better survival outcome of female patients is that it is associated with different levels of hormone and receptor expression. 35-37 Moreover, several studies reported that poorer tissue differentiation, lymph node metastasis, and larger tumor size were significantly associated with increased mortality in NSCLC. These results were well supported by our statistical analysis.
Finally, we verified the performance of the models. The results demonstrated that the C-index and calibration curves of the prediction models performed well in both the training and validation cohorts, indicating that the nomograms had good predictive accuracy and reliability. Additionally, the DCA curves demonstrated that the novel nomograms had a higher net benefit and greater clinical applicability than the TNM staging system.
Altogether, we were the first to develop visual prognostic assessment models for elderly patients with metastatic NSCLC. The use of nomogram scores to quantify the survival risk of a patient with organ-specific metastases to guide clinical treatment and prognostic assessment is a novel concept.
Despite the above merits, this research had some limitations. Firstly, some factors affecting prognosis were not included in the SEER database, such as smoking history, family history of cancer, gene mutations, and performance status (PS) assessment. Secondly, the absence of information on targeted therapy and immunotherapy, which are essential treatment approaches for NSCLC, from the SEER database was a major restriction of the current study. Moreover, patients with incomplete survival data or clinical details were not included in our research, which might lead to selection bias. Finally, although both internal and external validation are recommended for nomograms, only internal validation was performed in the current study. Additional validation studies in independent populations are needed to verify the generalizability of these results before clinical application. Nonetheless, this database provided valuable data for analyzing patterns of elderly patients with metastatic NSCLC across the United States.

FIGURE 5 Decision curve analysis in the training cohort of the nomograms and 7th edition AJCC-TNM staging system for predicting 1-, 2-, and 3-year LCSS (A-C) and OS (D-F). LCSS, lung cancer-specific survival; OS, overall survival

| CONCLUSION
To the best of our knowledge, this was the first large-scale population-based study using nomograms to explore the prognosis of elderly patients with metastatic NSCLC.
All patients were followed up in detail. The novel models had excellent predictive performance and can intuitively predict patient survival. Moreover, the nomograms could be used as effective tools to assist clinicians in guiding individualized treatment decisions and may consequently reduce the medical burden to some extent.