The nomogram for the prediction of overall survival after surgery in patients in early‐stage NSCLC based on SEER database and external validation cohort

Abstract Background & Aims Currently, effective tools for predicting the prognosis of early‐stage lung cancer after surgery are lacking. We aimed to create a nomogram model to help clinicians assess the risk of postoperative recurrence or metastasis. Materials and Methods We obtained data on 16,459 NSCLC patients from the SEER database from 2010 to 2015. In addition, we enrolled 385 NSCLC patients (2017/01‐2019/06) at Tianjin Medical University General Hospital as an external validation cohort. Univariable and multivariable Cox regression were carried out to identify factors independently predicting overall survival (OS), and a nomogram incorporating these prognostic factors was built to predict OS. Results Tumor size was positively correlated with the risk of poor differentiation. Advanced age, male sex, and adenocarcinoma were factors independently predicting poor prognosis. Risk was highest in White patients, followed by Black patients, Asians and Indians, which is consistent with previous studies. Chemotherapy was negatively related to prognostic outcome in patients with Stage IA NSCLC and positively related to that in patients with Stage IB NSCLC. Lymph node dissection can reduce the postoperative mortality of patients. AUCs of the nomograms for 1‐, 2‐, and 3‐year OS were 0.705, 0.712, and 0.714 for the training cohort, and 0.684, 0.688, and 0.688 for the validation cohort. Conclusions The nomogram could be used as a tool to predict the postoperative prognosis of patients with Stage I non‐small cell lung cancer.


| INTRODUCTION
Lung cancer not only has a very high incidence but is also the leading cause of cancer death in the world. 1 Currently, more than 85% of these cases are classified as non-small cell lung cancer (NSCLC), 2 and its 5-year survival rate is expected to be 26%, which has improved only slightly over the past few decades. 3 As computed tomography (CT) becomes more widespread, more and more early-stage NSCLC is diagnosed. 5][6][7][8] Most recurrences or metastases occur within the first 2 years after surgery. 9 Once recurrence or metastasis occurs, the prognosis of patients is very poor. ][6][7] Therefore, it is necessary to create a tool that supports clinical decision-making according to the individual factors of lung cancer patients. The tumor-node-metastasis (TNM) classification system released by the American Joint Committee on Cancer (AJCC) and the International Union Against Cancer (UICC) has been extensively applied in predicting tumor stage and prognostic outcome in tumor patients. 10 But the TNM classification system cannot sufficiently estimate clinical prognosis because numerous clinicopathological factors influence a patient's prognosis. 6][17][18] Many previous studies have applied nomograms to predict the risk of lung cancer, but few of them have focused on Stage I postoperative NSCLC. 0][21][22][23] We aimed to create a nomogram model for postoperative survival using data from the SEER database during 2010-2015. In addition, we enrolled 385 NSCLC patients into an external validation cohort to validate model effectiveness. The nomogram was assessed with the receiver operating characteristic (ROC) curve, calibration curve, decision curve analysis (DCA), and concordance index (C-index).

| Development of the nomogram and statistical analysis
Categorical variables were represented by numbers (n) and percentages (%) and compared by the chi-square test. Through random sampling without replacement, all samples were assigned to the training or validation cohort at a 6.5:3.5 ratio. The training cohort was used to create the nomogram, while the validation cohort was used to validate the results of the training cohort. Factors associated with OS were identified by univariable Cox regression, and variables with p < 0.05 were then incorporated into multivariable Cox regression to identify independent risk factors. We also determined hazard ratios (HR) and 95% confidence intervals (CI) for the selected variables. Based on the multivariable regression, we then created prognosis prediction nomograms for 1-, 2-, and 3-year OS. In ROC curve analysis, AUC is defined as the area enclosed by the coordinate axes and the ROC curve. The diagnostic significance of the nomogram is represented by AUC, which ranges from 0.5 to 1. Calibration curve analysis was conducted by the bootstrap method using 1000 resamples. In the DCA plot, the horizontal coordinate represents threshold probability, while the vertical coordinate indicates the net benefit rate after subtraction of the harms from the benefits. For each evaluation approach, the predicted risk probability of the event was denoted Pi; if Pi reached a particular threshold (Pt), the patient was classified as positive. A model curve close to the two reference lines indicates that the model does not have any value, whereas a curve lying above the reference lines over a broad threshold interval indicates that the model is more valuable.
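The net benefit plotted in DCA can be illustrated with a minimal sketch. It assumes the commonly used decision-curve definition (true positives minus false positives weighted by the odds of Pt) applied to a binary outcome at a fixed time point, which the text does not spell out; censoring is ignored, and the function names and toy data are hypothetical.

```python
def net_benefit(pred_risk, event, pt):
    """Net benefit of calling a patient positive when predicted risk Pi >= Pt."""
    if not 0 < pt < 1:
        raise ValueError("threshold probability must lie strictly between 0 and 1")
    n = len(event)
    # True positives: flagged patients who had the event.
    tp = sum(1 for p, e in zip(pred_risk, event) if p >= pt and e == 1)
    # False positives: flagged patients who did not have the event.
    fp = sum(1 for p, e in zip(pred_risk, event) if p >= pt and e == 0)
    return tp / n - fp / n * pt / (1 - pt)


def net_benefit_treat_all(event, pt):
    """Reference line: every patient is treated as positive."""
    prevalence = sum(event) / len(event)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


# Hypothetical toy data: predicted risks and observed events (1 = died).
risk = [0.9, 0.8, 0.3, 0.2, 0.1]
died = [1, 1, 0, 1, 0]
model_nb = net_benefit(risk, died, 0.5)
all_nb = net_benefit_treat_all(died, 0.5)
```

The second reference line ("treat none") has a net benefit of zero at every threshold, so a useful model should lie above both zero and the treat-all line.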
Statistical analysis was conducted with R Statistical Software (version 4.1.3) and IBM SPSS Statistics Software, version 22.0. Clinicopathological features were included to construct the nomogram using the "rms", "foreign", "survival" and "regplot" packages. The "caret" package was used for randomization and the "survival" package was used for the C-index. Relations between different variables and OS were assessed by Kaplan-Meier survival curve analysis.
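To make the C-index concrete, the following is a minimal Python sketch of Harrell's concordance definition. It is not the R "survival" implementation used in this work: pairs with tied times are simply treated as non-comparable here, and the function name and toy data are hypothetical.

```python
def concordance_index(time, event, risk):
    """Harrell's C: share of comparable pairs in which the patient
    with the higher predicted risk is the one who died earlier."""
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i has an observed death
            # strictly before patient j's follow-up time.
            if event[i] == 1 and time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: months of follow-up, death indicator, model risk.
months = [5, 10, 15, 20]
death = [1, 0, 1, 1]
score = [0.9, 0.4, 0.1, 0.2]
c_index = concordance_index(months, death, score)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk.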

| Patient features
cohort). Median follow-up time for all 385 surgically resected Stage I NSCLC patients was 45.1 months (mean 43.5 months). In the SEER cohort, 80% (13,163 patients) were over 60 years old, 56% (9210 patients) were male, and 83% (13,654 patients) were White, while the rest were Black or other. All variables except gender were not significantly different between the training and internal validation groups, possibly as a result of differences in treatment methods according to age and race at randomization. Tables 1 and 2 display more details of patient clinicopathological features.
Patients in the SEER database cohort were excluded if they had been followed for less than 2 years and had not died by the time of their next visit. For the TJMUGH cohort, the exclusion criteria were as follows: (1) patients receiving preoperative neoadjuvant therapy; (2) patients with other tumors; (3) patients who refused follow-up. Overall survival (OS) was our endpoint, and the follow-up period was the duration between the surgery date and the final follow-up or cancer-specific death date.

| Establishment of prognosis prediction nomogram
Upon univariable and multivariable Cox regression on the SEER cohort of NSCLC patients, Age, Gender, Marital status, Race, Histologic Type, Stage, Tumor size, Resection Type, Resected lymph node station number (No. of resected LN stations), Resected lymph node number (No. of resected LNs), Radiation, Chemotherapy and Grade showed a close relation to patient OS (p < 0.05, Figure 2A,B). Chemotherapy was related to a good prognostic outcome in Stage IB NSCLC patients, whereas it was related to a dismal prognostic outcome in Stage IA NSCLC patients (p < 0.05, Figures 3 and 4). Based on the multivariable Cox regression result and clinical experience, we developed the clinical prognostic nomogram for predicting patients' 1-, 2-, and 3-year OS probabilities (Figure 5A). There is a clear relationship between the location of a variable in the nomogram and its contribution to OS. According to its contribution to survival probability, each variable was assigned its own points (range, 0-100). The points of all variables were summed to obtain the total points, and survival outcomes were then predicted from the functional relation between total points and survival probability. We created a web page where a patient's survival risk can be calculated by simply selecting the corresponding clinical factors. The web page link is as follows: https://tongz0225.shinyapps.io/Lung_nomogram/. For instance, a 50-year-old (55 points) White (70 points) male (75 points) NSCLC patient without chemotherapy (55 points) with a total score of 680 points had 1-, 2-, and 3-year OS probabilities of 94.6%, 89.5%, and 84.1%, respectively.
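The point assignment described above can be sketched as follows. This is an illustrative reconstruction of how nomogram points are commonly derived from Cox coefficients, not the fitted model of this study: the coefficients, the baseline survival value, and all function names are hypothetical assumptions.

```python
import math

# Hypothetical log-hazard-ratio coefficients per variable level
# (NOT the values fitted in this study).
coefs = {
    "age": {"<60": 0.0, ">=60": 0.6},
    "sex": {"female": 0.0, "male": 0.3},
    "chemo": {"yes": 0.0, "no": 0.15},
}


def max_effect_range(coefs):
    """Largest within-variable spread of coefficients; it maps to 100 points."""
    return max(max(lv.values()) - min(lv.values()) for lv in coefs.values())


def build_points(coefs):
    """Scale each level's effect so the most influential variable spans 0-100."""
    scale = max_effect_range(coefs)
    return {
        var: {lvl: 100 * (b - min(lv.values())) / scale for lvl, b in lv.items()}
        for var, lv in coefs.items()
    }


def total_points(points, patient):
    """Sum the points of the levels selected for one patient."""
    return sum(points[var][lvl] for var, lvl in patient.items())


def survival_probability(total, scale, baseline_survival):
    """Cox relation S(t|x) = S0(t) ** exp(lp), with lp recovered from points.
    baseline_survival is S0(t) for a patient with 0 total points (assumed)."""
    lp = total * scale / 100
    return baseline_survival ** math.exp(lp)


points = build_points(coefs)
patient = {"age": ">=60", "sex": "male", "chemo": "no"}
total = total_points(points, patient)
surv = survival_probability(total, max_effect_range(coefs), 0.95)
```

Because points are a linear rescaling of the linear predictor, a higher total always maps to a lower predicted survival probability at a given time point.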

| Evaluation and validation of nomogram
As part of the validation process, model accuracy and discrimination performance were evaluated by calibration curve and area under the receiver operating characteristic curve (AUC) analysis. In the training cohort, AUCs of the nomograms were 0.705, 0.712, and 0.714 for 1-, 2-, and 3-year OS, respectively (Figure 6A). Similarly, those were 0.684, 0.688, and 0.688, respectively, in the test cohort (Figure 6B). Additionally, we used an external dataset to validate our results. The external dataset showed better results than the SEER dataset (AUCs were 0.974, 0.815 and 0.806 for 1-, 2-, and 3-year OS, respectively, Figure 6C). Similarly, DCA demonstrated that the nomograms provided superior net benefit compared with SEER stage (Figure 6D,E). Finally, the C-index revealed good prognostic accuracy of the nomogram, with values of 0.684 (95% CI 0.676-0.693) in the training cohort, 0.660 (95% CI 0.647-0.671) in the test cohort and 0.778 (95% CI 0.708-0.849) in the external validation cohort.
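As a rough illustration of what an AUC value measures, the sketch below computes it as the probability that a randomly chosen patient who died by a fixed time point received a higher predicted risk than a randomly chosen survivor. It is a simplification that ignores censoring, so it is not the time-dependent ROC procedure used for the figures; the function name and toy data are hypothetical.

```python
def auc(scores, labels):
    """AUC as the Mann-Whitney probability that a positive case
    outranks a negative case (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: predicted risk and death status at the time horizon.
pred = [0.9, 0.8, 0.3, 0.2, 0.1]
dead = [1, 1, 0, 1, 0]
toy_auc = auc(pred, dead)
```

This rank-based form is equivalent to the area enclosed by the ROC curve and the coordinate axes for a binary outcome.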

| Survival Analyses
The total score of each patient was calculated according to the nomogram. We designated patients with nomogram scores below the median nomogram score (684 points) as the low-risk group, and patients with nomogram scores above the median were assigned to the high-risk group. Both the training and validation sets showed significant differences between the survival curves of the two risk groups (p < 0.001), conforming to observations in the external validation set. The high-risk group showed markedly poorer OS relative to the low-risk group in the training and internal validation cohorts (p < 0.001, Figure 7). In addition, the low-risk group showed superior OS in the external cohort (p < 0.001).
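The median split and the survival curves compared between risk groups can be sketched in a few lines. This is a minimal illustration with hypothetical scores and follow-up data; assigning a score exactly equal to the median to the high-risk group is an assumption, since the text does not specify it, and the Kaplan-Meier function is a plain product-limit estimator rather than the R routine used in this work.

```python
from statistics import median


def risk_groups(scores):
    """Label each patient low or high risk relative to the median score."""
    cut = median(scores)
    # Assumption: a score equal to the median is treated as high risk.
    return ["low" if s < cut else "high" for s in scores], cut


def kaplan_meier(time, event):
    """Product-limit estimate: list of (t, S(t)) at each distinct death time."""
    surv = 1.0
    curve = []
    death_times = sorted({t for t, e in zip(time, event) if e == 1})
    for t in death_times:
        # Patients still under observation just before t (censored at t included).
        at_risk = sum(1 for ti in time if ti >= t)
        deaths = sum(1 for ti, e in zip(time, event) if ti == t and e == 1)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve


# Hypothetical nomogram scores and follow-up data (months, 1 = died).
scores = [650, 700, 684, 720, 600]
groups, cut = risk_groups(scores)
curve = kaplan_meier([6, 12, 12, 24, 30], [1, 1, 0, 1, 0])
```

In practice the estimator would be run once per risk group and the resulting curves compared with a log-rank test.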

| DISCUSSION
Our multivariable analysis suggested that, as expected, tumor size was positively correlated with the risk of poor differentiation. The mortality rate of Stage I patients is higher in older and male patients. 24,25 Risk was highest in White patients, followed by Black patients, Asians and Indians, which is consistent with previous studies. 25,26 Chemotherapy is negatively related to the prognostic outcome of Stage IA NSCLC patients, whereas it is positively correlated with that of Stage IB NSCLC patients, conforming to prior reports. 27 Postoperative radiotherapy does not reduce mortality in Stage I NSCLC patients but instead increases it, which is consistent with previous reports. 28,29 Lymph node dissection can reduce the postoperative mortality of patients, which may be related to the removal of potential micrometastatic lesions with more extensive lymph node dissection; more extensive dissection can also make N staging more accurate. 30 The number and extent of lymph nodes dissected during the operation remain controversial. A previous study reported that, in patients with local resection of lung cancer, N1 + N2 resection has a better prognosis than N1 resection. 31 Our results showed that patients with more than 4 resected lymph node stations had a better prognosis. However, it has also been reported that dissection of more than 3 lymph node stations may be a risk factor after lung cancer surgery. 32 It has been suggested that removing fewer than 10 lymph nodes may increase the risk. 25 Moreover, some studies have found that the optimal range for removal is 10-11 or 8-14 nodes. 30,33 Similarly, other studies have reported that the more lymph nodes removed, the better, 34 which is consistent with our research findings.
Finally, there is debate about whether early lung cancer should be treated surgically by local resection or lobectomy. Studies have shown that there is no difference in morbidity and mortality between the two groups during the perioperative period. 35 Some previous reports have found that patients undergoing lobectomy had an increased OS rate compared with those receiving sublobectomy, 36,37 while other studies have shown that the prognosis of sublobectomy is consistent with that of lobectomy when patients are candidates for both surgical methods rather than undergoing sublobectomy for other reasons. 14 A phase 3 trial showed that sublobar resection is noninferior to lobar resection in IA NSCLC patients when the tumor size is <2 cm. 38 Therefore, we believe that the OS rate of lobectomy may be higher than that of sublobectomy, but there may be no difference in prognosis among patients who are eligible for both procedures.
In this nomogram, we included age, sex, race, marital status, pathology, tumor size, resected lymph node number, number of lymph node stations resected, radiotherapy, chemotherapy and grade. These factors are associated with the prognostic outcome of lung cancer. Using the SEER database, we modeled a large amount of data and scored risk accordingly, demonstrating that prognosis was significantly different between the low-risk and high-risk groups.
Certain limitations should be noted in this work. First, because it is a retrospective study, the data may be biased and inaccurate. Second, due to the limitations of the SEER database, the racial composition of the population is biased, and the SEER population cannot represent all populations. Third, some treatment information is unavailable, such as the use of postoperative targeted drugs or immune drugs, chemotherapy regimens and doses, and whether patients received neoadjuvant therapy. Finally, the number of samples in our external cohort is small, only Asians are included, and radiotherapy information is not included.
We screened some risk factors associated with the postoperative prognosis of lung cancer. Using a nomogram, we established a high-performance tool to predict postoperative prognostic outcome in Stage I patients. In addition, we stratified the risk score of the postoperative population, providing an easy-to-use postoperative management prediction tool for doctors and patients to promote individualized postoperative care.

F I G U R E 2
Independent predictors screening. Univariable (A) and multivariable (B) Cox regression of clinicopathological characteristics associated with OS in the SEER Database.

F I G U R E 3
The Kaplan-Meier survival estimates by chemotherapy, in IA.

F I G U R E 4
The Kaplan-Meier survival estimates by chemotherapy, in IB.

(AUC) analysis for training and validation cohorts. For evaluation of model calibration, we plotted calibration curves. As a result, the predicted survival was well consistent with real survival of the training cohort (Figure 5B-D). As shown in the calibration curves, the gray line indicates the reference line, and the predicted survival is consistent with real survival.

F I G U R E 5
Nomogram construction and validation. (A) A nomogram plot was constructed on the basis of chemotherapy, number of resected lymph node stations, resection type, radiation, marital status, number of resected lymph nodes, histologic type, race, tumor size, grade, gender and age. According to calibration curves, the nomogram-predicted 1- (B), 2- (C), and 3-year (D) OS was well consistent with real OS.

F I G U R E 6
ROC curves of 1-, 2-, and 3-year OS for patients and DCA curves based on nomogram. (A, D) training, (B, E) test, and (C, F) external validation cohorts.

T A B L E 2
Basic demographics and clinical features of NSCLC patients of TJMUGH cohort.