Ratio between negative and positive lymph nodes is a novel prognostic indicator for patients with esophageal cancer: A Surveillance, Epidemiology and End Results database analysis

Background The aim of this study was to explore whether the ratio between negative and positive lymph nodes (RNP) could predict the overall survival (OS) of esophageal cancer (EC) patients with lymph node metastasis following esophagectomy. Methods We used the Surveillance, Epidemiology and End Results (SEER) database to include the records of 2374 patients with lymph node metastases after surgery. All patients were randomly assigned to a training cohort (n = 1424) or a validation cohort (n = 950). Multivariate Cox regression analyses were performed to identify independent prognostic factors. A novel RNP-based tumor-RNP-metastasis (TRNPM) staging system was proposed. The prognostic value of the N stage, RNP stage, TNM staging system and TRNPM staging system was evaluated using the linear trend χ2 test, the likelihood ratio χ2 test, and the Akaike information criterion (AIC) to determine their relative performance. We constructed nomograms to predict survival in both cohorts, and calibration curves confirmed their predictive ability. Results Univariate analyses showed that N stage and RNP stage significantly influenced the OS of patients. Multivariate analyses revealed that RNP was an independent prognostic predictor in both the training and validation cohorts. In stratified analyses in both cohorts, prognosis differed significantly among RNP groups within different N stages and within groups defined by the number of dissected lymph nodes. In addition, the RNP stage and the TRNPM staging system had lower AIC values than the N stage and the TNM staging system, respectively, indicating superior predictive accuracy for OS. Furthermore, the calibration curves for the probability of three- and five-year survival showed good consistency between nomogram-predicted and actually observed survival.
Conclusions We demonstrated that, compared with the classical pathological lymph node staging system, the RNP stage showed superior predictive accuracy for OS and can provide more effective prognostic guidance for lymph node-positive EC patients.


Introduction
Esophageal cancer (EC) is a highly invasive digestive system malignancy characterized by rapid growth and early metastasis. 1 Esophagectomy with lymphadenectomy is the standard treatment for potentially resectable EC. However, despite significant progress in multimodal treatment in recent years, the prognosis of patients with EC remains poor. 2,3 Identifying prognostic factors for EC is therefore essential for predicting prognosis and guiding treatment. Postoperative pathological lymph node (N) staging, as defined by the eighth edition of the American Joint Committee on Cancer (AJCC) criteria, is the basic classification of lymph node metastasis. 4 However, the number of metastatic lymph nodes detected depends on the number of lymph nodes dissected, and examining too few lymph nodes may lead to stage migration. 5 To improve the existing prognostic evaluation system, we aimed to identify optimal prognostic indicators for EC patients. The number of negative lymph nodes (NLNs) is the difference between the total number of removed lymph nodes (RLNs) and the number of positive lymph nodes (PLNs). Previous studies have shown that the NLN count is a valuable prognostic predictor in various cancers. 6-10 Several studies have also demonstrated that the number of NLNs is positively correlated with the overall survival (OS) of EC patients. 11 The more NLNs a patient has, the better the prognosis. 12,13 The ratio between negative and positive lymph nodes (RNP) is the number of NLNs divided by the number of PLNs. Several studies have shown that RNP is a novel prognostic predictor for patients with colon or gastric cancer after surgery. 14-16 However, the prognostic performance of RNP in EC patients is currently unknown.
Therefore, the purpose of this study was to elucidate the value of RNP in predicting the long-term survival of EC patients, using a population-based analysis of the Surveillance, Epidemiology and End Results (SEER) database.
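Written out, the definitions in the Introduction give a compact identity. Assuming PLN ≥ 1 (true for the node-positive patients studied here), RNP follows algebraically from the definitions as a monotone decreasing function of the positive lymph node ratio, LNR = PLN/RLN:

```latex
\mathrm{NLN} = \mathrm{RLN} - \mathrm{PLN},
\qquad
\mathrm{RNP} = \frac{\mathrm{NLN}}{\mathrm{PLN}}
             = \frac{\mathrm{RLN} - \mathrm{PLN}}{\mathrm{PLN}}
             = \frac{\mathrm{RLN}}{\mathrm{PLN}} - 1
             = \frac{1}{\mathrm{LNR}} - 1,
\qquad
\mathrm{LNR} = \frac{\mathrm{PLN}}{\mathrm{RLN}}.
```

For example, a patient with 20 removed and 4 positive lymph nodes has NLN = 16, LNR = 0.2 and RNP = 16/4 = 1/0.2 - 1 = 4.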

Study population and data source
Using the SEER database, we performed a retrospective study of the medical records of 5977 EC patients. Clinical data, including patient demographics, lymph node staging and survival, were collected for subsequent analyses.
The SEER database is a publicly available, population-based resource that incorporates data from approximately 29% of the US population. We estimated the OS of EC patients after esophagectomy and compared OS using univariate Kaplan-Meier survival analyses and multivariate Cox regression analyses.
The inclusion criteria were as follows: (i) no distant metastasis; (ii) no preoperative neoadjuvant therapy, including chemotherapy, radiotherapy or chemoradiation; (iii) negative resection margins; (iv) diagnosis confirmed by postoperative histopathological examination; (v) no perioperative death; and (vi) death due to EC progression or cancer-related complications. The exclusion criteria were: (i) histological types other than squamous cell carcinoma and adenocarcinoma; (ii) no lymph node metastasis (no positive lymph nodes); (iii) incomplete clinicopathological information; and (iv) incomplete follow-up data.
After screening, 2374 EC patients met the criteria and were randomly assigned to the training and validation cohorts. The study endpoint was OS.
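The random assignment can be sketched as a seeded shuffle followed by a 60/40 cut, the proportions reported in the Demographics section. This is a minimal sketch under assumptions: the paper does not describe its randomization procedure, and the function name `split_cohorts` and the seed value are hypothetical.

```python
import random


def split_cohorts(patient_ids, train_fraction=0.6, seed=2020):
    """Randomly assign patients to a training and a validation cohort.

    The seed is an arbitrary, hypothetical value chosen for reproducibility;
    the paper does not report how its random assignment was generated.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    # Floor of 2374 * 0.6 = 1424.4 gives 1424 training patients.
    n_train = int(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


training, validation = split_cohorts(range(2374))
print(len(training), len(validation))  # 1424 950
```

Every patient lands in exactly one cohort, and rerunning with the same seed reproduces the same split.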
RNP is defined as the ratio of the number of NLNs to the number of PLNs. To identify the cutoff points of RNP that maximized the survival differences between subgroups, we ranked the RNP values, divided the patients into 10 equal-sized groups (deciles), compared their five-year survival rates, and used the log-rank test to merge adjacent groups with similar OS curves, which defined the RNP intervals. To parallel the N categories of the TNM staging system, the patients were divided into three subgroups: RNP1 (RNP ≥ 6.3), RNP2 (2.2 ≤ RNP < 6.3) and RNP3 (0 ≤ RNP < 2.2). Furthermore, to ensure comparability with the TNM staging system, we constructed a novel Tumor-RNP-Metastasis (TRNPM) staging system based on the RNP classification, replacing the N stage of the traditional TNM staging system with the matched RNP subgroup. The TRNPM stages are as follows: IIB, T1RNP1M0; IIIA, T1RNP2M0 and T2RNP1M0; IIIB, T2RNP2M0, T3RNP1M0, T3RNP2M0 and T4aRNP1M0; IVA, T1RNP3M0, T2RNP3M0, T3RNP3M0, T4aRNP2M0 and T4aRNP3M0.
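The grouping rule and stage table above can be written as a small lookup. This is a hypothetical helper (the names `rnp`, `rnp_group`, `trnpm_stage` and `TRNPM_STAGE` are illustrative); it encodes only the cutoffs and T/RNP combinations stated in the text and refuses combinations that are not listed, such as T4b, rather than guessing them.

```python
def rnp(removed_nodes: int, positive_nodes: int) -> float:
    """RNP = NLN / PLN, where NLN = RLN - PLN (node-positive patients only)."""
    if positive_nodes < 1 or removed_nodes < positive_nodes:
        raise ValueError("RNP requires 1 <= PLN <= RLN")
    return (removed_nodes - positive_nodes) / positive_nodes


def rnp_group(value: float) -> str:
    """Cutoffs reported in the text: RNP1 >= 6.3, 2.2 <= RNP2 < 6.3, 0 <= RNP3 < 2.2."""
    if value >= 6.3:
        return "RNP1"
    if value >= 2.2:
        return "RNP2"
    return "RNP3"


# TRNPM stage groups (all M0) as listed in the text, keyed by (T stage, RNP group).
TRNPM_STAGE = {
    ("T1", "RNP1"): "IIB",
    ("T1", "RNP2"): "IIIA", ("T2", "RNP1"): "IIIA",
    ("T2", "RNP2"): "IIIB", ("T3", "RNP1"): "IIIB",
    ("T3", "RNP2"): "IIIB", ("T4a", "RNP1"): "IIIB",
    ("T1", "RNP3"): "IVA", ("T2", "RNP3"): "IVA", ("T3", "RNP3"): "IVA",
    ("T4a", "RNP2"): "IVA", ("T4a", "RNP3"): "IVA",
}


def trnpm_stage(t_stage: str, removed_nodes: int, positive_nodes: int) -> str:
    """Map an M0 patient's T stage and lymph node counts to a TRNPM stage."""
    group = rnp_group(rnp(removed_nodes, positive_nodes))
    stage = TRNPM_STAGE.get((t_stage, group))
    if stage is None:
        # e.g. T4b: not listed in the text, so no stage is guessed here.
        raise ValueError(f"no TRNPM stage listed for {t_stage}{group}M0")
    return stage


print(trnpm_stage("T3", removed_nodes=20, positive_nodes=4))  # RNP = 4.0 -> RNP2 -> IIIB
```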

Statistical analysis
We used the Statistical Package for the Social Sciences (SPSS) 26.0 software (IBM Corp., Armonk, NY, USA) for statistical analyses. OS was estimated with Kaplan-Meier curves and compared between groups with log-rank tests. For multivariate survival analyses, the significant prognostic predictors of OS identified in the univariate analyses were entered into Cox regression models, and the factors that remained statistically significant were considered independent prognostic factors in the final models. All curves were drawn with GraphPad Prism 8 (GraphPad Software, LLC). A two-sided P-value of <0.05 was considered statistically significant.
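The paper ran these analyses in SPSS. To make explicit what the Kaplan-Meier estimator and the log-rank test compute, a plain-Python sketch follows; it is an illustration on toy data, not SPSS's implementation, and the function names are hypothetical. The same two-group comparison is the kind of test used when merging adjacent decile groups with similar OS curves during the RNP cutoff search.

```python
import math
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, S(time)) at each distinct death time.

    times: follow-up times; events: 1 if death was observed, 0 if censored.
    Patients censored at a death time are still counted as at risk at that time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    surv, curve = 1.0, []
    for t in sorted(deaths):
        at_risk = sum(1 for ti in times if ti >= t)
        surv *= 1.0 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve


def logrank_two_groups(times, events, groups):
    """Two-sample log-rank chi-square (1 df) and p-value; groups are labelled 0 or 1."""
    observed_minus_expected, variance = 0.0, 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        n = sum(1 for ti in times if ti >= t)
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == 1)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups) if ti == t and e and g == 1)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:  # hypergeometric variance of d1 at this death time
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = observed_minus_expected ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


# Toy data (not from the study): follow-up time, death indicator, group label.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
groups = [0, 0, 0, 1, 1, 1]
print(kaplan_meier(times, events)[:2])
print(logrank_two_groups(times, events, groups))
```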
Nomograms were formulated to provide visualized risk prediction using R version 4.0.2 (http://www.r-project.org/) with the survival and rms packages. Calibration curves were then derived through regression analysis. The performance of the resulting nomograms was internally and externally validated by calculating the concordance index (C-index).
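Harrell's C-index is the proportion of usable patient pairs in which the model assigns the higher risk to the patient who died first: 0.5 corresponds to chance ranking and 1.0 to perfect ranking. The paper computed it in R; the O(n²) plain-Python function below is an illustrative sketch with a hypothetical name, not the rms or survival package code, and its handling of tied follow-up times is simplified.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is usable when patient i died before patient j's follow-up
    ended (times[i] < times[j]). It is concordant when patient i also has the
    higher predicted risk; tied risk scores count as half-concordant. Pairs with
    tied follow-up times are skipped in this simplified version.
    """
    concordant, usable = 0.0, 0
    for i, (t_i, died_i) in enumerate(zip(times, events)):
        if not died_i:
            continue
        for j, t_j in enumerate(times):
            if t_i < t_j:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Toy example: a higher score (e.g. total nomogram points) should mean earlier death.
times = [2, 5, 7, 9, 12]
events = [1, 1, 0, 1, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.1]
print(harrell_c_index(times, events, scores))  # 7 of 8 usable pairs concordant: 0.875
```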
To compare the performance of the different lymph node staging systems, we calculated the Akaike information criterion (AIC) of Cox proportional hazards regression models. AIC balances goodness of fit against the number of model parameters, and a lower AIC value indicates a better model for predicting outcome. By contrast, a higher linear trend χ2 score or likelihood ratio χ2 score indicates a better model for predicting outcome.
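To make the model-comparison criteria concrete: for a Cox model, AIC = 2k - 2 log L(β̂) with k estimated coefficients, and the likelihood ratio χ2 is 2[log L(β̂) - log L(0)]. The sketch below fits a one-covariate Cox model by Newton-Raphson (with step halving) on the Breslow partial likelihood using only the standard library. It is not the SPSS procedure used in the paper, the function names are hypothetical, and treating a three-level stage as a single ordinal score (k = 1) is an assumption; entering it as two dummy variables would give k = 2.

```python
import math


def cox_partial_loglik(beta, times, events, x):
    """Breslow partial log-likelihood, score and information for one covariate x."""
    loglik = score = information = 0.0
    for t in sorted({ti for ti, ei in zip(times, events) if ei}):
        risk_set = [k for k, tk in enumerate(times) if tk >= t]
        deaths = [k for k, tk in enumerate(times) if tk == t and events[k]]
        weights = [math.exp(beta * x[k]) for k in risk_set]
        s0 = sum(weights)
        s1 = sum(w * x[k] for w, k in zip(weights, risk_set))
        s2 = sum(w * x[k] ** 2 for w, k in zip(weights, risk_set))
        d = len(deaths)
        x_dead = sum(x[k] for k in deaths)
        loglik += beta * x_dead - d * math.log(s0)
        score += x_dead - d * s1 / s0
        information += d * (s2 / s0 - (s1 / s0) ** 2)
    return loglik, score, information


def fit_cox_one_covariate(times, events, x, tol=1e-10, max_iter=100):
    """Maximise the partial log-likelihood by Newton-Raphson with step halving."""
    beta = 0.0
    loglik, score, information = cox_partial_loglik(beta, times, events, x)
    for _ in range(max_iter):
        if information <= 0:
            raise ValueError("covariate has no variation within risk sets")
        step = score / information
        new = cox_partial_loglik(beta + step, times, events, x)
        while new[0] < loglik and abs(step) > tol:  # guard against overshooting
            step /= 2
            new = cox_partial_loglik(beta + step, times, events, x)
        beta += step
        loglik, score, information = new
        if abs(step) < tol:
            break
    return beta, loglik


def aic_and_lr_chi2(times, events, x, n_params=1):
    """AIC = 2k - 2 log L(beta_hat); LR chi-square = 2 [log L(beta_hat) - log L(0)]."""
    beta, loglik_model = fit_cox_one_covariate(times, events, x)
    loglik_null = cox_partial_loglik(0.0, times, events, x)[0]
    return beta, 2 * n_params - 2 * loglik_model, 2 * (loglik_model - loglik_null)


# Toy data: stage coded 1-3 as an ordinal score (an assumption for illustration).
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 1, 1]
stage = [3, 2, 3, 1, 2, 1, 2, 1]
print(aic_and_lr_chi2(times, events, stage))
```

When two staging variables with the same number of parameters are compared on the same patients, the one with the lower AIC also has the higher likelihood ratio χ2, since both are driven by log L(β̂).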

Demographics of patients
The detailed clinicopathological characteristics of the training and validation cohorts are shown in Table 1. Of the participants, 60% (n = 1424) were randomly assigned to the training cohort and the remaining 40% (n = 950) to the validation cohort. The whole study population comprised 2055 males (86.6%) and 319 females (13.4%). The median age was 64 years (range, 23-92 years). In the training cohort, 776 patients were stage N1, 429 stage N2 and 219 stage N3; in the validation cohort, 568 patients were stage N1, 256 stage N2 and 126 stage N3. Based on the number of negative lymph nodes, patients were divided into three groups: NLN1 (n = 505), NLN2 (n = 361) and NLN3 (n = 558) in the training cohort, and NLN1 (n = 326), NLN2 (n = 251) and NLN3 (n = 373) in the validation cohort. Based on the RNP value, 554, 411 and 459 patients in the training cohort and 398, 301 and 251 patients in the validation cohort were classified as RNP1, RNP2 and RNP3, respectively.
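As a quick internal-consistency check (not part of the original analysis), each classification reported above should sum to its cohort size, and the sex percentages should follow from the counts; the short script below confirms both.

```python
# Subgroup counts as reported in this section, per cohort.
cohort_size = {"training": 1424, "validation": 950}
reported = {
    "N stage (N1, N2, N3)": {"training": [776, 429, 219], "validation": [568, 256, 126]},
    "NLN group (NLN1-3)": {"training": [505, 361, 558], "validation": [326, 251, 373]},
    "RNP stage (RNP1-3)": {"training": [554, 411, 459], "validation": [398, 301, 251]},
}

for classification, by_cohort in reported.items():
    for cohort, counts in by_cohort.items():
        assert sum(counts) == cohort_size[cohort], (classification, cohort)

total = sum(cohort_size.values())
males, females = 2055, 319
assert males + females == total
print(total, round(100 * males / total, 1), round(100 * females / total, 1))  # 2374 86.6 13.4
print(round(100 * cohort_size["training"] / total, 1))  # 60.0
```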

Multivariate survival analysis
The effects of the prognostic variables on survival are described in Table 2. We compared three lymph node classifications and measured their relationships with EC patient survival. Model 1 contained the N stage, which was significantly related to OS in both the training and validation cohorts. Model 2 incorporated the N stage and NLN stage; Model 3 replaced the NLN stage with the RNP stage to assess the difference; and Model 4 combined these variables. In Model 2, N stage and NLN count were significantly related to OS in both cohorts. In Model 3, RNP was a significant predictor of OS in the training and validation cohorts (hazard ratio (HR) = 1.276, 95% confidence interval (CI): 1.137-1.432, P < 0.001 and HR = 1.325, 95% CI: 1.146-1.532, P < 0.001, respectively), whereas N stage was not (P = 0.054 and P = 0.058, respectively). In Model 4, RNP remained correlated with survival (HR = 1.231, 95% CI: 1.072-1.414, P = 0.003 and HR = 1.222, 95% CI: 1.029-1.452, P = 0.022, respectively), but NLN no longer predicted OS (P = 0.360 and P = 0.091, respectively). Other significant prognostic predictors of OS remained independent factors, including age, histological grade, T stage and N stage (Table 2).
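Because Cox regression reports HR = exp(β) with a 95% Wald interval exp(β ± 1.96·SE), each reported HR and CI implies a standard error and a p-value. The helper below, a hypothetical consistency check rather than part of the original analysis, back-calculates them for the RNP estimates above. The recovered p-values agree with the reported P < 0.001, P = 0.003 and P = 0.022 up to rounding of the published HRs and CIs.

```python
import math


def wald_from_hr_ci(hr, lower, upper, z_crit=1.959964):
    """Back-calculate log-HR, SE, Wald z and two-sided p from an HR and its 95% CI.

    Assumes a Wald interval that is symmetric on the log scale:
    CI = exp(log(HR) +/- z_crit * SE).
    """
    log_hr = math.log(hr)
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = log_hr / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return log_hr, se, z, p_two_sided


# RNP estimates reported for Model 3 (training) and Model 4 (training, validation).
for label, hr, lo, hi in [
    ("Model 3, training", 1.276, 1.137, 1.432),
    ("Model 4, training", 1.231, 1.072, 1.414),
    ("Model 4, validation", 1.222, 1.029, 1.452),
]:
    _, se, z, p = wald_from_hr_ci(hr, lo, hi)
    # sqrt(lower * upper) should reproduce the HR up to rounding
    print(f"{label}: SE={se:.3f} z={z:.2f} p={p:.4f} "
          f"geometric-mean check={math.sqrt(lo * hi):.3f}")
```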

Prognostic prediction accuracy of the various categories of lymph node metastasis
To verify the prognostic performance of the RNP stage for patient OS, we performed analyses of the prognostic effect of the RNP classification stratified by N stage and by the number of dissected lymph nodes.
In N1 patients of both cohorts, RNP staging was a significant predictor (both P < 0.001; Fig 2a and c). In the combined subgroup of N2 and N3 patients, RNP staging was significantly correlated with OS in both the training (P < 0.001, Fig 2b) and validation cohorts (P < 0.001, Fig 2d). We also investigated the prognostic value of the RNP stage for OS according to the number of dissected lymph nodes. Survival differed significantly among RNP classifications in every subgroup defined by the number of dissected lymph nodes, in both the training and validation cohorts (both P < 0.001; Fig 3a-d).

Comparison of the prognostic value between TNM and TRNPM classifications
We then incorporated RNP into the TNM staging system for EC patients and compared the two staging systems directly. With the TNM staging system, 127, 163, 883 and 251 patients in the training cohort were stage IIB, IIIA, IIIB and IVA, respectively, as were 91, 123, 594 and 142 patients in the validation cohort. The five-year OS rates of stage IIB, IIIA, IIIB and IVA patients were 43.0%, 33.6%, 18.6% and 8.2%, respectively, in the training cohort and 45.6%, 29.9%, 15.3% and 8.6%, respectively, in the validation cohort (Table 3).
With the TRNPM staging system, 81, 164, 687 and 492 patients in the training cohort and 67, 109, 505 and 269 patients in the validation cohort were stage IIB, IIIA, IIIB and IVA, respectively. The five-year OS rates of stage IIB, IIIA, IIIB and IVA patients were 48.5%, 38.2%, 21.0% and 10.4%, respectively, in the training cohort and 52.1%, 34.8%, 17.9% and 8.4%, respectively, in the validation cohort. The TRNPM staging system therefore showed greater statistical significance than the TNM staging system in both independent cohorts (P < 0.001 for each) (Fig 4a-d).

Comparison of the prognostic superiority between N, RNP, TNM and TRNPM classifications
We used three parameters to compare the N and RNP classifications: the linear trend χ2 score, the likelihood ratio χ2 score and the AIC value. A higher linear trend χ2 score and a higher likelihood ratio χ2 score indicate a better system, whereas a lower AIC value indicates a better system. In the multivariable regression analyses, both were independent factors of overall survival (both P < 0.001). The linear trend χ2 scores of N and RNP were 55.24 and 95.42, respectively, in the training cohort and 30.51 and 69.50 in the validation cohort. The likelihood ratio χ2 scores were 72.88 … (Table 4). Therefore, we considered that RNP had better discriminative ability than the N classification and clearly improved the accuracy of prognostic prediction for EC patients.
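The linear trend χ2 score rewards a classification whose ordered categories show a monotone survival gradient. One standard way to obtain it is the log-rank test for trend, sketched below in plain Python with each stage scored by its rank. The paper does not state how its trend statistic was computed in SPSS, so this illustrates the statistic rather than reproducing Table 4; the function name and the stage coding are hypothetical. With only two groups scored 0 and 1 it reduces to the ordinary two-sample log-rank test.

```python
import math


def logrank_trend_test(times, events, groups, scores=None):
    """Log-rank test for trend across ordered groups; returns (chi-square, p), 1 df.

    groups: ordered stage labels, e.g. 1, 2, 3 for RNP1-RNP3 (hypothetical coding).
    scores: optional mapping label -> numeric score; defaults to the label itself.
    """
    if scores is None:
        scores = {g: float(g) for g in set(groups)}
    u, variance = 0.0, 0.0
    for t in sorted({ti for ti, ei in zip(times, events) if ei}):
        at_risk = {g: 0 for g in scores}
        deaths = {g: 0 for g in scores}
        for ti, ei, gi in zip(times, events, groups):
            if ti >= t:
                at_risk[gi] += 1
                if ti == t and ei:
                    deaths[gi] += 1
        n = sum(at_risk.values())
        d = sum(deaths.values())
        mean_score = sum(scores[g] * at_risk[g] for g in scores) / n
        mean_sq = sum(scores[g] ** 2 * at_risk[g] for g in scores) / n
        # U accumulates sum_g score_g * (observed_g - expected_g)
        u += sum(scores[g] * deaths[g] for g in scores) - d * mean_score
        if n > 1:
            variance += d * (n - d) / (n - 1) * (mean_sq - mean_score ** 2)
    if variance <= 0:
        raise ValueError("zero variance; trend cannot be tested")
    chi2 = u ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail, chi-square 1 df


# Toy data (not from the study): three ordered stages, worse stage dies earlier.
times = [2, 3, 4, 6, 7, 9, 10, 12, 14]
events = [1, 1, 1, 1, 0, 1, 1, 0, 0]
stage = [3, 3, 2, 3, 2, 2, 1, 1, 1]
print(logrank_trend_test(times, events, stage))
```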
The linear trend χ2 scores, likelihood ratio χ2 scores and AIC values were also used to compare the prognostic performance of the two staging systems. The TRNPM classification had higher linear trend χ2 scores, higher likelihood ratio χ2 scores and lower AIC values than the TNM staging system in both cohorts (Table 4). We therefore demonstrated that the TRNPM staging system is superior to the traditional TNM staging system in predicting the survival of EC patients after esophagectomy.

Thoracic Cancer 11 (2020) 3490-3500

Prognostic nomograms for predicting the survival of EC patients
Furthermore, nomograms were used to calculate the three- and five-year OS of patients. In both the training and validation cohorts, RNP was included in the nomograms as an independent prognostic predictor, and the nomogram predictors were identical to those identified in the aforementioned multivariate Cox regression analyses. In the training cohort, the C-index of the nomogram for predicting OS was 0.648, and the calibration curves showed good consistency between observed OS and nomogram-predicted OS at three and five years after surgery (Fig 5a and c). In the validation cohort, the C-index for OS prediction was 0.674, and the calibration plots for OS at three and five years also showed good agreement between observed and nomogram-predicted survival (Fig 5b and d).
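A Cox-based nomogram is a graphical rendering of the model's linear predictor: each variable's points are a linear rescaling of its β·x contribution, and the total-points scale maps to predicted survival through S(t | x) = S0(t)^exp(β·(x - x_ref)). The sketch below computes that mapping directly. The coefficients, reference values and baseline survival are hypothetical placeholders, not the fitted values of the nomograms described above, and the function name is illustrative.

```python
import math


def predicted_survival(baseline_survival_t, coefficients, patient, reference):
    """Cox-model survival probability at a fixed time t:
    S(t | x) = S0(t) ** exp(sum_j beta_j * (x_j - ref_j)),
    where S0(t) is the baseline survival at the reference covariate values."""
    linear_predictor = sum(
        beta * (patient[name] - reference[name]) for name, beta in coefficients.items()
    )
    return baseline_survival_t ** math.exp(linear_predictor)


# All numbers below are hypothetical placeholders, not estimates from the study.
coefficients = {"rnp_stage": 0.24, "t_stage": 0.30}   # log hazard ratios per level
reference = {"rnp_stage": 2.0, "t_stage": 3.0}        # reference covariate values
baseline_5y = 0.20                                    # S0(5 years) at the reference

for rnp_stage in (1, 2, 3):
    patient = {"rnp_stage": rnp_stage, "t_stage": 3}
    print(rnp_stage, round(predicted_survival(baseline_5y, coefficients, patient, reference), 3))
```

A positive coefficient makes predicted survival fall from RNP1 to RNP3, matching the direction of the stage ordering described in the text.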

Discussion
Our study analyzed the OS of two randomly assigned cohorts of EC patients who underwent radical surgery and assessed the prognostic performance of the N, NLN and RNP classifications. We confirmed that RNP was a significant prognostic factor in both the training and validation cohorts, and our nomogram also confirmed the prognostic significance of the RNP staging system in EC patients. Lymph node status is one of the key factors that influence treatment decisions for esophageal cancer patients. Both the N category and the NLN count are derived from lymph node counts, so an inadequate number of dissected lymph nodes will distort these counts and, in turn, affect treatment and prognosis. 17-19 To assess lymph node metastasis accurately and improve long-term outcomes, previous studies have examined several prognostic factors, such as N stage, positive lymph node ratio, log odds of positive lymph nodes and NLN count. 2,9,13,19-22 However, it remains controversial which lymph node classification most accurately predicts patient prognosis after radical esophagectomy.
To help eliminate the effect of the number of dissected lymph nodes on N stage and NLN count, we propose RNP as a new prognostic indicator. In recent years, RNP has attracted attention as a novel classification of lymph node metastasis in gastric and colon cancer. Two studies by Deng et al. 14,23 demonstrated that RNP improved the accuracy of prognostic evaluation compared with other prognostic factors and recommended it for predicting the OS of gastric cancer patients. To date, little research has been devoted to the prognostic value of RNP in EC patients.
Univariate analysis demonstrated that all three lymph node classifications (N, NLN and RNP stages) were significantly associated with survival. We further conducted multivariate analyses with four models. After adjustment for confounders, RNP remained statistically significant in all models, whereas the N and NLN stages were not significant in Model 3 and Model 4, respectively. To further verify the prognostic performance of RNP for OS, we compared the three RNP subgroups with log-rank tests stratified by N stage and by the number of dissected lymph nodes. In the training cohort, the stratified analysis showed that RNP distinguished survival differences in every N subgroup. In the validation cohort, RNP likewise distinguished survival differences within N stages and among patients with fewer or more than 16 dissected lymph nodes. A lower RNP stage was also associated with better survival regardless of the number of dissected lymph nodes in both cohorts. We therefore inferred that RNP could serve as the optimal classification for EC patients who underwent radical surgery.
RNP had a higher linear trend χ2 score, a higher likelihood ratio χ2 score and a smaller AIC value in Cox regression than the N stage, indicating that RNP predicted patient prognosis more accurately. The results in the validation cohort were consistent with those in the training cohort. Furthermore, our novel TRNPM staging system, which uses RNP instead of N stage, showed better discrimination in EC patients than the TNM staging system, with higher linear trend χ2 scores, higher likelihood ratio χ2 scores and smaller AIC values in both cohorts. The TRNPM staging system is therefore more reliable than the TNM staging system for evaluating patient prognosis, and we suggest that the RNP staging system can be used as a novel descriptor of lymph node metastasis for predicting the prognosis of EC patients. In the current study, we constructed nomograms to predict survival in both cohorts of EC patients. The nomograms accurately predicted three- and five-year overall survival in the training and validation cohorts; the C-indexes confirmed the accuracy of these predictions, and the calibration curves confirmed the predictive ability.
Our study has several limitations. First, because this was a retrospective analysis, the survival dataset was incomplete and could not be completed; prospective studies are needed to confirm our results. Second, because the ratio of negative to positive lymph nodes had to be calculated, we excluded patients without positive lymph nodes, so only patients with lymph node metastases were included in the analysis. Third, this study used the SEER database. Although this database is large and has extensive long-term follow-up information, it lacks data relevant to survival, including adjuvant treatments, comorbidities, and chemotherapy regimens and dosage. Moreover, whether adjuvant therapy modifies the survival benefit of surgery, especially for node-positive EC patients, remains to be determined. The broader applicability of our results may therefore be limited, and the lack of detailed treatment information may have biased the results.
In conclusion, this study confirmed that RNP is more accurate than the N staging system in predicting survival and reflects comprehensive information on lymph node dissection and on both positive and negative lymph node counts. RNP can be used as a valuable indicator to provide prognostic guidance for lymph node-positive EC patients. The novel RNP-based TRNPM staging system should be considered as an alternative to the current TNM classification.