Proposal and validation of a modified staging system to improve the prognosis predictive performance of the 8th AJCC/UICC pTNM staging system for gastric adenocarcinoma: a multicenter study with external validation

Background The 8th edition of the American Joint Committee on Cancer/Union for International Cancer Control (AJCC/UICC) pathological tumor-node-metastasis (pTNM) staging system may predict the prognosis of gastric cancer more accurately than previous editions because of its important modifications. However, heterogeneity in prognosis within each subgroup classified according to the 8th edition may still exist. This study aimed to compare the prognosis prediction abilities of the 8th and 7th editions of the AJCC/UICC pTNM staging system for gastric cancer and to propose a modified pTNM staging system with external validation.
Methods Clinical data of 7911 patients from three high-capacity institutions in China and 10,208 cases from the Surveillance, Epidemiology, and End Results (SEER) Program Registry were analyzed. The homogeneity, discriminatory ability, and monotonicity of gradients of the 8th and 7th editions of the AJCC/UICC pTNM staging system were compared using log-rank χ2, linear-trend χ2, and likelihood-ratio χ2 statistics and Akaike information criterion (AIC) calculations. On this basis, a modified pTNM classification was proposed and externally validated using the SEER database.
Results Considerable stage migration between the 7th and 8th editions, mainly for stage III, was observed in both cohorts. The survival rates of subgroups of patients within stage IIIA, IIIB, or IIIC classified according to both editions were significantly different, demonstrating poor homogeneity for patient stratification. A modified pTNM staging system was then formulated using data from the Chinese cohort and demonstrated improved homogeneity in these subgroups. This staging system was further validated using data from the SEER cohort, with similarly promising results. Compared with the 8th and 7th editions, the modified pTNM staging system displayed the highest log-rank χ2, linear-trend χ2, and likelihood-ratio χ2 values and the lowest AIC value, indicating superior discriminatory ability, monotonicity, homogeneity, and prognosis prediction ability in both populations.
Conclusions The 8th edition of the AJCC/UICC pTNM staging system is superior to the 7th edition but still shows poor homogeneity in prognosis prediction. Our modified pTNM staging system demonstrated the best stratification and prognosis prediction ability in two large cohorts of different gastric cancer populations.


Background
Gastric cancer (GC) remains both the second most prevalent cancer [1] and the most frequent cause of cancer-related death in China [2]. Nearly half of all new GC diagnoses worldwide each year occur in China [3,4]. Although current treatment protocols include chemotherapy, irradiation, and/or targeted therapy, surgical resection remains the only means of cure [5]. For patients undergoing surgical treatment, the American Joint Committee on Cancer/Union for International Cancer Control (AJCC/UICC) pathological tumor-node-metastasis (pTNM) staging system is currently the most important and basic tool for patient stratification. The AJCC/UICC has published the 8th edition of the pTNM staging system for GC, introducing some changes to the 7th edition [6,7]. The most important of these is the subdivision of the category N3ab into N3a and N3b, which affects subsequent staging, especially for stage III. Thus, the prediction of survival probability for stage III patients is believed to be considerably affected, and this latest edition may have implications for treatment. To date, although the prognosis prediction ability of the 8th AJCC/UICC pTNM staging system for GC has already been addressed, its accuracy remains unclear.
In this retrospective study, we compared the prognosis prediction abilities of the 8th and 7th editions of the AJCC/UICC pTNM staging system using a large Chinese multicenter database of GC as a training cohort. We then proposed a modified pTNM staging system for better prognosis prediction of advanced GC and performed external validation in a large cohort of Western GC patients.
From 18 registries of the Surveillance, Epidemiology, and End Results (SEER) Program, the clinical records of all GC patients who underwent gastrectomy between January 1998 and December 2012 were retrospectively reviewed. Patients were excluded if they had incomplete/missing information regarding their age, tumor size, tumor location, Lauren type, depth of invasion, lymph node status, non-radical resection, and/or status of distant metastasis. This study protocol was approved by the institutional review boards of SYSU, TJMU, and CMU.

Follow-up
A strict disease-monitoring program using outpatient records, telephone interviews, and electronic messages was conducted. It included clinical and laboratory examinations every 3 months for the first 2 years, every 6 months from the 3rd to the 5th year, and annually thereafter until at least 5 years after the operation or until the patient died, whichever came first. The last date of follow-up was December 31, 2016. The endpoint of this study was overall survival (OS), defined as the time from the date of surgery until the date of death or the last date of follow-up. Patients who were still alive at the completion of follow-up were censored.
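The OS definition above can be illustrated with a short Python sketch. The function name `overall_survival`, the argument names, and the conversion of days to months with an average month of 30.4375 days are assumptions made for illustration; only the December 31, 2016 cut-off comes from the text.

```python
from datetime import date

# Last date of follow-up stated in the text.
LAST_FOLLOW_UP = date(2016, 12, 31)


def overall_survival(surgery, death=None, last_contact=LAST_FOLLOW_UP):
    """Return (months, event) for one patient.

    event is 1 if death was observed on or before the last contact date,
    and 0 if the patient is censored at the last contact date.
    """
    if death is not None and death <= last_contact:
        end, event = death, 1
    else:
        end, event = last_contact, 0
    # Assumption: convert days to months using the average month length.
    months = (end - surgery).days / 30.4375
    return months, event
```

A patient operated on in 2010 who died in 2012 contributes an event, whereas a patient still alive on December 31, 2016 contributes a censored observation ending on that date.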

Statistical analyses
All patients were restaged according to the 8th and 7th AJCC/UICC GC pTNM staging systems. Survival curves were plotted using the Kaplan-Meier method, and the log-rank test was used to determine the relationships between the investigated clinicopathological factors and OS. Factors with potential significance (P < 0.05) on univariate analysis were included in the multivariate analyses. Multivariate analysis of OS was performed using the Cox proportional hazards model with the forward likelihood-ratio (LR) stepwise procedure for variable selection.
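For readers unfamiliar with the Kaplan-Meier method named above, the following is a minimal pure-Python sketch of the product-limit estimate. The function name `kaplan_meier` and the toy data are illustrative assumptions; the study itself performed all calculations in SPSS 20.0.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death.

    times  : follow-up durations (for example, months from surgery)
    events : 1 if the patient died at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]  # deaths and censored patients both leave
    return curve


# Hypothetical follow-up data for five patients.
curve = kaplan_meier([6, 12, 12, 20, 30], [1, 1, 0, 1, 0])
```

Censored patients lower the number at risk for later time points without producing a step in the curve, which is why the censoring rule in the follow-up definition matters for the estimated OS rates.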
The prognosis prediction performance of the 8th and 7th AJCC/UICC GC staging systems was investigated in terms of discriminatory ability (differences in the survival among patients in different stages), monotonicity (patients at earlier stages with longer survival than those in later stages), and homogeneity (small differences in the survival among patients within the same stage) [9]. The log-rank χ2 test, linear-trend χ2 test, likelihood-ratio χ2 test, and Akaike information criterion (AIC) within the Cox regression model were used to compare the stratification and prognosis prediction performance between the two editions of staging systems. The discriminatory ability and monotonicity of gradient assessments were measured using the log-rank χ2 test and the linear-trend χ2 test. Homogeneity was measured using the likelihood-ratio χ2 test, and AIC was used to measure the prognostic stratifications. Higher log-rank χ2 and linear-trend χ2 scores indicated better discriminatory ability and monotonicity, higher likelihood-ratio χ2 scores indicated greater homogeneity, and smaller AIC values represented better prognostic stratification. Hazard ratios (HR) and 95% confidence intervals (95% CI) were also generated. All calculations were performed using SPSS 20.0 software (SPSS Inc., Chicago, IL, USA), and a P value < 0.05 was considered statistically significant.
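The statistics described above can be sketched in plain Python. This is an illustration under stated assumptions, not the SPSS procedure used in the study: `logrank_chi2` handles only two groups (the comparison across all stages needs the multi-group form), and `likelihood_ratio_chi2` and `aic` take log partial likelihoods that are assumed to come from already fitted Cox models. All function names and the toy data are hypothetical.

```python
def logrank_chi2(times_a, events_a, times_b, events_b):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk, group A
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)  # at risk, group B
        d_a = sum(1 for u, e, g in data if u == t and e == 1 and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e == 1 and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the death count in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0  # no time point at which the groups can be compared
    return (observed_a - expected_a) ** 2 / variance


def likelihood_ratio_chi2(loglik_null, loglik_model):
    """Likelihood-ratio chi-square: twice the gain in log partial likelihood."""
    return 2.0 * (loglik_model - loglik_null)


def aic(loglik_model, n_params):
    """Akaike information criterion: smaller values indicate a better model."""
    return -2.0 * loglik_model + 2.0 * n_params


# Hypothetical data: group A dies early, group B dies late.
chi2 = logrank_chi2([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
```

Under this reading, a staging system is preferred when its stage variable yields higher χ2 values and a lower AIC than a competing system fitted to the same patients.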

Patient clinicopathological features, univariate and multivariate analyses
After screening of all the patients to be investigated, 7911 patients from the Chinese database were identified as being eligible (median age, 59 years; age range, 15-89 years) and were defined as the training cohort (Fig. 1). Among 31,988 cases from 18 SEER registries, 10,208 were eligible (median age, 67 years; age range, 14-100 years) and were defined as the external validation cohort. The median follow-up was 74 months (range, 1-182 months). The proportions of patients with ≤ 15 and > 15 retrieved lymph nodes (LNs) were 30.5% and 69.5%, respectively, in the training cohort and 53.2% and 46.8%, respectively, in the external validation cohort. Table 1 illustrates the association of the investigated clinicopathological features with the 5-year OS rates of GC patients. The median tumor size was 4.5 cm (range, 0.1-35.0 cm), and the median number of LNs retrieved was 21 (range, 1-118) in the training cohort. For the external validation cohort, the median tumor size was 4.1 cm (range, 0.1-30.0 cm), and the median number of LNs retrieved was 15 (range, 1-90). In the univariate analyses of both cohorts, age, tumor location, tumor size, Lauren type, pT stage, pN stage, and pTNM stage classified according to the 7th and 8th editions of the AJCC/UICC staging system were significantly associated with the 5-year OS rates (all P < 0.001).
In multivariate analyses, age, tumor size, tumor location, Lauren type, and pTNM stage classified according to the 7th and 8th editions of the AJCC/UICC staging system were identified as independent prognostic factors (all P < 0.001; Table 2). Figure 2 illustrates the stage migration between the 7th and 8th AJCC/UICC staging systems for both cohorts. The migration was mainly observed in stage III patients. In the training cohort, 197 (2.5%) and 1841 (23.2%) patients were upstaged and downstaged, respectively, when reclassified from the 7th to the 8th edition of the AJCC/UICC staging system. Similarly, in the external validation cohort, 260 patients (2.5%) were upstaged and 1320 patients (12.9%) were downstaged.
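Stage migration of the kind reported above can be tallied by comparing each patient's stage under the two editions. The sketch below is illustrative only: `STAGE_ORDER`, `migration_counts`, and the four example patients are assumptions, and stage IV is omitted because the analyzed patients had no distant metastasis.

```python
# Ordered from earliest to latest stage; stage IV is omitted on purpose.
STAGE_ORDER = ["IA", "IB", "IIA", "IIB", "IIIA", "IIIB", "IIIC"]


def migration_counts(stages_7th, stages_8th):
    """Count patients upstaged, downstaged, or unchanged from 7th to 8th edition."""
    rank = {stage: i for i, stage in enumerate(STAGE_ORDER)}
    counts = {"upstaged": 0, "downstaged": 0, "unchanged": 0}
    for old, new in zip(stages_7th, stages_8th):
        if rank[new] > rank[old]:
            counts["upstaged"] += 1
        elif rank[new] < rank[old]:
            counts["downstaged"] += 1
        else:
            counts["unchanged"] += 1
    return counts


# Hypothetical patients, one entry per patient in each list.
result = migration_counts(
    ["IIIC", "IIIB", "IIA", "IIIA"],
    ["IIIB", "IIIC", "IIA", "IIIA"],
)
```

Dividing each count by the cohort size gives percentages of the kind reported for the training and external validation cohorts.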

Discriminatory ability and monotonicity of the 7th and 8th AJCC/UICC staging systems
The OS curves of patients grouped according to the two editions of the AJCC/UICC staging system are displayed in Fig. 3a, b, respectively. For the training cohort, the OS curves differed significantly between every pair of stage groups classified according to either the 7th (all P < 0.001; Fig. 3a) or the 8th AJCC/UICC staging system (all P < 0.001; Fig. 3b). Similar results were observed in the external validation cohort (all P < 0.001; Fig. 3d, e). The observed survival differences among the groups indicated satisfactory discriminatory ability and monotonicity of both staging editions.

Homogeneity of the 7th and 8th AJCC/UICC staging systems
In the training cohort, the 7th AJCC/UICC staging system demonstrated poor homogeneity in stage IIIA-C because the survival rates of subgroups of patients within stage IIIA, IIIB, or IIIC were significantly different (all P < 0.001; Fig. 4a-c). When classified according to the 8th AJCC/UICC staging system, subgroups of patients within stage IIIA or IIIB still showed significant differences in survival (both P < 0.001; Fig. 4d, e), but those within stage IIIC did not show such differences (P = 0.364; Fig. 4f).

Proposal of a modified pTNM staging system
To improve the homogeneity in stage III classification, a modified pTNM staging system was proposed according to the best log-rank χ2 values in the training cohort. In the modified pTNM staging system, which had the best observed homogeneity (Fig. 5, upper part), stage IIIA was composed of T3N2, T4aN1, and T4bN0; stage IIIB was composed of T2N3a, T3N3a, T4aN2, T4bN1, and T4bN2; and stage IIIC was composed of T2N3b, T3N3b, T4aN3a, T4aN3b, T4bN3a, and T4bN3b (Fig. 5, lower part). Stage I and II classifications remained unchanged. The modified pTNM staging system demonstrated optimal discriminatory ability and monotonicity in both the training and external validation cohorts, as supported by mild differences in survival (Fig. 3c, f). When classified according to the modified pTNM staging system, subgroups of patients within stage IIIA, IIIB, or IIIC showed no significant differences in survival in either the training cohort (all P > 0.05; Fig. 4g-i) or the external validation cohort (all P > 0.05; Fig. 4p-r).
Fig. 2 Stage migration between the 7th and 8th editions of the AJCC/UICC staging system as well as the modified pTNM staging system for both the training and external validation cohorts. a Classification illustrated in the training cohort; b classification illustrated in the external validation cohort; c patient grouping in the training cohort; d patient grouping in the external validation cohort. Stage migration was mainly observed for patients with stage III disease; therefore, for better contrast among patients at this specific stage, its sub-stages IIIA, IIIB, and IIIC were colored green, blue, and red, respectively. Abbreviations: AJCC/UICC, American Joint Committee on Cancer/Union for International Cancer Control; ed., edition of the tumor-node-metastasis (TNM) staging system.
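The modified stage III grouping listed above can be written as a small lookup table. The T and N combinations are taken from the text; the names `MODIFIED_STAGE_III` and `modified_stage_iii`, and the choice to return `None` for any combination the text does not list, are assumptions of this sketch.

```python
# (pT, pN) combinations for each modified stage III sub-stage, as listed in the text.
MODIFIED_STAGE_III = {
    "IIIA": {("T3", "N2"), ("T4a", "N1"), ("T4b", "N0")},
    "IIIB": {("T2", "N3a"), ("T3", "N3a"), ("T4a", "N2"),
             ("T4b", "N1"), ("T4b", "N2")},
    "IIIC": {("T2", "N3b"), ("T3", "N3b"), ("T4a", "N3a"),
             ("T4a", "N3b"), ("T4b", "N3a"), ("T4b", "N3b")},
}


def modified_stage_iii(pt, pn):
    """Return 'IIIA', 'IIIB', or 'IIIC', or None if the pair is not listed."""
    for stage, combos in MODIFIED_STAGE_III.items():
        if (pt, pn) in combos:
            return stage
    return None  # assumption: combinations not listed above are out of scope here


print(modified_stage_iii("T4a", "N2"))
```

Keeping the T and N categories unchanged means the modification is only a regrouping, so it can be applied to any dataset already coded with 8th-edition pT and pN categories.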

Prognosis prediction performances of the 7th and 8th AJCC/UICC staging systems against the modified pTNM staging system
The performance results of the competing staging systems are displayed in Table 3. Compared with the 7th and 8th AJCC/UICC staging systems, the modified pTNM staging system demonstrated the best homogeneity (the highest likelihood-ratio χ2 score), discriminatory ability, and gradient monotonicity (the highest log-rank χ2 and linear-trend χ2 scores), as well as the lowest AIC value, displaying an optimal prognostic stratification ability in both the training and external validation cohorts.

Discussion
In the present study, both the 7th and 8th AJCC/UICC staging systems demonstrated poor homogeneity in the training and external validation cohorts, particularly for stages IIIA, IIIB, and IIIC, an observation that was not mentioned by the International Gastric Cancer Association (IGCA). Thus, a modified pTNM staging system was proposed. For convenience in the clinical application of the proposed modified pTNM staging system, the classifications of the "T" and "N" categories were not altered; instead, based on our statistics, we regrouped the existing T and N combinations to make the sub-stages more homogeneous. The Kaplan-Meier OS curves demonstrated similarity among the subgroups of patients within stage IIIA, IIIB, or IIIC classified according to the modified pTNM staging system and revealed optimal homogeneity. Furthermore, compared with the 7th and 8th AJCC/UICC staging systems, the modified pTNM staging system also displayed the best homogeneity, discriminatory ability, and monotonicity of gradients in both the training and external validation cohorts.
The TNM staging system is the common "language of cancer" [10,11], enabling comparisons between different populations irrespective of country and ethnicity. With the improvement of surgical techniques, the number of retrieved LNs has increased dramatically, and the definition of the category N3ab as the presence of more than 6 metastatic LNs is too broad. In the 8th AJCC/UICC pTNM staging system for GC, the category N3ab is subdivided into N3a and N3b to improve the accuracy of staging and prognosis prediction. Our results have shown that, with this subdivision, the 8th AJCC/UICC pTNM staging system (comprising 25 subgroups of the T, N, and M categories) provided a more precise classification than the 7th edition (comprising 20 subgroups), emphasizing personalized treatment.
However, among the recently published studies that compared the prognosis prediction performance of the 8th and 7th editions, none focused on the homogeneity of the two editions [12-15].
In the present study, 197 (2.5%) patients were upstaged and 1841 (23.2%) were downstaged in the training cohort when reclassified from the 7th to the 8th edition of the AJCC/UICC pTNM staging system, whereas 260 (2.5%) were upstaged and 1320 (12.9%) were downstaged in the external validation cohort. We also observed that the majority of stage migration occurred for stage III patients (99%, data not shown) in both cohorts, whereas only 1% was observed for stage II patients (T1N3b and T2N3b). As such, the present study was mainly focused on patients with stage III disease.
Furthermore, our analyses revealed that the 8th edition had better discriminatory ability and monotonicity than did the 7th edition in both cohorts, which was consistent with the results reported by IGCA [16]. However, Kaplan-Meier analyses indicated significant differences in OS among the subgroups of patients within stage IIIA, IIIB, or IIIC classified according to either of the two staging editions. This poor homogeneity was significantly improved in our modified pTNM staging system.
Although our proposed modified pTNM staging system was shown to be superior to the 7th and 8th AJCC/UICC pTNM staging systems, certain limitations are worth mentioning. First, our training cohort was based on a Chinese population database. Whether this proposed modified pTNM staging system is suitable for populations from other countries has yet to be verified. Moreover, the treatment protocol for locally advanced GC of the same TNM category differs between Asian and Western cancer centers, which may explain the lower 5-year OS rate observed in the external validation cohort compared with the training cohort. Neoadjuvant therapies followed by radical resection (including D1 or D1+ lymphadenectomy) are conventionally chosen in the West, whereas in Asian cancer centers, radical surgery (D2 lymphadenectomy) followed by adjuvant therapy is primarily considered. Therefore, to extend the possible use of our proposed modified pTNM staging system, we used the SEER database for external validation.
Fig. 4 Homogeneity in stage classifications using the 7th and 8th AJCC/UICC staging systems and the modified pTNM staging system for both the training and external validation cohorts. a-c Stages IIIA, IIIB, and IIIC classified according to the 7th edition, respectively; d-f stages IIIA, IIIB, and IIIC classified according to the 8th edition, respectively; g-i stages IIIA, IIIB, and IIIC classified according to the modified pTNM staging system, respectively. j-l Stages IIIA, IIIB, and IIIC classified according to the 7th edition, respectively; m-o stages IIIA, IIIB, and IIIC classified according to the 8th edition, respectively; p-r stages IIIA, IIIB, and IIIC classified according to the modified pTNM staging system, respectively. The homogeneity of the proposed modified pTNM staging system is higher than those of the 7th and 8th AJCC/UICC staging systems, as supported by mild differences in survival curves.
Additionally, to the best of our knowledge, the training cohort, drawn from three high-capacity GC centers across North and South China, is the largest among all such studies. This further supports the reliability of the results of the present study. Moreover, despite the difference in OS between the training and external validation cohorts, which may have been caused by distinct demographic features, different lymphadenectomy types, and pathological variables, the proposed modified pTNM staging system can still be universally applied in the West because it was successfully validated in a large external validation cohort from the SEER database. Second, the sample sizes of some subgroups classified according to the 8th AJCC/UICC pTNM staging system were relatively small [for instance, T1N3b (0% in the training cohort and 0.05% in the external validation cohort) and T2N3b (1.0% in the training cohort and 0.2% in the external validation cohort)], possibly due to the low rate of LN metastasis in patients with T1 or T2 tumors, and this may have influenced the efficiency of comparison. Therefore, a study with a much larger sample size is required to further confirm the findings of the present study. Third, due to the retrospective nature of the present study, tumors involving the esophagogastric junction (EGJ) were not included in our analysis because the distances of their epicenters from the EGJ were not specifically mentioned in the retrieved Chinese and SEER databases.
Fig. 5 Proposal of a modified pTNM staging system according to the best log-rank χ2 values in the training cohort. In the modified pTNM staging system, stage IIIA was composed of T3N2, T4aN1, and T4bN0; stage IIIB was composed of T2N3a, T3N3a, T4aN2, T4bN1, and T4bN2; and stage IIIC was composed of T2N3b, T3N3b, T4aN3a, T4aN3b, T4bN3a, and T4bN3b. The M classification was not considered since all patients underwent R0 resection and had no distant metastasis.

Conclusions
Using large cohorts of patients from Chinese cancer centers and the SEER database, we found that both the 7th and 8th AJCC/UICC pTNM staging systems still show poor homogeneity, particularly for stage III GC patients, although the homogeneity, discriminatory ability, and monotonicity of gradients are improved in the 8th edition. A modified pTNM staging system for GC was therefore proposed and validated, demonstrating superior stratification and prognosis prediction ability and suggesting high potential for clinical application in different populations.

Authors' contributions
Conception and design: CF, ZW, HX, HL, and ZZ. Collection and assembly of data: WW, SRS, JD, and ZS. Data analysis and interpretation: WW. Manuscript writing: CF, WW, and SRS. All authors read and approved the final manuscript.