External validity of a prognostic nomogram for locoregionally advanced nasopharyngeal carcinoma based on the 8th edition of the AJCC/UICC staging system: a retrospective cohort study

Background The tumor–node–metastasis (TNM) staging system performs poorly as a guide to individualized induction or adjuvant chemotherapy for patients with locoregionally advanced nasopharyngeal carcinoma (NPC). We sought to externally validate Pan's nomogram, which was developed on the basis of the 8th edition of the American Joint Committee on Cancer (AJCC)/Union for International Cancer Control (UICC) staging system, in patients with locoregionally advanced disease. We also investigated the reliability of Pan's nomogram for selecting participants in future clinical trials.
Methods This study included 535 patients with locoregionally advanced NPC who were treated between March 2007 and January 2012. The 5-year overall survival (OS) rates were calculated using the Kaplan–Meier method and compared with the predicted outcomes. Calibration was tested using calibration plots and the Hosmer–Lemeshow test. Discrimination ability was assessed using the concordance index and compared with that of other predictors.
Results Pan's nomogram underestimated the 5-year OS of the entire cohort by 8.65% [95% confidence interval (CI) − 9.70 to − 7.60%, P < 0.001] and underestimated the 5-year OS of each risk group. The differences between the predicted and observed 5-year OS rates were smallest among low-risk patients (< 135 points calculated using Pan's nomogram; predicted minus observed OS, − 6.41%, 95% CI − 6.75 to − 6.07%, P < 0.001) and largest among high-risk patients (≥ 160 points; − 13.56%, 95% CI − 15.48 to − 11.63%, P < 0.001). The Hosmer–Lemeshow test indicated that the predicted and observed 5-year OS rates deviated from an ideal relationship (P < 0.001). Pan's nomogram had better discriminatory ability than Epstein–Barr virus deoxyribonucleic acid (EBV DNA) levels and the 7th or 8th edition of the AJCC/UICC staging system, but it was not better than the combination of EBV DNA and the 8th edition staging system. Additionally, Pan's nomogram was marginally inferior to our predictive model, which included the 8th AJCC/UICC N-classification, age, gross primary tumor volume, lactate dehydrogenase, and body mass index.
Conclusions Pan's nomogram underestimated the 5-year OS of patients with locoregionally advanced NPC at our cancer center, and may not be a precise tool for selecting participants for clinical trials.


Background
Nasopharyngeal carcinoma (NPC) arises from the squamous cells of the epithelial lining of the nasopharynx. Radiotherapy is the primary treatment modality because of NPC's confined anatomical location and high sensitivity to radiation. Because nasal and aural symptoms are non-specific, 70% of patients present with locoregionally advanced disease at initial diagnosis [1]. These patients have a high risk of distant metastasis and mortality [2,3] even if treated with concurrent chemoradiotherapy. Accordingly, induction chemotherapy is commonly administered before radiotherapy in clinical practice, although randomized controlled trials have not yet produced a consensus about its survival benefit [4][5][6][7][8]. In addition, no effective adjuvant chemotherapy regimen has been identified for these patients after radiotherapy [9][10][11][12][13]. The tumor, node and metastasis (TNM) staging system of the American Joint Committee on Cancer (AJCC)/Union for International Cancer Control (UICC) was the main tool used to identify patients in these clinical trials. However, the findings of these trials suggest that future clinical trials require a more effective stratification method to identify high-risk patients, instead of enrolling every patient with locoregionally advanced NPC.
Pan's nomogram may have greater potential than the 8th AJCC/UICC edition to identify patients for inclusion in clinical trials. However, since it was developed from a cohort of patients with stage I-IVa disease, its validity for specifically identifying patients with locoregionally advanced disease remains unknown. Additionally, external validation is important before the nomogram is applied to individualized randomized controlled trials of induction or adjuvant chemotherapy. As such, we first assessed the discriminatory accuracy and calibration of Pan's nomogram using a large external cohort of patients with stage III-IVb NPC who underwent intensity-modulated radiotherapy (IMRT) and concurrent chemotherapy alone. Second, we directly compared its performance with that of Epstein-Barr virus deoxyribonucleic acid (EBV DNA), a recent and promising biomarker for NPC [17], in an attempt to improve Pan's nomogram.

Patient selection
Patients treated between March 2007 and January 2012 were deemed eligible for this study if they met the following inclusion criteria: (1) newly diagnosed with World Health Organization type 2 or 3 NPC; (2) restaged to III-IVb (T1-2N2-3M0 and T3-4N0-3M0, based on the 8th edition of the AJCC/UICC staging system) according to pretreatment magnetic resonance imaging (MRI) of the nasopharynx and neck, chest radiography or computed tomography (CT), abdominal sonography or CT, and a whole-body bone scan or [18F]-fluorodeoxyglucose positron emission tomography combined with computed tomography (PET/CT); (3) aged between 20 and 75 years; (4) treated with IMRT plus concurrent chemotherapy alone; and (5) had pretreatment levels of EBV DNA and hemoglobin available. Patients were excluded if they had received anticancer therapy prior to diagnosis at our hospital, were pregnant or lactating, or were diagnosed with synchronous/metachronous cancer lesion(s) before or during the treatment or follow-up period.

Treatment
The cumulative radiation doses were administered in 30-33 fractions at ≥ 66 Gy to the primary tumor, ≥ 60 Gy to the involved neck area, and ≥ 50 Gy to potential sites of local infiltration and bilateral cervical lymphatics. Other IMRT details were as previously described [18]. Concurrent chemotherapy was administered with cisplatin/nedaplatin, 30-40 mg/m² weekly for up to seven cycles or 80-100 mg/m² every 3 weeks for two to three cycles.

Follow up
Patients were followed up at least once every 3 months during the first 3 years and every 6 months thereafter. A detailed history and physical examination were recorded at each follow-up visit. Nasopharyngoscopy with or without biopsy, MRI of the head and neck, chest radiography or CT, abdominal sonography or CT, a whole-body bone scan, or [18F]-fluorodeoxyglucose PET/CT were performed to detect locoregional relapse, distant metastasis, or both. Salvage treatment, including reirradiation, surgery, chemotherapy, or a combination of these, was delivered to patients with confirmed relapse, distant metastasis, or persistent disease.

Statistical analysis
The 5-year overall survival (OS) rate, with OS defined as the time from the date of treatment to death from any cause, was predicted using Pan's nomogram for the entire cohort and for each of the three risk groups (low-risk, < 135 points; intermediate-risk, 135 to < 160 points; high-risk, ≥ 160 points calculated according to Pan's nomogram) as suggested by Pan et al. [14]. The observed 5-year OS rate was calculated using the Kaplan-Meier method. We compared the observed and predicted 5-year OS rates using a one-sample t test, in which the predicted survival served as the fixed variable and the observed value served as the assessed variable.
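The risk grouping and the Kaplan-Meier estimate described above can be illustrated with a minimal sketch. The point thresholds (135 and 160) are those given in the text; the function names `risk_group` and `kaplan_meier` and the toy follow-up data are hypothetical and serve only to show the mechanics, not the software used in this study.

```python
from typing import List


def risk_group(points: float) -> str:
    """Map a nomogram total-points score to the three risk groups in the text."""
    if points < 135:
        return "low"
    if points < 160:
        return "intermediate"
    return "high"


def kaplan_meier(times: List[float], events: List[int], horizon: float) -> float:
    """Kaplan-Meier (product-limit) survival probability at `horizon`.

    times  : follow-up time per patient (e.g. months)
    events : 1 if death was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Handle all patients who share the same time point together.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        # The estimate only drops at times where a death occurred.
        if deaths:
            survival *= 1.0 - deaths / at_risk
        at_risk -= leaving
    return survival


# Toy data (hypothetical): five patients, follow-up in months.
toy_times = [10, 20, 30, 40, 70]
toy_events = [1, 0, 1, 0, 0]
five_year_os = kaplan_meier(toy_times, toy_events, horizon=60)
```

In this toy example the estimate at 60 months is the product of the conditional survival fractions at the two death times (4/5 and 2/3); censored patients leave the risk set without lowering the curve.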
Next, we assessed the calibration of the model by plotting the observed against the predicted 5-year OS outcomes and confirmed the findings using the Hosmer-Lemeshow calibration test [19], for which a significant test statistic indicates that the model is not perfectly calibrated. Furthermore, discriminatory accuracy was assessed using Harrell's concordance index (C-index) [20]; a higher C-index is generally accepted to indicate a greater ability of the model to discriminate outcomes.
We compared the discriminatory accuracy of Pan's nomogram with that of EBV DNA levels, the 7th and 8th editions of the AJCC/UICC staging system, and the best predictive model for our dataset. To develop our best predictive model, prognostic factors such as age [21], sex [22], body mass index (BMI) [23], hemoglobin [24], and LDH [25] were included in a backward multivariate Cox regression analysis. EBV DNA was categorized as previously described [26] because a nonlinear effect was detected using three-knot restricted cubic splines [27] nested within the Cox model.
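To make the discrimination measure concrete, the following is a minimal sketch of Harrell's C-index for right-censored data. It assumes that a higher score means a higher predicted risk (shorter survival), ignores pairs with tied survival times for simplicity, and uses a hypothetical function name `harrell_c_index`; it is not the implementation used in this study.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float], events: Sequence[int], risk: Sequence[float]
) -> float:
    """Harrell's C: share of usable patient pairs ranked correctly by `risk`.

    A pair (i, j) is usable when patient i died and did so before
    patient j's last recorded time. Ties in `risk` count as 0.5.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Toy data (hypothetical): higher risk score paired with earlier death.
c_perfect = harrell_c_index([5, 10, 15, 20], [1, 1, 0, 1], [4, 3, 2, 1])
c_reversed = harrell_c_index([5, 10, 15, 20], [1, 1, 0, 1], [1, 2, 3, 4])
```

A value of 0.5 corresponds to ranking no better than chance, and 1.0 to perfect ranking of the usable pairs.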
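For readers unfamiliar with restricted cubic splines, the sketch below builds the spline basis for a single covariate value using the standard truncated-power form, which is cubic between knots and constrained to be linear beyond the outer knots. With three knots it yields one linear term and one nonlinear term. The knot positions and the function name `rcs_basis` are hypothetical; the knots actually used for EBV DNA are not reported in the text.

```python
from typing import List, Sequence


def _cubed_pos(u: float) -> float:
    """Truncated cubic: u**3 when u is positive, otherwise 0."""
    return u ** 3 if u > 0 else 0.0


def rcs_basis(x: float, knots: Sequence[float]) -> List[float]:
    """Return [x, nonlinear terms...] of a restricted cubic spline.

    With k knots there are k - 2 nonlinear terms, so three knots give one.
    Terms are divided by (last knot - first knot)**2 to keep them on a
    scale comparable to x.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("at least three knots are required")
    scale = (t[-1] - t[0]) ** 2
    gap = t[k - 1] - t[k - 2]
    terms = [x]
    for j in range(k - 2):
        term = (
            _cubed_pos(x - t[j])
            - _cubed_pos(x - t[k - 2]) * (t[k - 1] - t[j]) / gap
            + _cubed_pos(x - t[k - 1]) * (t[k - 2] - t[j]) / gap
        )
        terms.append(term / scale)
    return terms


# Hypothetical knots on an arbitrary covariate scale.
example_knots = [1.0, 3.0, 6.0]
basis_at_4 = rcs_basis(4.0, example_knots)
```

In a Cox model, the nonlinear column would be entered alongside the linear one; a significant coefficient for the nonlinear term is one common way to detect a nonlinear effect, which is the reason given in the text for categorizing EBV DNA.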

Patients
In total, 535 patients were eligible for this study. Table 1 compares our cohort, which was restricted to patients with locoregionally advanced NPC who received IMRT plus concurrent chemotherapy alone, with the Fujian Provincial Cancer Hospital cohort. The two cohorts differed significantly in tumor stage and mode of chemotherapy. The patients in our cohort also had a lower mean LDH level (171.3 vs 193.4 U/L).
During a median follow-up of 60 months (range 3-108 months), 43 (8.0%), 75 (14.0%), and 74 (13.8%) patients experienced locoregional failure, distant failure, and death, respectively. Table 2 displays the predicted and observed 5-year OS rates. Pan's nomogram underestimated the 5-year OS of the entire cohort by 8.82% (95% CI − 9.88 to − 7.77%, P < 0.001) and also underestimated the survival of each risk group. The difference between the predicted and observed 5-year OS rates was smallest among low-risk patients (− 6.88%, 95% CI − 7.22 to − 6.53%; P < 0.001) and largest among high-risk patients (− 13.56%, 95% CI − 15.48 to − 11.63%; P < 0.001). Calibration plots of the predicted vs observed 5-year OS rates and survival curves stratified by risk group are illustrated in Fig. 1. The Hosmer-Lemeshow test showed that the predicted and observed OS rates deviated significantly from an ideal relationship (P < 0.001).
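The predicted-minus-observed differences and their confidence intervals can be illustrated with the short sketch below. It reflects one reading of the one-sample comparison, in which each patient's nomogram-predicted 5-year OS is compared against a single reference value (the Kaplan-Meier estimate); this assignment of roles is an assumption and may differ from the original analysis. The interval uses a normal approximation rather than the t distribution, which is close for cohorts of several hundred patients. The function name and toy values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev
from typing import Sequence, Tuple


def mean_difference_ci(
    predicted: Sequence[float], observed_rate: float, level: float = 0.95
) -> Tuple[float, float, float]:
    """Mean of (predicted - observed) with a normal-approximation CI.

    A negative mean indicates that the model underestimates survival.
    """
    diffs = [p - observed_rate for p in predicted]
    m = mean(diffs)
    se = stdev(diffs) / sqrt(len(diffs))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return m, m - z * se, m + z * se


# Toy data (hypothetical): three predicted probabilities vs an observed rate of 0.90.
diff, lower, upper = mean_difference_ci([0.70, 0.80, 0.90], observed_rate=0.90)
```

With this sign convention, an interval lying entirely below zero corresponds to the underestimation pattern reported above.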

Validation
The C-index of Pan's nomogram for predicting 5-year OS was 0.710 (95% CI 0.649-0.771). When comparing the discrimination ability of Pan's nomogram with that of other predictors, we observed that the C-index for EBV DNA (categorized) was 0.616 (95% CI 0.551-0.681), which was inferior to that of Pan's nomogram (P = 0.005). For the clinical stage determined using the 8th and 7th editions of the AJCC/UICC staging system, the C-index was 0.594 (95% CI 0.536-0.651) and 0.594 (95% CI 0.531-0.656), respectively, both much lower than that of Pan's nomogram (both P < 0.001). Further, the advantage in discrimination of Pan's nomogram decreased sharply when it was compared with the combination of EBV DNA (categorized) and the clinical stage determined according to the 8th edition of the AJCC/UICC staging system (C-index 0.664, 95% CI 0.605-0.724; P = 0.104). A multivariate Cox regression model with backward selection ultimately identified age, BMI, LDH, gross primary tumor volume (GTVp), and the 8th AJCC/UICC N-classification as independent prognostic factors (Table 3). Additionally, the best predictive model based on these factors achieved a marginally higher C-index (0.753, 95% CI 0.697-0.810, P = 0.097) than Pan's nomogram.

Discussion
Our findings demonstrated that Pan's nomogram [14] underestimated the 5-year OS of patients with locoregionally advanced NPC. The discriminatory accuracy of Pan's nomogram was superior to that of EBV DNA and of the 7th and 8th editions of the AJCC/UICC staging system. However, Pan's nomogram did not predict 5-year OS significantly better than the combination of EBV DNA and the 8th AJCC/UICC staging system. Its discrimination performance was marginally inferior to that of the best predictive model, which included age, BMI, LDH, GTVp, and the 8th AJCC/UICC N-classification.
The calibration of Pan's nomogram in our database differed from that in the training cohort of Fujian Provincial Cancer Hospital and the validation cohort of Pamela Youde Nethersole Eastern Hospital [14]. This can largely be explained by the following. First, tumor stage primarily indicates tumor burden and determines treatment outcomes [28]: patients with early-stage NPC usually receive radiotherapy alone, whereas concurrent chemotherapy is strongly recommended for locoregionally advanced disease, and in certain cases induction or adjuvant chemotherapy is also administered before or after radiotherapy. Since we included only patients with locoregionally advanced NPC, the [...] [29]. Second, the patients in our database received concurrent chemoradiotherapy alone, whereas the patients in the study by Pan et al. [14] received additional chemotherapy before or after radiotherapy. As in randomized controlled trials [4,7], differences in chemotherapy approaches can also lead to differences in OS, even for tumors of similar stage. Therefore, our finding of inaccurate prediction by Pan's nomogram was not unexpected, particularly considering the intrinsic differences in prognosis between our independent cohort and the original training and validation cohorts [14].
In contrast, the differences among other characteristics suggest that the prediction of Pan's nomogram was not precise enough. For example, the LDH levels of patients in our database were significantly lower than those of patients included in the study by Pan et al. [14] (Table 1), and the LDH level was strongly predictive of OS in Pan's nomogram. It is, therefore, possible that the difference in LDH levels lowered calibration accuracy. Furthermore, a significant interaction effect was observed between the GTVp and the clinical stage according to the 8th AJCC/UICC staging system. A similar interaction effect was likely to exist when both variables were included in Pan's nomogram during its development, which may be associated with the inferior calibration. Moreover, induction chemotherapy is commonly administered in clinical practice to patients with locoregionally advanced disease and large tumor volumes. Thus, our inclusion criteria, which restricted the cohort to patients with locoregionally advanced disease who received concurrent chemoradiotherapy alone, naturally selected patients with a relatively smaller GTVp compared with a previous report [30]. Notably, however, the average GTVp in our study was not larger than that of Pan et al. [14], which included patients with any tumor stage. So, selection bias may [...] [14] was much larger compared with the others, in which an enlarged retropharyngeal lymph node was delineated in the GTVp [31][32][33].
Pan's nomogram discriminated outcomes better than single predictors such as EBV DNA and the 7th and 8th editions of the AJCC/UICC staging system. This was expected because Pan's nomogram combined several prognostic factors with tumor stage. Unfortunately, Pan's nomogram did not achieve significant superiority over the combination of EBV DNA and the tumor stage based on the 8th edition of the AJCC/UICC staging system. Moreover, it was marginally inferior to the model that included independent prognostic factors, namely age, BMI, LDH, GTVp, and N-classification based on the 8th AJCC/UICC staging system.
Risk prediction tools [26][34][35][36][37] other than Pan's nomogram [14] are available. However, Pan's nomogram incorporates several important and well-known clinical predictors. In particular, it is the only one developed using a cohort of patients other than those from our cancer center. Nevertheless, the underestimation of OS in this external validation indicates that Pan's nomogram cannot accurately identify truly high-risk patients among all patients with locoregionally advanced NPC.
The limitations of this study are as follows. First, because of the retrospective design, treatment approaches, chemotherapy regimens, and radiation or chemotherapy doses were not uniform, which may, to a certain extent, have biased the findings. Second, the small sample size may have lowered the confidence of the validation derived from this study. Lastly, validation at a single institution does not provide strong evidence, and further large-cohort, multi-institutional analysis is still required.

Conclusions
Pan's nomogram was observed to significantly underestimate the 5-year OS of patients with locoregionally advanced NPC. It failed to precisely identify high-risk participants for inclusion in randomized controlled trials.