A novel prognostic model predicts overall survival in patients with nasopharyngeal carcinoma based on clinical features and blood biomarkers

Abstract This study aims to develop and validate a novel prognostic model to estimate overall survival (OS) in nasopharyngeal carcinoma (NPC) patients based on clinical features and blood biomarkers. We assessed the model's incremental value to the TNM staging system, clinical treatment, and Epstein-Barr virus (EBV) DNA copy number for individual OS estimation. We retrospectively analyzed 519 consecutive patients with NPC. A prognostic model was generated using LASSO regression in the training cohort. We then compared the predictive accuracy of the novel prognostic model with that of TNM staging, clinical treatment, and EBV DNA copy number using the concordance index (C-index), time-dependent ROC (tdROC), and decision curve analysis (DCA). Subsequently, we built a nomogram for OS incorporating the prognostic model, TNM staging, and clinical treatment. Finally, we stratified patients into high-risk and low-risk groups according to the model risk score and analyzed the survival time of these two groups using Kaplan–Meier survival plots. All results were validated in the independent validation cohort. Using LASSO regression, we established a prognostic model consisting of 13 variables related to patient prognosis. The C-index, tdROC, and DCA showed that the prognostic model had better predictive accuracy and discriminatory power in the training cohort than did TNM staging, clinical treatment, and EBV DNA copy number. The nomogram consisting of the prognostic model, TNM staging, clinical treatment, and EBV DNA copy number showed superior net benefit. Based on the model risk score, we split the patients into two subgroups: low-risk (risk score ≤ −1.423) and high-risk (risk score > −1.423). There were significant differences in OS between the two subgroups of patients. Similar results were observed in the validation cohort.
The proposed novel prognostic model based on clinical features and serological markers may represent a promising tool for estimating OS in NPC patients.


| INTRODUCTION
Nasopharyngeal carcinoma (NPC) is a common malignancy of the head and neck in Southern China and Southeast Asia. 1 Distant metastasis is a leading cause of treatment failure in patients with NPC; almost 70% of patients are initially diagnosed with locoregionally advanced disease. 2 Although new radiotherapeutic techniques, chemotherapy regimens, and surgical techniques have improved the survival of NPC patients, the 5-year survival rate remains unsatisfactory. 3 Currently, the tumor-node-metastasis (TNM) staging system is commonly used to determine the prognosis of cancer patients and to guide treatment strategy. However, NPC patients at the same TNM stage tend to receive similar treatment, and many still show a poor prognosis. 4 Therefore, TNM staging has some limitations in predicting the survival of patients with NPC and in guiding treatment. This is because the system is based entirely on the anatomical extent of the existing tumors, not on the intrinsic biological heterogeneity of tumors. 5 Consequently, many prognostic indicators, such as clinical characteristics, 6 blood biomarkers, 7 and radiomics, 8 have been investigated to improve prognosis prediction and treatment efficiency in NPC. However, most predictive models are integrated with the TNM staging system to improve the predictive accuracy for clinical outcomes, which makes them inapplicable to patients with uncertain TNM staging. In addition, some models are not widely used in clinical practice because they are time-consuming, expensive, carry a high risk of radiation exposure, or rely on examinations that are not routine in the majority of primary care hospitals.
Recently, blood biomarkers have increasingly been used to predict clinical outcomes in many cancers because they are cost-effective, easily accessible, and straightforward for detecting cancer. Thus, this study aimed to construct a novel prognostic model that predicts overall survival (OS) in NPC patients based on clinical features and routine laboratory blood biomarkers. We assessed the model's incremental value to the TNM staging system, clinical treatment, and Epstein-Barr virus (EBV) DNA copy number for individual OS estimation. Finally, we validated its effectiveness in patients from the same institution.

| Patient selection and data collection
Patients diagnosed with NPC between January 2009 and December 2011 who were treated for the first time at Sun Yat-sen University Cancer Center were retrospectively enrolled. The patients were randomly divided into a training cohort (2/3) and a validation cohort (1/3). This study was performed in accordance with the guidelines outlined in the Declaration of Helsinki and was approved by the Clinical Research Ethics Committee of the Sun Yat-sen University Cancer Center. All patients provided written informed consent at the first visit to our center. The inclusion criteria for the study were as follows: (i) pathological evidence of NPC, with the absence of any other; (ii) complete baseline clinical information, blood biomarker data, and follow-up data; (iii) collection of blood biomarker data 1 week before anti-tumor therapy.
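The random 2/3 to 1/3 division described above can be sketched as follows. This is a minimal illustration only: the text does not state how the randomization was performed, so the function name `split_cohort`, the fixed seed, and the use of integer patient identifiers are all assumptions made for the example.

```python
import random


def split_cohort(patient_ids, train_fraction=2 / 3, seed=0):
    """Randomly split patient IDs into a training and a validation cohort.

    `seed` is a hypothetical choice that only makes the sketch reproducible.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# 519 patients, identified here by the placeholder IDs 0..518
train_ids, validation_ids = split_cohort(range(519))
print(len(train_ids), len(validation_ids))  # 346 173
```

Every patient lands in exactly one cohort, and the cohort sizes follow directly from the requested fraction.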

| Patient follow-up
Patient survival was followed up through clinic attendance records, email, and phone calls. All patients were followed up after discharge until December 2015. The endpoint of this study was overall survival (OS), defined as the period from the first diagnosis to the last follow-up or death.

| Statistical analyses
Statistical analyses were performed using IBM SPSS Statistics software version 19.0 (IBM Corp.) and R version 3.6.0 (http://www.R-project.org). Continuous variables were transformed into categorical variables, and the cut-off values of all variables were determined using the R packages "survival" and "survminer". 16 The Pearson Chi-square test was used to test for differences in the distributions of clinical characteristics and blood biomarkers between the training cohort and the validation cohort. We used least absolute shrinkage and selection operator (LASSO) regression to select the most useful prognostic factors in the training cohort. Depending on the regularization weight λ, LASSO selects variables correlated with the outcome by shrinking to zero the coefficients of variables that are not correlated with OS in NPC patients. 17 The optimal value of the penalty parameter λ was determined through 10-fold cross-validation using the 1-standard-error-of-the-minimum criterion (the 1-SE criterion). 17,18 Based on the optimal λ value, we obtained a list of prognostic variables with associated coefficients. Then, a novel prognostic model was constructed by calculating the risk score for each patient based on each prognostic variable and its associated coefficient. To compare the predictive accuracy for individual survival between the prognostic model, TNM staging, clinical treatment, and EBV DNA copy number, we evaluated the concordance index (C-index), 19 time-dependent ROC (tdROC), 20 and decision curve analysis (DCA). 21 Nomograms for the prediction of OS were built (using the rms package in R) based on the prognostic model risk score, TNM staging, clinical treatment, and EBV DNA copy number. Calibration plots of the nomograms were used to assess the consistency between the predicted survival and the observed survival with bootstrapping (1000 bootstrap resamples). 22 Finally, the patients in the training and validation cohorts were split into low-risk and high-risk groups according to the optimal cut-off value of the prognostic model risk score. The Kaplan-Meier method and log-rank tests were used to assess differences in OS between the predicted high-risk and low-risk groups. Results with two-sided p values of <0.05 were considered statistically significant.
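Two steps of the model construction described above can be illustrated with a short sketch: choosing λ by the 1-SE criterion and turning the selected coefficients into a per-patient risk score. The sketch assumes that the cross-validated error curve is already available and that the risk score is the weighted sum of the categorized variables, which is the usual reading of "each prognostic variable and its associated coefficient". The λ grid, error values, variable names, and coefficients below are hypothetical and are not taken from the study.

```python
def lambda_one_se(lambdas, cv_mean, cv_se):
    """Return the largest lambda whose mean cross-validated error lies
    within one standard error of the minimum mean error (1-SE rule)."""
    best = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[best] + cv_se[best]
    eligible = [lam for lam, err in zip(lambdas, cv_mean) if err <= threshold]
    return max(eligible)


def risk_score(coefficients, patient):
    """Weighted sum of the selected variables (assumed form of the score)."""
    return sum(beta * patient[name] for name, beta in coefficients.items())


# Hypothetical cross-validation summary: candidate lambdas, mean error, SE
lambdas = [0.01, 0.02, 0.04, 0.08]
cv_mean = [10.3, 10.0, 10.1, 10.9]
cv_se = [0.3, 0.2, 0.2, 0.3]
chosen = lambda_one_se(lambdas, cv_mean, cv_se)  # 0.04 in this toy curve

# Hypothetical coefficients for three categorized (0/1) variables
coefficients = {"age_over_60": 0.5, "ldh_high": 0.25, "pni_high": -0.75}
patient = {"age_over_60": 1, "ldh_high": 0, "pni_high": 1}
score = risk_score(coefficients, patient)  # 0.5 - 0.75 = -0.25
```

The 1-SE rule deliberately prefers a larger penalty than the error-minimizing one, which yields a sparser model at a small cost in cross-validated fit.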
The optimal cut-off value for each continuous variable was as follows: age (60 years), smoking index (20. Table 1. There was no significant difference in the distribution of clinical characteristics and blood biomarkers between the training cohort and the validation cohort.

| Construction of the novel prognostic model
To identify the prognostic variables in the training cohort, we used a LASSO regression model. Figure 1A shows the trajectory of each prognostic variable. Moreover, we plotted the partial likelihood deviance versus log(λ) in Figure 1B, where λ is the tuning parameter. The value of λ, chosen by 10-fold cross-validation via the 1-SE criterion, was 0.03987. At this value of λ, we obtained 13 variables with nonzero coefficients. These prognostic variables included age, BMI, hemoglobin (HGB), platelet (PLT), lymphocyte-to-monocyte ratio (LMR), CRP, CRP-to-albumin ratio (CAR), globulin (GLOB), albumin-to-globulin ratio (AGR), LDH, cystatin C (Cys-C), advanced lung cancer inflammation index (ALI), and prognostic nutritional index (PNI). The coefficients of each prognostic variable are presented in Figure 1C. The C-index of the prognostic model was significantly higher than that of clinical treatment (p < 0.001) and that of EBV DNA copy number (p = 0.013). In the validation cohort, the C-index of the prognostic model was higher than that of TNM staging and clinical treatment, but slightly lower than that of EBV DNA copy number. Subsequently, we compared the area under the ROC curve (AUC) between the novel prognostic model, TNM staging, clinical treatment, and EBV DNA copy number using tdROC. In general, the AUC of our novel prognostic model was higher than the others, both in the training cohort (Figure 2A) and the validation cohort (Figure 2B). Finally, the DCA showed that the prognostic model had a better overall net benefit than TNM staging, clinical treatment, and EBV DNA copy number across a wide range of reasonable threshold probabilities in the training cohort (Figure 3A) and the validation cohort (Figure 3B). These results indicated that the novel prognostic model displayed better accuracy in predicting OS compared with TNM staging, clinical treatment, and EBV DNA copy number.
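For readers unfamiliar with the C-index used in these comparisons, the following sketch computes Harrell's concordance index for right-censored data. It is a simplified illustration rather than the implementation used in the study: pairs with tied follow-up times are ignored, and the follow-up times, event indicators, and risk scores in the example are invented.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    A pair is comparable when the patient with the shorter follow-up time
    had an observed event (event == 1). The pair is concordant when that
    patient also has the higher risk score; tied scores count as 0.5.
    Pairs with identical follow-up times are skipped in this sketch.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier failure
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Invented example: months of follow-up, death indicator, model risk score
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
scores = [2.0, 1.0, 1.5, 0.5]
c_index = harrell_c_index(times, events, scores)  # 4 of 5 pairs concordant
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk, which is why a higher C-index is read as better discrimination.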

| Building and validating a predictive nomogram
The prognostic model risk score, TNM staging, clinical treatment, and EBV DNA copy number were integrated into nomograms to predict the 1-, 3-, and 5-year OS in the training cohort (Figure 4). Each variable was assigned a point value based on its contribution to the model. The point values for all the predictor variables are summed to locate a position on the "total points" axis, and a vertical line is then drawn down from that position to read off the patient's probability of OS at 1, 3, and 5 years. Finally, a calibration plot was used to visualize the performance of the nomogram. The nomogram-predicted outcomes for 1-, 3-, and 5-year OS were plotted on the x-axis, and the actual observed outcomes on the y-axis. The 45° line represented the best prediction, and the solid dark red line represented the performance of the nomograms. The calibration curves showed that the 1-, 3-, and 5-year OS predicted by the nomograms were consistent with the actual observations (Figure 5), indicating that the nomograms performed well. The nomograms and calibration curves in the validation cohort are shown in Figure S1 and Figure S2, respectively.
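The point assignment described above can be sketched as a simple rescaling of each variable's contribution to the linear predictor, so that the most influential variable spans 0 to 100 points. This is an illustrative approximation of how nomogram axes are commonly constructed, not the rms implementation, and the coefficients and value ranges are hypothetical. Converting total points into a survival probability additionally requires the fitted baseline survival, which is omitted here.

```python
def nomogram_points(coefficients, ranges):
    """Build a function that converts a patient's values into points.

    coefficients: {variable: regression coefficient}
    ranges: {variable: (lowest value, highest value)}
    The variable with the largest |coefficient| * range receives 100 points
    at its most unfavorable value; all other variables are scaled to it.
    """
    max_effect = max(
        abs(beta) * (ranges[name][1] - ranges[name][0])
        for name, beta in coefficients.items()
    )

    def points(patient):
        result = {}
        for name, beta in coefficients.items():
            low, high = ranges[name]
            # Zero points at the end of the range with the lowest risk
            reference = low if beta >= 0 else high
            result[name] = 100.0 * beta * (patient[name] - reference) / max_effect
        return result

    return points


# Hypothetical coefficients and ranges for two nomogram variables
coefficients = {"risk_score": 1.0, "tnm_stage": 0.5}
ranges = {"risk_score": (-3.0, 1.0), "tnm_stage": (1, 4)}
to_points = nomogram_points(coefficients, ranges)

patient_points = to_points({"risk_score": -1.0, "tnm_stage": 3})
total_points = sum(patient_points.values())  # 50.0 + 25.0 = 75.0
```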

| Survival analyses of NPC patients according to prognostic model risk score
The optimal cut-off value of the prognostic model risk score for predicting survival was determined to be −1.423 using the R package "survminer" (Figure 6A). We classified patients into two subgroups based on the cut-off value: a low-risk group (risk score ≤ −1.423) and a high-risk group (risk score > −1.423). The distributions of the prognostic model risk score in the training and validation cohorts are shown in Figure 6B and Figure 6C, respectively. In the training cohort, the median OS in the high-risk group was 44.4 months (IQR: 24.7-66.1), and the probabilities of OS at 1, 3, and 5 years were 95.4%, 63.2%, and 33.3%, respectively. In the low-risk group, the median OS was 61.2 months (IQR: 44.6-67.8), and the probabilities of OS at 1, 3, and 5 years were 98.1%, 90.7%, and 53.3%, respectively. In the validation cohort, the low-risk group showed higher survival probabilities than the high-risk group at 1, 3, and 5 years (Table 3). Kaplan-Meier curves were compared to assess the differences in survival between the low-risk and high-risk groups. The low-risk group showed significantly longer OS than the high-risk group in both cohorts (p < 0.05; Figure 7).

FIGURE 1 Potential predictors' selection using LASSO regression model
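The stratification and survival estimation described in this section can be sketched as follows. The cut-off of −1.423 is the value reported in the text; the patient data are invented, and the Kaplan-Meier function is a minimal product-limit estimator written for illustration. The log-rank comparison between groups is not implemented here.

```python
CUTOFF = -1.423  # risk-score cut-off reported in the text


def risk_group(score, cutoff=CUTOFF):
    """Low risk when the score is at or below the cut-off, else high risk."""
    return "low" if score <= cutoff else "high"


def kaplan_meier(times, events):
    """Product-limit estimate: [(time, survival)] at each observed death."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]  # event == 1 is a death, 0 is censoring
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Invented patients: (risk score, follow-up in months, death indicator)
patients = [(-2.1, 60, 0), (-1.9, 48, 1), (-1.0, 12, 1), (-0.5, 30, 1), (-1.2, 55, 0)]
groups = {"low": ([], []), "high": ([], [])}
for score, months, died in patients:
    group_times, group_events = groups[risk_group(score)]
    group_times.append(months)
    group_events.append(died)

curves = {name: kaplan_meier(t, e) for name, (t, e) in groups.items()}
```

Each curve lists the estimated survival probability immediately after every observed death, which is the quantity plotted in a Kaplan-Meier figure for each risk group.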

| DISCUSSION
In this study, we established a novel prognostic model based on clinical features and blood biomarkers of NPC for individualized prediction of OS. The novel prognostic model showed better predictive accuracy and discrimination than the traditional AJCC TNM staging system, clinical treatment, and EBV DNA copy number. The model split NPC patients into high-risk and low-risk groups that differed significantly in OS. The prognostic model consisted of 13 prognostic variables: age, BMI, HGB, PLT, LMR, CRP, CAR, GLOB, AGR, LDH, Cys-C, ALI, and PNI. Except for ALI, all of these variables have been reported to be associated with survival in NPC patients, 23-29 which supports our results. The ALI was devised to assess the degree of systemic inflammation in patients with advanced non-small-cell lung cancer. 30 Subsequently, this index was found to be a prognostic factor for survival in some cancers. 31-33 The difference between the ALI and other inflammatory markers is that the former contains not only indices related to inflammation but also BMI, which has been reported to correlate with sarcopenic status. 32 Thus, this is the first study to indicate ALI as a prognostic marker in NPC patients.

FIGURE 4 The nomogram was used to estimate OS for NPC patients in the training cohort

Subsequently, we compared the predictive accuracy and discrimination of the novel prognostic model with those of TNM staging, clinical treatment, and EBV DNA copy number using the C-index, tdROC, and DCA. We found that the prognostic model had better predictive accuracy and discriminatory power than TNM staging, clinical treatment, and EBV DNA copy number in the training cohort. Similar results were observed in the validation cohort except for the EBV DNA copy number. The C-index of the prognostic model was slightly lower than that of the EBV DNA copy number, but the two were not significantly different. The most likely explanation is that this was a retrospective analysis, and there may have been some patient selection bias. In addition, the nomogram consisting of the prognostic model, TNM staging, clinical treatment, and EBV DNA copy number showed superior net benefit. Finally, according to the model's risk score, we split the patients into two subgroups, low-risk and high-risk, and there were significant differences in OS between the two subgroups. These results indicated that the novel prognostic model had good predictive accuracy and discrimination for estimating OS in NPC patients.
Although previous studies have established some models for predicting NPC survival, this study has several merits compared with them. First, the prognostic model includes only basic clinical and routine laboratory data and excludes markers that are not routinely available, such as EBV DNA 34 and circulating tumor cells (CTC). 35,36 The model is low-cost, non-invasive, and convenient, and it carries no risk of radiation exposure. It could therefore be widely and safely used in clinical practice, especially in primary hospitals. Second, the prognostic model was constructed using a new algorithm, LASSO regression analysis, as the statistical method for screening variables. The algorithm adjusts for overfitting of the model, thus avoiding extreme predictions, and can therefore improve predictive accuracy. This approach has been applied in a few studies. 37-40 Third, many previous models integrated TNM staging and/or clinical treatment to improve predictive accuracy for clinical outcomes, 26,41-46 which made them inapplicable to patients with uncertain TNM staging. Our model can be used for those patients because it does not include TNM staging. Fourth, although another group, Sun et al., 40 established two nomograms to predict the benefit of concurrent chemotherapy in stage II-IVa NPC patients, their research did not analyze other important biomarkers in the blood (except for EBV DNA). Additionally, for OS, the C-indices of their nomograms ranged only from 0.700 to 0.711.
In our study, we established a novel prognostic model based on clinical features and blood biomarkers (including inflammation-based scoring systems, liver function markers, and others). The C-index of our model was 0.786. Clinicians could benefit from combining our model with others.
This study also has several drawbacks. It was a retrospective analysis, so selection bias might have occurred, and some patients were inevitably censored or lost to follow-up. Heterogeneity in the treatment effect for metachronous metastatic cancer might have had confounding effects. The endpoint of this study was OS, and we did not assess the model's suitability for predicting disease-free survival (DFS), distant metastasis-free survival (DMFS), and locoregional relapse-free survival (LRFS) in NPC patients. 47 It may be better if the endpoint combined OS with DFS and DMFS. Furthermore, because other medical institutions may lack the facilities to measure some indicators (such as Cys-C, CRP, and LDH), the wide application of the model in other centers may be limited. This retrospective study was performed in EBV-related NPC patients; whether the model can be used for non-EBV-related NPC is unknown and needs to be confirmed in such patients. Finally, our study was a single-institution study with a relatively small sample size. Thus, large-scale, multicenter validation of the model will be needed in the future.

FIGURE 6 The optimal cut-off value of prognostic model risk score using R package "survival," and the distribution of the prognostic model risk score in the training cohort and validation cohort
TABLE 3 OS and OS rate in high-risk and low-risk groups according to the model risk score in the training and validation cohort
In conclusion, we have established a novel prognostic model based on clinical features and blood biomarkers, which showed better predictive accuracy than traditional TNM staging, clinical treatment, and EBV DNA copy number alone. The nomograms comprising the prognostic model, TNM staging, clinical treatment, and EBV DNA copy number can reinforce the capability of the prognostic model. Therefore, our convenient, low-cost, non-invasive, radiation-free, and straightforward prognostic model may be useful for clinicians in making decisions, counseling individual patients, and scheduling follow-ups for NPC patients.