Development and validation of a nomogram to predict survival after curative resection of nonmetastatic colorectal cancer

Abstract Background We aimed to develop a clinically applicable nomogram to predict overall survival (OS) for patients with curatively resected nonmetastatic colorectal cancer. Methods Records from a retrospective cohort of 846 patients with complete information were used to construct the nomogram. The nomogram was validated in a prospective cohort of 379 patients. The performance of the nomogram was evaluated with the concordance index (c-index), time-dependent receiver operating characteristic (ROC) curves, calibration plots, and decision curve analyses for discrimination, accuracy, calibration ability, and clinical net benefit, respectively, and was further compared with the AJCC 8th edition TNM staging and the MSKCC nomogram. Risk stratification based on nomogram scores was performed with recursive partitioning analysis. Results The nomogram incorporated age, Glasgow prognostic score, pretreatment carcinoembryonic antigen level, T staging, N staging, number of harvested lymph nodes, and histological grade. Compared with the 8th AJCC staging and the MSKCC model, the nomogram had a statistically higher c-index (0.77, 95% CI: 0.73-0.80), larger areas under the time-dependent ROC curves (AUC at 3 years: 79; at 5 years: 79), and improved clinical net benefit. Calibration plots revealed no deviations from reference lines. All results were reproducible in the validation cohort. Nomogram-based risk stratification successfully discriminated patients within each AJCC stage (all log-rank P < .05). Conclusion We established an accurate, reliable, and easy-to-use nomogram to predict OS after curative resection for nonmetastatic colorectal cancer (CRC). The nomogram outperformed the 8th AJCC staging and the MSKCC model and could aid in personalized treatment and follow-up strategies for CRC patients.


| BACKGROUND
Colorectal cancer (CRC) is the third most commonly diagnosed and the second most deadly cancer in men and women worldwide. 1 Surgical resection is the mainstay of treatment for patients with nonmetastatic CRC, and adjuvant treatment is recommended in high-risk patients. The current gold standard for postoperative risk assessment is the TNM tumor staging system endorsed by the American Joint Committee on Cancer (AJCC). Although the system is easy to implement with ordinal groups, survival outcomes within the same AJCC stage are quite heterogeneous because of variability in clinicopathological features and tumor biology. 2,3 The survival paradox between stage IIB/C and stage IIIA patients is well recognized. 4 Moreover, in the era of precision medicine, categorical TNM staging fails to provide individualized predictions. Risk calculators such as nomograms have gained popularity over classifiers such as TNM staging. The AJCC recognized the need for a prognostic tool that makes more personalized probabilistic predictions than those conveyed by an ordinal staging system and issued guidelines for developing nomograms that incorporate additional anatomical and nonanatomical prognostic factors beyond TNM. 5 Nomograms are statistical tools that provide the overall probability of a specific outcome by combining all proven prognostic variables. They use computational integration of multiple prognostic factors to quantify risk for each individual, rather than produce risk groups. 6 Nomograms have been developed for a variety of malignancies and outcome predictions. There have been a few attempts for surgically resected CRC, but their overall quality is unsatisfactory. 7 Most of these nomograms were developed from a population-based database, the Surveillance, Epidemiology, and End Results (SEER) database. [8][9][10] Although the sample size is quite large, a few well-recognized prognostic factors are not incorporated. Clearly, more data elements are required than are found in tumor registries.
Compared with tumor-related factors, patient factors have drawn less attention. However, patient factors such as age, systemic inflammatory status, and nutritional status contribute equally to patients' prognosis. 11 Data have shown that the best-performing model includes both tumor- and patient-related factors. 10 For years, systemic inflammatory status has been recognized as an important prognostic factor in cancer patients. A recent meta-analysis revealed that the pretreatment Glasgow prognostic score (GPS) or modified Glasgow prognostic score (mGPS) is an independent prognostic predictor in CRC and could be useful in the management of CRC. 12 To date, none of the published nomograms has incorporated GPS.
The goal of this project is to develop and assess a prognostic nomogram for curatively resected CRC by incorporating clinically available tumor- and patient-related factors. We hope such a tool could help physicians convey individualized survival information to every patient in daily practice without incurring additional cost.

| Patients
The data of patients with surgically treated nonmetastatic CRC were retrieved from a prospectively maintained cancer registry database of the Affiliated Hospital of Jiangnan University, as previously described. 13,14 The last date of follow-up was 28 December 2018.
Patients who received curative CRC surgery between 2008 and 2013 formed the primary cohort, and those treated between 2014 and 2015 formed the validation cohort. The inclusion criteria for the primary cohort were as follows: (a) curative resection of a primary colorectal malignancy; (b) histologically confirmed colorectal adenocarcinoma; and (c) a full blood cell count, biochemical profile, and tumor biomarker test performed at the hospital within 2 weeks before surgery. Patients with any of the following conditions were excluded: (a) metastatic CRC confirmed by imaging or histology, diagnosed either preoperatively or intraoperatively; (b) metastatic disease within 1 month after surgery; (c) neoadjuvant chemotherapy, radiotherapy, or targeted therapy; (d) bowel obstruction or perforation with emergency presentation; (e) concurrent acute diseases such as pneumonia, urinary tract infection, or cholecystitis; (f) a history of chronic inflammatory diseases such as inflammatory bowel disease or rheumatoid arthritis; (g) a previous history of malignancy, including CRC at a different site; (h) death within 1 month after surgery; (i) survival status that could not be ascertained; and (j) fewer than 12 sampled lymph nodes.
The information for potential prognostic variables was collected: demographic characteristics including age and sex; pathological characteristics including primary site of tumor, histology, depth of primary tumor invasion (T), number of total lymph nodes sampled (TLN), number of metastasized lymph nodes (LNM), histological grade (G1-4), the presence of peri-neural invasion (PNI) and lymph-vascular invasion (LVI) and number of tumor deposits (TDs); blood biomarkers including carcinoembryonic antigen (CEA), white blood cell count, neutrophil count, lymphocyte count, albumin, and C-reactive protein (CRP). GPS was derived as previously stated.
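As an illustration of how GPS can be derived from the CRP and albumin values listed above, the following minimal sketch assumes the commonly cited cutoffs (CRP > 10 mg/L and albumin < 35 g/L, one point each). These cutoffs and the function name are assumptions made for illustration; the study itself derived GPS as described in its earlier publications.

```python
def glasgow_prognostic_score(crp_mg_per_l: float, albumin_g_per_l: float) -> int:
    """Return a GPS of 0, 1, or 2 from serum CRP and albumin.

    Assumed cutoffs (commonly cited, not stated in this paper):
    CRP > 10 mg/L counts as elevated, albumin < 35 g/L counts as low.
    """
    elevated_crp = crp_mg_per_l > 10.0
    low_albumin = albumin_g_per_l < 35.0
    # One point for each abnormality; both abnormal gives a score of 2.
    return int(elevated_crp) + int(low_albumin)
```

Under these assumed cutoffs, a patient with CRP 15 mg/L and albumin 30 g/L would score 2, whereas a patient with both values in the normal range would score 0.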
Informed consent to the use of sociodemographic and clinical information in scientific endeavors was obtained from every participant. This study was approved by the ethics review board of the hospital, adhered to the Declaration of Helsinki for medical research involving human subjects, and was conducted according to the TRIPOD statement. 15

| Survival analysis and nomogram development
The endpoint for this study was overall survival (OS), which was defined as the time from the date of surgery to the date of death from any cause or the last date of follow-up. Patients were censored if they were diagnosed with second malignancies after surgery. Only cases with complete information were used in the final analysis.
Normally distributed continuous variables were described as means with standard deviations (SD); otherwise, medians with interquartile ranges (IQR) were reported. Cox proportional hazards regression was used to assess the relationship between OS and predictive variables. For continuous variables, possible nonlinear effects on the log relative hazard were tested by modeling with restricted cubic splines; where statistically significant nonlinearity was identified, restricted cubic splines were used in the multivariable modeling. If restricted cubic spline modeling failed, continuous variables were dichotomized, with optimal cut-points determined by maximally selected rank statistics to maximize the correlation with survival. The proportional hazards assumption for each variable was checked with the test proposed by Grambsch and Therneau. Multivariable models were built by entering all variables with P < .2 in univariate models into a backward stepwise selection that minimized the Akaike information criterion (AIC). A nomogram based on the final model was constructed to estimate the probability of OS at 3 and 5 years after surgery.
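The backward stepwise selection by AIC described above can be sketched as follows. This is a minimal illustration, not the authors' code: `fit_aic` is a hypothetical callback that would, in practice, fit a Cox model on the given variables and return its AIC (computed from the partial log-likelihood), and the toy gains below are made-up numbers used only to make the example runnable.

```python
from typing import Callable, FrozenSet, Iterable


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2*logL (lower is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def backward_select(candidates: Iterable[str],
                    fit_aic: Callable[[FrozenSet[str]], float]) -> FrozenSet[str]:
    """Drop one variable at a time while doing so lowers the AIC."""
    current = frozenset(candidates)
    current_aic = fit_aic(current)
    while current:
        # Score every model that removes exactly one variable.
        trials = [(fit_aic(current - {v}), v) for v in sorted(current)]
        best_aic, dropped = min(trials)
        if best_aic >= current_aic:
            break  # no single removal improves the AIC
        current, current_aic = current - {dropped}, best_aic
    return current


# Toy stand-in for model fitting: hypothetical log-likelihood gain per variable.
_GAIN = {"age": 10.0, "cea": 5.0, "sex": 0.3}


def toy_fit_aic(variables: FrozenSet[str]) -> float:
    log_lik = -100.0 + sum(_GAIN[v] for v in variables)
    return aic(log_lik, len(variables))


selected = backward_select(_GAIN, toy_fit_aic)
print(sorted(selected))
```

In this toy setting the weakly informative variable is removed because the 2-point penalty per parameter outweighs its small gain in log-likelihood, which is the trade-off the AIC is designed to capture.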

| Nomogram performance evaluation
Internal validation of the nomogram was performed with a bootstrap resampling strategy (1000 resamples). External validation was conducted in the prospective validation cohort. Briefly, each patient in the validation cohort was assigned a risk score calculated with the nomogram equation.
The performance of the nomogram was assessed and compared with the MSKCC model 10 and the 8th AJCC TNM staging. The discrimination ability of the nomogram was evaluated with the concordance index (c-index) and the AIC value. A c-index of 0.5 indicates random chance, and 1.0 indicates a perfect ability of the model to discriminate the outcome. The smaller the AIC value, the better the goodness of fit of a model. The calibration ability of the nomogram was evaluated with calibration curves for 3- and 5-year OS, comparing the predicted survival with the observed survival. The predictive accuracy of the nomogram was quantified and compared using the area under the time-dependent ROC curves (AUC). A recursive partitioning analysis (RPA) was conducted to categorize risk groups based on nomogram-derived scores. Kaplan-Meier survival curves and log-rank tests were used to assess the risk stratification ability of the nomogram within AJCC stages. Finally, a decision curve analysis (DCA) was conducted to determine the clinical usefulness of the nomogram by quantifying the net benefits at different threshold probabilities. 16 All statistical tests were two sided, with P < .05 considered significant, and were conducted with Stata 14 or RStudio (version 1.1.456).
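To make the c-index concrete, the sketch below computes Harrell's concordance index for right-censored data in plain Python. It is an illustrative implementation under simplifying assumptions (pairs with tied survival times are skipped, and tied risk scores count as half concordant); the function and argument names are not taken from the study.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float],
                    events: Sequence[int],
                    risk_scores: Sequence[float]) -> float:
    """Fraction of comparable patient pairs ranked correctly by risk score.

    events[i] is 1 if death was observed for patient i and 0 if censored.
    A higher risk score is expected to go with a shorter survival time.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable only if patient i had an observed death
            # strictly before patient j's last observed time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A model whose risk scores order every comparable pair correctly yields 1.0, one that assigns the same score to everyone yields 0.5, and one that orders every pair incorrectly yields 0.0.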

| Patient characteristics
A total of 1576 patients were initially screened for eligibility. After application of the exclusion criteria, 836 patients with complete information were included in the final analysis for the primary cohort. For the validation cohort, 379 of 505 patients were included in the final analysis after application of the same inclusion and exclusion criteria. The major reason for exclusion was fewer than 12 lymph nodes sampled, followed by metastatic disease and preoperative treatment. The percentage of patients excluded for missing data was less than 10%.
The baseline characteristics of the primary cohort and validation cohort are listed in Table 1. The characteristics were well balanced between the two cohorts.

| Survival analysis and development of the nomogram in the primary cohort
Among continuous variables, only age and LNM had linear effects (nonlinear P = .016 and <.001, respectively). Restricted cubic spline modeling was applied to all other continuous variables with nonlinear effects, except for CEA. A CEA level above 20.03 ng/mL was regarded as high; otherwise, it was regarded as low. The results of the univariate Cox regression survival analysis for the primary cohort are presented in Table 2.
The final multivariable Cox regression model with the minimal AIC value incorporated seven variables: age, GPS, CEA, T staging, N staging, TLN, and histological grade (Table 2). A nomogram estimating 3- and 5-year OS after curative surgery was developed from these seven variables (Figure 1). A risk score was assigned to each case according to the nomogram equation.
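For readers unfamiliar with how a Cox-based nomogram yields a survival probability, the following sketch shows the standard relationship S(t | x) = S0(t)^exp(lp), where lp is the linear predictor (the sum of coefficient times covariate value) and S0(t) is the baseline survival at time t. The coefficients, covariate coding, and baseline value below are hypothetical placeholders; the fitted values of this study are reported in Table 2 and Figure 1 and are not reproduced here.

```python
import math
from typing import Mapping


def linear_predictor(coefs: Mapping[str, float],
                     covariates: Mapping[str, float]) -> float:
    """Sum of coefficient * covariate value over the model's variables."""
    return sum(beta * covariates[name] for name, beta in coefs.items())


def predicted_survival(baseline_survival: float, lp: float) -> float:
    """Cox-model survival probability at a fixed time: S0(t) ** exp(lp)."""
    if not 0.0 <= baseline_survival <= 1.0:
        raise ValueError("baseline survival must lie in [0, 1]")
    return baseline_survival ** math.exp(lp)


# Hypothetical log hazard ratios and covariate coding, for illustration only.
example_coefs = {"high_cea": 0.5, "gps": 0.4}
example_patient = {"high_cea": 1, "gps": 2}

lp = linear_predictor(example_coefs, example_patient)  # 0.5*1 + 0.4*2
os_5y = predicted_survival(0.85, lp)  # 0.85 is a made-up baseline 5-year OS
print(round(lp, 2), round(os_5y, 3))
```

A positive linear predictor raises the hazard and therefore lowers the predicted survival below the baseline; a linear predictor of zero returns the baseline survival unchanged.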

| Internal and external validation of the nomogram
For the internal validation, the bootstrap-corrected c-index for the nomogram was 0.77 (95% CI: 0.74-0.80). For the external validation, the score for each individual case in the validation cohort was calculated according to the established nomogram and was then used in the Cox regression model. The c-index for the validation cohort was 0.79 (95% CI: 0.73-0.85).

| Calibration curves analysis of the nomogram
The calibration plots showed good agreement between observed and nomogram-predicted 3-year (Figure 2A) and 5-year (Figure 2B) OS in the primary cohort. The nomogram also demonstrated appreciable reliability in predicting 3-year OS in the validation cohort (Figure 2C).
[...] 0.61-0.75) in the validation cohort. The nomogram had the lowest AIC value among the three models (2180, 2296, and 2321 for the nomogram, MSKCC model, and AJCC staging, respectively, in the primary cohort and 534, 566, and 573 in the validation cohort). In the primary cohort, the AUCs at 3 years were 78.4, 72.9, and 68.3 for the nomogram, MSKCC model, and AJCC staging, respectively (P < .05, all pairwise comparisons) (Figure 2D), and the AUCs at 5 years were 78.4, 72.7, and 68.6 (P < .05, all pairwise comparisons) (Figure 2E). In the validation cohort, the AUCs at 3 years were 82.3, 76.2, and 67.5 for the nomogram, MSKCC model, and AJCC staging, respectively (P < .05, all pairwise comparisons) (Figure 2F).
Decision curve plots showed that the nomogram was associated with improved clinical net benefits over the MSKCC model and AJCC stages (higher lines of prediction by the nomogram) within a practical range of threshold probabilities in both the primary cohort (Figure 3A) and the validation cohort (Figure 3B). Thus, among the three models, the nomogram had the best clinical utility in assessing individual prognosis.
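The net benefit plotted in a decision curve can be illustrated with the usual formula, net benefit = TP/n - (FP/n) x pt/(1 - pt), where pt is the threshold probability. The sketch below is a simplified version that assumes a binary outcome known for every patient at the chosen horizon (that is, it ignores censoring, which a survival-data analysis would need to handle); the function names are illustrative only.

```python
from typing import Sequence


def net_benefit(predicted_risk: Sequence[float],
                died: Sequence[int],
                threshold: float) -> float:
    """Net benefit of classifying patients with risk >= threshold as high risk."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(died)
    true_pos = sum(1 for p, d in zip(predicted_risk, died) if p >= threshold and d)
    false_pos = sum(1 for p, d in zip(predicted_risk, died) if p >= threshold and not d)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


def net_benefit_assume_all_die(died: Sequence[int], threshold: float) -> float:
    """Reference strategy that labels every patient as high risk."""
    prevalence = sum(died) / len(died)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)
```

The reference strategy that assumes no patient will die has a net benefit of zero at every threshold, so a model is clinically useful at a given threshold only if its net benefit exceeds both zero and the value from `net_benefit_assume_all_die`.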

| Risk stratification based on nomogram scores
Patients were stratified into three risk groups based on nomogram-derived risk scores with a recursive partitioning analysis. The subgroups were as follows: low-risk group (risk score ≤12.68), intermediate-risk group (12.68 < risk score ≤ 14.07), and high-risk group (risk score >14.07). Kaplan-Meier survival curve analysis showed that the three groups had statistically different prognoses in both the primary cohort (Figure 4A) and the validation cohort (Figure 4B). Remarkably, this risk stratification could successfully discriminate patients with different prognoses within each AJCC stage (Figure 4C). There were only two patients in stage IIC, so subgroup analysis could not be performed. In addition, only two patients were categorized as high risk in stage I, and three patients were categorized as low risk in stage IIIC. These patients were excluded from subgroup analysis because of the extremely small group sizes.
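The stratification above amounts to applying the two reported cutoffs to each patient's risk score and then estimating a Kaplan-Meier curve per group. The sketch below uses the cutoffs given in the text (12.68 and 14.07); the function names and the compact product-limit estimator are illustrative and are not taken from the study's code.

```python
from typing import List, Sequence, Tuple

LOW_CUTOFF = 12.68   # upper bound of the low-risk group, as reported above
HIGH_CUTOFF = 14.07  # upper bound of the intermediate-risk group


def risk_group(score: float) -> str:
    """Map a nomogram-derived risk score to its risk group."""
    if score <= LOW_CUTOFF:
        return "low"
    if score <= HIGH_CUTOFF:
        return "intermediate"
    return "high"


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate: (time, survival) at each observed death time."""
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve
```

Grouping patients with `risk_group` and calling `kaplan_meier` on each group's follow-up times gives the step functions that are compared with a log-rank test in this kind of analysis.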

| DISCUSSION
In the present study, a nomogram was developed to estimate 3-year and 5-year survival probability for curatively resected nonmetastatic CRC patients. This nomogram outperformed the MSKCC model and AJCC TNM staging in terms of discrimination, calibration ability, and clinical utility. The nomogram was validated in a prospective cohort and was shown to be reliable.
The selection of factors in this study was based on their availability in routine practice and established associations with overall survival in previous publications. 12,17,18 Not surprisingly, the nomogram incorporates widely acknowledged independent prognostic factors such as T staging, N staging, histological grade, and pretreatment CEA level. The endpoint for the nomogram is overall survival, which takes into account all causes of mortality. Age is well associated with all-cause mortality and thus also contributes to the nomogram. Notably, the number of lymph nodes sampled is a major contributor to the nomogram. As early as 2003, a secondary analysis of the Intergroup trial INT-0089 demonstrated that an increase in the number of lymph nodes examined was associated with increased survival, regardless of nodal status. 19 The more lymph nodes examined, the lower the likelihood of false negativity in nodal staging. 17 The number of examined lymph nodes could relate to the surgical and pathological quality of treatment. Wild-type KRAS/BRAF and microsatellite instability (MSI) have been associated with both increased lymph node yields and improved prognosis. 20,21 It is known that the tumor microenvironment and the host's immune response are important in tumor progression, and a higher lymph node yield could reflect a stronger antitumor immune response. All in all, the number of lymph nodes retrieved might reflect the underlying biology: a higher TLN may indicate more favorable tumor biology. Another important contributor to the nomogram is GPS. GPS is calculated from serum CRP and albumin levels. Increased serum CRP levels indicate systemic inflammation, and low serum albumin levels indicate malnutrition and cachexia. Both have been associated with poor prognosis in various cancers. The GPS enables better appreciation of systemic inflammation and malnutrition and reflects tumor-host interaction.
A recent meta-analysis of 41 studies with 9839 CRC patients showed that GPS was a strong independent poor prognostic factor regardless of tumor stage. 12 To date, there are four published nomograms predicting survival after radical surgery for nonmetastatic CRC. Two were developed from the SEER database. 10 Our nomogram outperformed one of the two models, the MSKCC model. 10 The other model aimed to reclassify TNM staging and investigated only T staging and N staging. 3 The third model was based on individual patient data from three phase 3 trials in Japan. 22 Pretreatment CEA, TLN, and GPS were not investigated. The fourth model was the only one incorporating patients' systemic inflammatory status, but it chose markers that were not as well established as GPS or mGPS. 23 The strengths of our nomogram include a sizable cohort of representative patients from a real-world clinical setting, a prospective validation cohort, and factors readily available in routine practice. The variables used in the nomogram could be easily obtained by physicians in many community hospitals without any technical or cost barriers. Risk group stratification defined by the nomogram was a good complement to the 8th AJCC stage. The nomogram gives accurate and individualized mortality risk predictions and can discriminate different prognosis groups within the same TNM stage. It should enable improved patient counseling regarding treatment selection and follow-up strategy. The nomogram itself is not intended to make treatment decisions, but nomogram-directed treatment strategies could be investigated in clinical trials.
FIGURE 3 Decision curve analysis to assess the clinical usefulness of the nomogram, MSKCC model, and AJCC staging in the primary cohort (A) and validation cohort (B). Dark gray line: assume no patients will die. Light gray line: assume all patients will die
Several limitations should be addressed. First and foremost, this nomogram was developed from a cohort of patients treated at a single institution that included only Chinese patients. Although internal validation and prospective external validation were performed to guard against overfitting to the current data, it would be better to validate the nomogram in patients from other institutions with diverse ethnicities. Second, important molecular factors such as KRAS/NRAS/BRAF and MSI were not investigated. These factors are good predictors of treatment efficacy, but their prognostic roles are controversial. In a recent meta-analysis, they were found to be not significantly or differentially associated with survival. 24 Third, pretreatment CEA level was dichotomized after restricted cubic spline modeling failed. Although the cutoff value was verified to be reproducible in the validation cohort, the best way to categorize CEA is still not conclusive. Nonstandardized determination of CEA levels worldwide makes this more complex. Fourth, because the number of lymph nodes sampled relates to the quality of care that CRC patients received and a low lymph node yield might lead to understaging, we excluded patients with fewer than 12 sampled lymph nodes from the nomogram development to ensure reliability. Thus, the nomogram may not be transferable to patients with fewer than 12 lymph nodes sampled. Fifth, the nomogram incorporated seven factors, which could possibly lead to model overfitting as a result of too many parameters. The gold standard for detecting overfitting of a model is to retest it in a new set of data. The nomogram performed quite well in the independent validation cohort in terms of predictive discrimination, accuracy, and calibration ability, which suggests that overfitting was not a major issue. The selection of variables in the final model was based on the minimized AIC value. Every factor had incremental predictive ability, and all have established prognostic significance, as shown by other publications.
FIGURE 4 Kaplan-Meier survival curves of risk groups stratified based on the nomogram for the primary cohort (A), the validation cohort (B), and within AJCC stages for the primary cohort (C). There are too few patients in the IIC stage to perform subgroup analysis

| CONCLUSIONS
In conclusion, we propose a nomogram that could provide individualized outcome predictions with good accuracy, reliability, availability, and applicability. It could be helpful to physicians and patients in the treatment decision-making process.