Nomograms for predicting specific distant metastatic sites and overall survival of colorectal cancer patients: A large population‐based real‐world study

Abstract

Background: This study aims to develop functional nomograms to predict specific distant metastatic sites and overall survival (OS) of colorectal cancer (CRC) patients.

Methods: CRC case data were retrospectively collected from a large population-based public dataset. Nomograms were developed to predict the probabilities of specific distant metastatic sites and OS of CRC patients. The performance of the nomograms was evaluated with the concordance index (C-index), calibration curves, area under the curve (AUC), and decision curve analysis (DCA).

Results: A total of 142 343 cases were included in the current study. On the basis of univariate and multivariate analyses, clinicopathological features were correlated with specific distant metastatic sites and survival outcomes and were used to establish nomograms. The nomograms showed excellent accuracy in predicting specific distant metastatic sites. The C-indexes for the prediction of liver, lung, bone, and brain metastases were 0.82 (95% confidence interval (CI), 0.81-0.83), 0.80 (95% CI, 0.78-0.81), 0.83 (95% CI, 0.79-0.86), and 0.73 (95% CI, 0.72-0.84), respectively. Then, a prognostic nomogram integrating clinicopathological features and specific distant metastatic sites was established to predict 1-, 3-, and 5-year OS of CRC, with AUCs of 0.764 (95% CI, 0.741-0.783), 0.762 (95% CI, 0.745-0.781), and 0.745 (95% CI, 0.730-0.761), respectively. DCA showed that the prognostic nomogram had a better clinical application value than the current TNM staging system.

Conclusions: Based on clinicopathological features, original nomograms were constructed for clinicians to predict specific distant metastatic sites and OS of CRC patients. These models could help to support postoperative personalized assessment.

KEYWORDS
colorectal cancer, decision curve analysis, distant metastasis, nomogram, overall survival

BACKGROUND
In 2019, there were 148 000 new cases of colorectal cancer (CRC) in the United States, where the disease accounted for more than 146 deaths per day and a mortality rate of approximately 19.1/100 000, ranking third among all malignant tumors. 1 Over the past 30 years, the incidence and overall survival (OS) rate of CRC have been rising worldwide. The 5-year OS rate of CRC patients was approximately 65.2%. An important reason for limited 5-year survival in CRC patients is distant metastasis, including liver, lung, brain, and bone metastasis.
Current research has indicated that clinicopathological characteristics such as histological classification, pretreatment carcinoembryonic antigen (CEA) levels, distant metastasis site, and depth of infiltration may affect survival outcomes in patients with CRC. 2 The postoperative survival of CRC patients with different clinicopathological features varies greatly. For instance, the 5-year survival rate of CRC patients with distant metastases, such as brain metastases, is less than 10%, while the 5-year survival rate of CRC patients with infiltration not exceeding the muscular layer reaches more than 90%. 3 Therefore, a statistical modeling tool is required to comprehensively combine the effects of various clinicopathological factors on the outcomes of CRC patients.
The prognosis of CRC is associated with the current tumor-node-metastasis (TNM) staging system. For patients with the same tumor stage, prognosis can vary significantly because of the heterogeneity of CRC. Although the TNM staging system is extensively used in postoperative decision-making for treatment strategy and prognosis evaluation of CRC patients in current clinical practice, its shortcomings cannot be ignored and have been widely studied in recent years. 4 A growing number of prognostic biomarkers have been explored, studied, and applied in clinical practice to make up for the deficiencies of the current TNM classification system. For example, microsatellite instability (MSI)/mismatch repair (MMR) status has been recommended as the most commonly used and powerful molecular marker in the clinical management of CRC patients. 5,6 In addition, the expression statuses of diverse genes, such as KRAS and BRAF, have been found to be closely related to the prognosis of CRC patients. 7,8 Nevertheless, both immunohistochemistry and gene detection methods not only have limitations but also impose a certain economic burden on patients. At present, researchers have been trying to develop and validate nomograms that predict patients' prognosis from clinicopathological data. By combining clinicopathological features, a nomogram can clearly and intuitively quantify the local recurrence chances, distant metastasis rates, and survival probabilities of patients with cancer. These studies have also achieved corresponding success. 9,10 Currently, however, most nomogram models cannot predict all metastatic sites and their probabilities, or do not include metastatic status when the models are constructed, which limits their clinical application.
With the application of a large population-based public dataset, the Surveillance, Epidemiology, and End Results (SEER) program that provides extensive clinicopathological and follow-up information of cancer patients, covering about 28% of the population in the United States, researchers have carried out a large number of clinical studies on different cancers. 11 In this study, all CRC cases with clinicopathological and survival information were collected from the SEER program to establish intuitive and comprehensive nomograms for predicting specific distant metastatic sites and prognosis in CRC patients.

Patients
In the present study, a total of 254 754 cases were obtained from the SEER cohort. The flow chart of case inclusion and exclusion is shown in Figure 1. All cases who received radical operation from 2010 to 2016 were included in the study and analyzed retrospectively. Cases with no or multiple tumors in the pathological report were excluded. Accordingly, 15 clinicopathological characteristics were extracted from the SEER program, including gender, race, tumor location, pathological grade, histological type, age at diagnosis, tumor size, pretreatment CEA level, number of lymph nodes harvested (LNH), T stage, N stage, and specific distant metastatic sites (liver, lung, bone, and brain). Cases recorded as Unknown, Asian, Native American (NA), or Pacific Islander (PI) were allocated to an "other" race category for analysis. The exclusion criteria were as follows: (a) absence of important clinicopathological factors, such as grade, histological type, T stage, and N stage; (b) missing information on specific metastatic sites (lung metastases, liver metastases, bone metastases, or brain metastases); (c) incomplete survival information (survival months and survival status). Patient survival was measured by OS. 12 Finally, 142 343 patients with stage I-IV CRC and a full range of clinical and pathological data were chosen from the SEER database.

FIGURE 1 Recruitment pathway of CRC patients with specific metastasis sites and complete follow-up information to establish predictive and prognostic nomograms

Construction of prediction and prognostic nomograms
Univariate and multivariate logistic regression and Cox regression analyses were used to identify independent factors for specific distant metastatic sites and OS, respectively, and the hazard ratio (HR) was used to measure the impact of each independent factor. Then, according to the results of the multivariate logistic regression analysis, predictive nomograms were established to predict the risk of liver, lung, bone, and brain metastasis in CRC patients. Meanwhile, one prognostic nomogram that also incorporated the four specific distant metastatic sites was built to predict the OS probability of CRC patients. Based on the nomogram score, cases were classified into low-, moderate-, and high-risk subgroups.
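As a rough illustration of how a fitted logistic model is turned into a nomogram scale, the following Python sketch converts regression coefficients into points and a predicted probability. It is not the authors' code (the analyses in this study were run in R), and the intercept, coefficients, variable codings, and function names are hypothetical values chosen only for the example. The sketch assumes that all coefficients are positive and that each predictor enters the model linearly.

```python
import math

# Hypothetical log-odds coefficients; NOT the fitted values from this study.
INTERCEPT = -4.0
COEFS = {"cea_positive": 1.2, "n_stage": 0.5, "t_stage": 0.4}
# Assumed coding ranges: CEA 0/1, N stage 0-2, T stage 1-4.
RANGES = {"cea_positive": (0, 1), "n_stage": (0, 2), "t_stage": (1, 4)}


def points_per_unit(coefs, ranges):
    """Scale coefficients so the predictor with the widest effect spans 0-100 points."""
    max_effect = max(abs(coefs[k]) * (hi - lo) for k, (lo, hi) in ranges.items())
    return {k: 100.0 * coefs[k] / max_effect for k in coefs}


def total_points(patient, coefs, ranges):
    """Sum the points of each predictor, measured from its lowest coded value."""
    scale = points_per_unit(coefs, ranges)
    return sum(scale[k] * (patient[k] - ranges[k][0]) for k in coefs)


def predicted_probability(patient, intercept, coefs):
    """Inverse-logit of the linear predictor, i.e. the value read off the bottom axis."""
    linear_predictor = intercept + sum(coefs[k] * patient[k] for k in coefs)
    return 1.0 / (1.0 + math.exp(-linear_predictor))


if __name__ == "__main__":
    patient = {"cea_positive": 1, "n_stage": 2, "t_stage": 4}
    print(round(total_points(patient, COEFS, RANGES), 1))
    print(round(predicted_probability(patient, INTERCEPT, COEFS), 3))
```

Because the points are a linear rescaling of the linear predictor, ranking patients by total points is equivalent to ranking them by predicted risk, which is what allows the total score to be used for risk grouping.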

Receiver operating characteristic (ROC) curves and prediction error curves
The discrimination ability of the nomograms was evaluated by ROC curves, and their calibration was evaluated by calibration curves. The accuracy in predicting distant metastases was measured by the logistic ROC and calibration curves. ROC curves for 1-, 3-, and 5-year survival probability were used to evaluate the prediction ability of the prognostic nomogram over time. In a logistic regression model, the AUC is equal to the concordance index (C-index). The maximum AUC is 1.0, indicating perfect discrimination, while 0.5 indicates discrimination no better than random chance. Prediction error curves were used to compare the error rate of the TNM staging system with that of the prognostic nomogram over time. 13
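The equivalence between the AUC and the C-index for a binary outcome can be made concrete with a short Python sketch. It computes the AUC directly as the proportion of concordant case/non-case pairs, counting ties as one half. The function name and the toy data are illustrative assumptions, not part of the study.

```python
def concordance_auc(labels, scores):
    """AUC as the fraction of (event, non-event) pairs ranked correctly by the score."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case with and one without the event")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0   # event case scored higher: concordant pair
            elif p == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: 1 = metastasis present, 0 = absent; scores are predicted risks.
    labels = [0, 0, 1, 1]
    scores = [0.10, 0.40, 0.35, 0.80]
    print(concordance_auc(labels, scores))
```

The pairwise loop is quadratic in the number of cases, so for a cohort of this size a rank-based implementation would be preferable in practice; the version above is written for clarity only.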

Decision curve analysis
Decision curve analysis (DCA), a novel tool for evaluating the clinical application value of a nomogram, was performed in the present study to assess the clinical consequences of using the predictive models and to compare the net benefit of the predictive and prognostic nomograms. 14 The aim of DCA is to evaluate individuals' risk of adverse outcomes and to suggest intervention or treatment for individuals at sufficiently high risk.
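The quantity plotted in a decision curve is the net benefit at a given threshold probability, conventionally defined as the true-positive rate minus the false-positive rate weighted by the odds of the threshold. The Python sketch below shows that calculation for a model and for the "treat all" reference strategy. The function names and toy data are assumptions for illustration; the study itself produced its decision curves in R.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of treating every patient whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    # False positives are penalized by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference strategy: treat everyone regardless of predicted risk."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


if __name__ == "__main__":
    # Toy data: 1 = event, 0 = no event; probs are predicted risks.
    labels = [1, 1, 0, 0, 0]
    probs = [0.9, 0.6, 0.7, 0.2, 0.1]
    for pt in (0.2, 0.5, 0.8):
        print(pt, net_benefit(labels, probs, pt), net_benefit_treat_all(labels, pt))
```

A model is considered clinically useful over the range of thresholds where its net benefit exceeds both the "treat all" line and the "treat none" line, whose net benefit is zero by definition.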

Risk stratification
To test the discrimination of the nomogram, all cases were classified into low-, moderate-, and high-risk subgroups based on the total risk score. The survival curves of the different risk subgroups were drawn by the Kaplan-Meier method and compared by the log-rank test.
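A minimal Python sketch of this stratification step is given below: cases are split into three equally sized groups by rank of total score, and a Kaplan-Meier curve is estimated for a group. The rank-based handling of tied scores, the function names, and the toy data are assumptions made for the example; the log-rank comparison is omitted.

```python
def tertile_groups(scores):
    """Assign 0 (low), 1 (moderate), or 2 (high) by rank-based tertiles of the score."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    groups = [0] * n
    for rank, i in enumerate(order):
        groups[i] = rank * 3 // n   # tied scores are split by rank order
    return groups


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct death time; events: 1 = death, 0 = censored."""
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving          # deaths and censored cases both leave the risk set
    return curve


if __name__ == "__main__":
    scores = [120, 45, 210, 80, 160, 30]          # hypothetical nomogram totals
    print(tertile_groups(scores))
    print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))  # survival months, status
```

In use, `kaplan_meier` would be called once per risk group on the survival months and status of the cases assigned to that group.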

Statistical analysis
All statistical analyses were performed with R software (www.r-project.org, version 3.3.3). The following R packages were used to build the nomograms, to plot the ROC, calibration, and DCA curves, and to draw the Kaplan-Meier curves: "Hmisc," "survival," "rms," "pROC," "survivalROC," "MASS," and "rmda." All statistical tests were two-sided, with P-values < .05 considered statistically significant.

FIGURE 2 Four nomograms convey the results of predictive models using clinicopathological characteristics to predict the possibility of liver (A), lung (B), bone (C), and brain (D) metastasis

Construction of predictive nomograms for specific distant metastatic sites
Univariate logistic regression for the presence of different metastatic sites showed that 11 variables, including race, sex, age, CEA level, grade, tumor site, histological type, tumor size, LNH, N stage, and T stage, were related to distant metastatic sites (Table 2). In multivariate logistic regression (Table 3), the vast majority of variables, including sex, age, tumor site, CEA level, grade, histological type, tumor size, N stage, T stage, and LNH, were determined to be independent risk factors predicting liver metastases of CRC. Nine parameters, including race, age, tumor site, CEA level, grade, tumor size, N stage, LNH, and T stage, were determined to be independent risk factors predicting lung metastases of CRC. Seven parameters, including sex, age, CEA level, grade, N stage, LNH, and T stage, were identified as independent risk factors predicting bone metastases of CRC. Four factors, including CEA level, N stage, T stage, and tumor size, were identified as independent risk factors predicting brain metastases of CRC. On the basis of the multivariable logistic regression analyses for specific distant metastatic sites, all of the independent significant risk factors were integrated to establish nomograms for specific metastatic site prediction. The predictive nomograms for liver (Figure 2A), lung (Figure 2B), bone (Figure 2C), and brain (Figure 2D) metastases are illustrated in Figure 2. The ROC curves and the C-index values were used to appraise the discrimination abilities of the nomograms. The C-indexes for the prediction of liver, lung, bone, and brain metastases were 0.82 (95% confidence interval (CI), 0.81-0.83), 0.80 (95% CI, 0.78-0.81), 0.83 (95% CI, 0.79-0.86), and 0.73 (95% CI, 0.72-0.84), respectively. To confirm that the nomogram models were effective in predicting the specific metastatic sites of CRC patients, logistic ROC analyses were conducted. The areas under the curve (AUCs) of the nomograms for the prediction of liver, lung, bone, and brain metastases were 0.825 (95% CI, 0.817-0.832), 0.798 (95% CI, 0.784-0.813), 0.823 (95% CI, 0.789-0.863), and 0.786 (95% CI, 0.714-0.859), respectively (Figure 3A, D, G, and J). In addition, calibration curves of the nomograms used to predict liver, lung, bone, and brain metastases showed no significant deviation from the reference line, which indicated good calibration (Figure 3B, E, H, and K). DCA is a novel method for appraising alternative prognostic instruments that has advantages over the AUC. The DCA curves for the predictive nomograms are presented in Figure 3C, F, I, and L. DCA showed that the predictive nomograms had high net benefits, meaning that they had good clinical implementation significance in predicting specific metastatic sites.

FIGURE 3 AUC values of the ROC curves of the nomograms predicting liver (A), lung (D), bone (G), and brain (J) metastasis rates. The calibration curves of the predictive nomograms for predicting CRC patients' liver (B), lung (E), bone (H), and brain (K) metastasis rates. Decision curve analysis of the predictive nomograms for predicting liver (C), lung (F), bone (I), and brain (L) metastasis

Establishment of a prognostic nomogram integrating clinicopathological features and specific distant metastatic sites
According to the results of the univariate and multivariate Cox regression analyses for the OS of CRC patients, all of the significant variables, including race, sex, tumor site, CEA level, grade, age, histological type, tumor size, T stage, LNH, N stage, liver metastases, lung metastases, bone metastases, and brain metastases, were integrated to establish the prognostic nomogram for OS (Table 4). The prognostic nomogram for 1-, 3-, and 5-year OS is shown in Figure 4A. By summing the scores of each variable and locating the total score on the bottom scale, probabilities can be estimated for 1-, 3-, and 5-year OS. In addition, calibration curves for the nomogram showed no deviations from the reference line, which indicated a high degree of credibility (Figure 4B-D).
For the prognostic nomogram, the C-index values and ROC curves were used to evaluate the discrimination power of the nomogram. The C-index for the prediction of OS was 0.729 (95% CI, 0.724-0.734). To confirm that the nomogram prediction model had superior efficacy over the TNM staging system, ROC, DCA, and prediction error analyses were performed. The AUCs of the prognostic nomogram for 1-, 3-, and 5-year OS were 0.764 (95% CI, 0.741-0.783), 0.762 (95% CI, 0.745-0.781), and 0.745 (95% CI, 0.730-0.761), respectively. The DCA curves for the prognostic nomogram and TNM staging system are presented in Figure 5B. Compared with the TNM staging system, DCA showed that the prognostic nomogram had higher net benefits, meaning that it had better clinical implementation significance. The corresponding prediction error curves of the models in Figure 5C showed that the prognostic nomogram had a lower error rate than the TNM staging system, indicating that the nomogram predicted outcomes more accurately than the TNM staging system.

Prognostic nomogram for risk stratification
According to the total score calculated by the prognostic nomogram, all cases were divided into three subgroups, each of which represented a different outcome. The prognosis of each subgroup is reflected by the Kaplan-Meier survival curves shown in Figure 6. Based on OS events, group 1 (low-risk group) had the highest 5-year OS of 83.6%, followed by group 2 (moderate-risk group), with a 5-year OS of 66.4%; group 3 (high-risk group) showed the lowest 5-year OS of 38.8%. Statistically significant differences in survival outcomes were observed among the three groups.

DISCUSSION
In the current study, nomograms merging clinical and pathological parameters with metastatic information were built to evaluate distant metastasis rates and the 1-, 3-, and 5-year OS probabilities of CRC patients. The discrimination and calibration of the nomograms were confirmed, and these nomograms have a wide range of applications. According to the ROC, DCA, and error curves, the prognostic nomogram showed better prediction accuracy for CRC than the current TNM staging system. Moreover, the nomogram was able to divide patients with CRC into low-, moderate-, and high-risk groups, which suggested that this nomogram could be routinely used for predicting the prognosis of CRC patients. At present, the number of CRC diagnoses in young patients has increased. Previous research has demonstrated that age is an independent prognostic factor in CRC patients, with younger age associated with more promising outcomes. 15 Metastatic prediction nomograms have indicated that CRC patients younger than 60 years were at higher risk of lung, liver, and bone metastases. 16,17 In addition, race (for example, white patients) was related to liver metastasis risk. Furthermore, studies have found that CEA is a prognostic factor and an ideal biomarker for CRC patients. [18][19][20] Conventional CEA monitoring during postoperative follow-up was introduced to detect relapse and distant metastases after CRC resection surgery. As the nomograms showed, CRC patients with positive CEA levels tended to have significantly worse OS rates and higher metastatic probabilities. In addition, left and right CRCs have been indicated to have different embryological origins. 21 They differ in features such as anatomical structure, morphological characteristics, function, and histochemical reactions. A former study associated malignant tumor location with CRC patient prognosis. 21 Patients with left CRC had a notably higher rate of lung and liver metastases but a better prognosis in terms of OS than those with right CRC, which was also supported by this research. Similarly, a previous study showed that tumor size was an independent factor for OS in patients with colorectal adenocarcinoma of the ulcerative and infiltrative type. 22 This study showed that larger tumors were associated with higher risks of lung, liver, and brain metastases and with a worse prognosis.

FIGURE 6 Overall survival in the subgroups according to a tertile of the total score
In addition to age, CEA level, tumor site, and tumor size, preceding studies have also shown that histological differentiation, grade, LNH, N stage, and T stage were independent risk factors for CRC patients. 10 Histological differentiation was defined as a significant trait for evaluating the benefit of adjuvant chemotherapy in relevant research. 23 This nomogram verified that low histological differentiation, such as signet ring cell carcinoma (SRCC), was correlated with a worse prognosis. Low histological grade has been considered one of the unfavorable histopathological factors connected with an adverse clinical course of CRC. The results of this investigation showed that high histological grade was strongly associated with lung, liver, and bone metastases, while only lung metastases appeared to maintain an association with SRCC. Moreover, NCCN guidelines recommend that adequate staging of CRC requires at least 12 lymph nodes to be sampled. Previous research inferred that CRC patients with fewer than 12 LNH tended to have a shorter OS than those with more than 12 LNH, 24 corroborating the results of the nomograms, which indicated that patients with few LNH tended to have a higher risk of lung, liver, and bone metastases. Some scholars have suggested that patients with higher T and N stages had a higher risk of liver metastases. 13 Higher T stage was associated with deeper infiltration, which might result in malignant tumor cells entering vessels. The nomograms developed in this study revealed that higher T and N stages were related to a higher risk of lung, liver, bone, and brain metastases and worse survival outcomes.
In this field, much work on the prognostic factors and metastatic sites of CRC has been reported recently. A few researchers reported that their nomogram scoring systems had exceptional capabilities in predicting the prognosis of CRC patients. For instance, a combination of clinical risk factors and radiomics features showed potential advantages for the individualized preoperative prediction of lymph node metastasis in CRC patients, which was proposed to benefit patient OS. Sun et al argued that the fibrinogen and neutrophil-to-lymphocyte ratio (F-NLR score) is a promising predictor of disease relapse in rectal cancer patients. 25,26 The studies mentioned above were dedicated to predicting the preoperative or postoperative conditions of patients, and both might improve the prognosis of patients. However, quite a few studies have proposed examination methods that impose greater trauma or economic cost on patients. Other studies have not considered metastasis in combination with clinical information or could not predict metastases, which greatly affect CRC patients' survival outcomes.
However, there are still some shortcomings in the present study. First, treatment information other than surgery, such as specific radiotherapy and chemotherapy regimens, was not available in the SEER database and could not be included in the analysis. Second, the SEER cohort lacks some factors, such as detailed mode of presentation and major prognostic scores, which have been demonstrated to have prognostic ability. Third, the SEER database lacks 90% of biomarker expression states, such as KRAS, NRAS, and BRAF. Additional prospective data collection and the incorporation of other variables are encouraged to improve this model.

CONCLUSIONS
In summary, we developed new nomograms to predict the specific distant metastatic sites and OS probability of CRC patients. The simple and clear nomograms not only have good clinical application value but also show adequate discrimination and calibration, and could be used as a convenient tool for clinicians to evaluate the prognosis of individual CRC patients and determine the treatment strategy.

ETHICS APPROVAL AND CONSENT TO PARTICIPATE
The Ethical Committee and Institutional Review Board of the Fudan University Shanghai Cancer Center reviewed and approved this study protocol.

AVAILABILITY OF DATA AND MATERIALS
The datasets used during the study are available from the corresponding author on reasonable request.