Prediction of liver and lung metastases in patients with early‐onset colorectal cancer by nomograms based on heterogeneous and homogenous risk factors

Abstract
Background: Identifying the risk factors for distant metastasis in early‐onset colorectal cancer (EOCRC) is crucial for elucidating its etiology and facilitating preventive treatment. This study aimed to characterize variability in the incidence of distant metastasis in EOCRC and to identify both heterogeneous and homogeneous risk factors associated with synchronous liver, lung, and hepato‐lung metastases.
Methods: This study included patients with EOCRC recorded in the SEER database between 2010 and 2015, who were divided into three groups according to synchronous liver, lung, and hepato‐lung metastases. Within each group, patients were randomly assigned to a development cohort and a validation cohort at a 7:3 ratio. Logistic regression was used to identify heterogeneous and homogeneous risk factors for synchronous liver, lung, and hepato‐lung metastases in the development cohorts. Nomograms were built to estimate the risk of metastasis, and their performance was evaluated quantitatively with receiver operating characteristic (ROC) curves and calibration curves.
Results: A total of 16,336 eligible patients with EOCRC were included, of whom 17.90% (2924/16,336) had distant metastases. The overall incidences of synchronous liver, lung, and hepato‐lung metastases were 11.90% (1921/16,146), 2.42% (390/16,126), and 1.50% (241/16,108), respectively. A positive pretreatment CEA value, more extensive lymph node metastasis, and deeper invasion of the intestinal wall were positively correlated with all three types of distant metastasis. In contrast, the associations of age, ethnicity, primary tumor location, and histologic grade were inconsistent across the three types. The ROC and calibration curves showed good performance of the nomograms in predicting distant metastases of EOCRC.
Conclusions: The incidence of distant metastasis in EOCRC differs significantly by metastatic site, and the associated risk factors include both heterogeneous and homogeneous factors.
Although only a limited number of risk factors could be incorporated in this study, the established nomograms showed good predictive performance.

KEYWORDS
early-onset colorectal cancer, incidence, metastases, nomogram, SEER


| INTRODUCTION
Colorectal cancer (CRC) is the third most common cancer and the second leading cause of cancer-related deaths worldwide. 1 In the past few decades, cases of early-onset colorectal cancer (EOCRC) have increased significantly in the United States and other high-income countries. EOCRC is defined as CRC diagnosed in patients younger than 50 years. 2,3 This trend contrasts sharply with the steady decline in the incidence of and mortality from late-onset CRC in the United States and other high-income countries over the past two decades. 4 The decrease in late-onset CRC incidence and mortality is primarily attributed to screening, together with subsequent surveillance and treatment. 2 The incidence of EOCRC has increased by 45% over the past 30 years, and EOCRC mortality has increased by 1.3% each year since 2008. The US Preventive Services Task Force and the American Cancer Society have recommended lowering the age at which screening begins to 45 years. However, patients younger than 50 years are more likely to be uninsured and to have lower screening compliance, even when they have a family history of CRC. 5 Additionally, half of EOCRC patients are younger than 45 years and therefore may not participate in screening. The increase in EOCRC has been accompanied by a decrease in late-onset CRC, and the median age at diagnosis has dropped from 72 years in the early 21st century to 66 years currently. 1 It is estimated that within the next 10 years, 10%-12% of colon cancers and 25% of rectal cancers will be diagnosed in individuals younger than 50 years. 2 Tumor metastasis is the main cause of poor prognosis in CRC patients. Previous studies have indicated that the 5-year survival rate is only 6% for patients with metastatic CRC, compared with 90% for patients with localized CRC. 6
Compared with older patients, patients with EOCRC are more likely to experience delayed diagnosis and to be unaware of warning signs and symptoms. Younger CRC patients are more often diagnosed at an advanced stage, which is associated with increased mortality. 7,8 The liver is the most common site of colorectal cancer metastasis, followed by the lung. 9 Advances in the treatment of metastatic disease in recent decades, including improved surgical techniques, the development of targeted therapies, and progress in the treatment of liver metastases, have significantly improved the survival of patients with distant metastases. Early detection of distant metastasis is crucial for optimizing management and treatment, improving quality of life, and increasing the 5-year relative survival rate of patients with newly diagnosed colorectal cancer, and it is therefore of considerable clinical importance.
Imaging examinations, such as computed tomography (CT), positron emission tomography/CT (PET/CT), and magnetic resonance imaging (MRI), and laboratory tests, including serum tumor markers, have significant diagnostic value for detecting metastasis in CRC patients. CT is the primary imaging modality for evaluating distant metastasis in CRC patients; however, its sensitivity for detecting colorectal liver metastases ≥1 cm in diameter is only 65%. 10 Although advances in various auxiliary examinations have improved the prognosis of colorectal cancer patients, distant metastasis is still detected late in a substantial number of cases. Identifying independent risk factors for distant metastasis in patients with colorectal cancer could therefore enable the early identification of high-risk individuals. Most previous studies have focused on risk factors for distant metastasis in colorectal cancer patients of all ages, and less attention has been paid to risk factors and prognostic variables specific to EOCRC, particularly those associated with heterogeneous and homogeneous types of distant metastasis. 11 The prognosis of CRC patients varies with diverse clinical and pathological factors, particularly in those presenting with distant metastasis. However, research on the incidence of synchronous liver, lung, and hepato-lung metastases in EOCRC is relatively scarce, and the results of existing studies remain controversial. 12,13
Overall, few systematic studies have investigated the heterogeneity and homogeneity of risk factors for synchronous liver, lung, and hepato-lung metastases in EOCRC patients or established risk prediction models based on this information. As a result, differences in the probability of organ-specific metastasis cannot yet be evaluated in EOCRC patients. A nomogram is a convenient, intuitive, and visual risk prediction tool that quantifies risk by integrating several independent risk factors, and nomograms have shown unique advantages in multiple studies. 14,15 Therefore, in this large cohort study, we used data from the Surveillance, Epidemiology, and End Results (SEER) database to characterize the incidence of, and differences in risk factors for, synchronous liver, lung, and hepato-lung metastases in EOCRC patients. We developed nomograms to help clinicians predict the risk of metastasis to different distant organs. Early identification of metastasis risk factors may help in developing appropriate management strategies and providing targeted treatment for EOCRC patients.
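As a minimal illustration of how a nomogram converts a logistic model into points, the Python sketch below assumes a fitted logistic regression with binary predictors and positive coefficients. The predictor names, intercept, and coefficients are hypothetical values chosen for illustration; they are not estimates from this study. Following one common convention, each predictor's contribution (coefficient × value) is rescaled so that the largest single effect spans 100 points, and the total score is mapped back to a probability through the logistic function.

```python
import math

# Hypothetical model for illustration only; NOT the study's fitted estimates.
INTERCEPT = -4.0
COEFS = {
    "positive_cea": 1.2,   # hypothetical coefficient for a positive CEA value
    "node_positive": 0.9,  # hypothetical coefficient for lymph node metastasis
    "pt4": 1.5,            # hypothetical coefficient for AJCC pT4
}

# Convention assumed here: the largest effect is scaled to 100 points and all
# other effects proportionally (valid as written only for positive coefficients).
MAX_EFFECT = max(abs(b) for b in COEFS.values())
POINTS_PER_UNIT_LP = 100 / MAX_EFFECT


def points(patient):
    """Total nomogram points for a dict of 0/1 predictor values."""
    return sum(COEFS[name] * patient[name] * POINTS_PER_UNIT_LP for name in COEFS)


def probability_from_points(total_points):
    """Map total points back to the linear predictor, then to a probability."""
    lp = INTERCEPT + total_points / POINTS_PER_UNIT_LP
    return 1 / (1 + math.exp(-lp))


patient = {"positive_cea": 1, "node_positive": 1, "pt4": 0}
total = points(patient)
print(round(total, 1), round(probability_from_points(total), 3))
```

Because the points are only a linear rescaling of the linear predictor, reading the probability off the total-points axis gives the same answer as evaluating the logistic model directly.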

| Population
The data used for this population-based study came from the SEER database, a publicly available database of the National Cancer Institute (NCI). This study selected patients with EOCRC registered in the SEER database from 2010 to 2015. The database includes patients' demographic and pathological characteristics, as well as information on distant metastasis recorded at the time of the first diagnosis of colorectal cancer; all distant metastasis cases were therefore synchronous. Eligible patients included those with synchronous liver, lung, and hepato-lung metastases. Cases diagnosed at autopsy or via death certificate, patients aged <18 or ≥50 years, cases of carcinoma in situ or without pathological diagnosis, and cases with an unknown primary tumor location were excluded from subsequent analyses. Patients without information on distant metastases were also excluded. The final study sample comprised three groups: the liver metastases group (N = 16,146), the lung metastases group (N = 16,126), and the hepato-lung metastases group (N = 16,108). Within each group, patients were randomly divided into a development cohort and a validation cohort at a 7:3 ratio, giving the following cohort sizes: liver metastases group (development cohort: N = 11,302; validation cohort: N = 4844), lung metastases group (development cohort: N = 11,288; validation cohort: N = 4838), and hepato-lung metastases group (development cohort: N = 11,276; validation cohort: N = 4832). The development cohort was used to identify independent risk factors and build the models, and the validation cohort was used for internal validation of the models. The case list was generated using SEER*Stat version 8.4.1.
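The 7:3 random allocation can be sketched as follows. This is a minimal Python illustration, not the authors' procedure: the seed value and the rule of rounding the development-cohort size to the nearest integer are assumptions, since the study does not report them. With this rounding rule, the resulting cohort sizes match the ones reported above for all three groups.

```python
import random


def split_development_validation(patient_ids, seed=2024):
    """Randomly allocate IDs to development and validation cohorts at 7:3.

    Assumptions (not stated in the study): the development size is 70% of
    the group rounded to the nearest integer, and the seed is arbitrary.
    """
    ids = list(patient_ids)
    n_dev = (7 * len(ids) + 5) // 10  # integer round-half-up of 0.7 * n
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    return ids[:n_dev], ids[n_dev:]


# Group sizes from the Population section: liver, lung, hepato-lung.
for group_size in (16_146, 16_126, 16_108):
    dev, val = split_development_validation(range(group_size))
    print(group_size, len(dev), len(val))
```

Integer arithmetic is used for the cohort size so that the result does not depend on floating-point rounding of 0.7 × N.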

| Statistical analysis
Categorical data are presented as numbers and percentages (N, %), and quantitative data as means ± standard deviations (SD). Chi-square tests were used to compare categorical variables. Univariate and multivariate logistic regression models were applied to identify risk factors for distant metastasis of EOCRC; variables that were statistically significant in the univariate analysis were entered into the multivariate analysis. Based on the multivariate results, the independent risk factors for the different metastasis types were intersected: factors shared by all three types were considered homogeneous, and factors specific to one or two types were considered heterogeneous. Venn diagrams were used for visualization. We developed nomograms for predicting synchronous liver, lung, and hepato-lung metastases in EOCRC. Calibration curves (with 1000 bootstrap samples), receiver operating characteristic (ROC) curves, and the area under the ROC curve (AUC) were used to evaluate the predictive performance of the nomograms. Statistical analyses were conducted using GraphPad Prism version 9.0 and SPSS version 26.0, with statistical significance set at a two-tailed p-value of less than 0.05. Venn diagrams were generated using Figdraw. The "rms" and "pROC" packages in R version 4.2.1 were used to draw the nomograms and ROC curves, respectively.
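For readers less familiar with the AUC, the sketch below computes it from its rank interpretation: the probability that a randomly chosen patient with metastasis receives a higher predicted risk than a randomly chosen patient without metastasis, with ties counted as one half. This equals the trapezoidal area under the empirical ROC curve. It is a minimal, quadratic-time Python illustration rather than the pROC implementation used in the study, and the scores below are invented.

```python
def auc_mann_whitney(scores, labels):
    """AUC as P(score of a random case > score of a random non-case); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:          # compare every case with every non-case
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented predicted risks: three patients with metastasis, three without.
scores = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(auc_mann_whitney(scores, labels))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of patients with and without metastasis.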

| Clinical and demographic characteristics
A total of 16,363 eligible patients diagnosed with EOCRC between 2010 and 2015 were extracted from the database. After patients with unknown distant metastasis status were excluded, the final sample comprised 16,336 patients with or without distant metastases. Among these patients, 1921 had liver metastases only, 390 had lung metastases only, and 241 had both liver and lung metastases (Figure 1). The mean age of the patients was 41.72 ± 6.79 years (range, 18-49 years). Of these patients, 51.23% (N = 8369) were male and 54.9% (N = 8973) were married. Most patients were White (73.9%, N = 12,073). The rectum (28.05%, N = 4583) was the most common site of EOCRC. The most prevalent histological type was adenocarcinoma (60.13%, N = 9823), followed by mucinous adenocarcinoma (7.24%, N = 1182). The most common pT and pN categories were pT3 (48.73%, N = 7961) and pN0 (50.29%, N = 8216), and 60.54% of patients (N = 9890) had grade II (moderately differentiated) tumors (Table 1).

| Incidence of synchronous distant metastasis in EOCRC
The proportion of EOCRC patients with distant metastasis was 17.90% (2924/16,336), and liver, lung, and hepato-lung metastases occurred in 11.90% (1921/16,146), 2.42% (390/16,126), and 1.50% (241/16,108) of cases, respectively. These differences were statistically significant (χ² = 2140.96, p < 0.001). The incidence of distant metastasis clearly increased with age, with the highest incidence observed in patients aged 40-49 years (70.67%, N = 11,545). The incidence of distant metastasis also varied by sex and by primary tumor location, and it was lower in females than in males (8.83% vs. 9.07%). The incidence of liver metastasis was highest for tumors of the sigmoid colon (3.49%), followed by the rectum (1.93%) and the rectosigmoid junction (1.36%). The incidence of lung metastases was also highest for sigmoid colon tumors (0.63%). Across sites and types of metastasis, the sigmoid colon had the highest incidence, followed by the rectum and the rectosigmoid junction, whereas the appendix had the lowest. The incidence of metastasis was higher in right-sided than in left-sided colon cancer (4.68% vs. 3.44%, 0.84% vs. 0.57%, and 0.52% vs. 0.40%). Overall, liver metastasis was the most common type of metastasis in EOCRC, although its incidence varied by tumor site (Figure B).
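Each incidence above is a simple proportion, but the denominators differ between the overall sample (16,336) and the three metastasis groups (16,146, 16,126, and 16,108). The minimal Python check below reproduces the reported percentages from the reported counts; the dictionary labels are shorthand introduced for illustration.

```python
# Counts reported in this section: (cases, denominator) per outcome.
counts = {
    "any distant": (2924, 16_336),
    "liver": (1921, 16_146),
    "lung": (390, 16_126),
    "hepato-lung": (241, 16_108),
}


def incidence_percent(cases, total):
    """Proportion with the outcome, formatted as a percentage with two decimals."""
    return f"{100 * cases / total:.2f}%"


for name, (cases, total) in counts.items():
    print(name, incidence_percent(cases, total))
```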

| Risk factors for synchronous distant metastasis in EOCRC
TABLE 1 Baseline demographic and related clinical characteristics of EOCRC patients.

Univariate analysis revealed that age, ethnicity, sex, marital status, primary tumor location, histological grade, AJCC pT stage, histological type, lymph node metastasis (AJCC pN stage), positive pretreatment CEA value, and primary tumor size were all associated with distant metastasis. Multivariate logistic regression analysis revealed that the occurrence of distant metastases was positively associated with older age, right- or left-colon location, poor histological grade, mucinous adenocarcinoma, higher AJCC pT stage, lymph node metastasis, and a positive pretreatment CEA value (Table 2). Risk factors for organ-specific metastasis in EOCRC showed both heterogeneity and homogeneity. AJCC pT stage, lymph node metastasis, and a positive pretreatment CEA value were positively correlated with all three types of distant metastasis in EOCRC patients. Older age, right- or left-colon location, and poor histological differentiation were positively associated with liver metastasis. Poor histological differentiation and Asian/Pacific Islander (API) or African American ethnicity were positively correlated with lung metastases. Hepato-lung metastases were positively associated with older age, left-colon location, and African American ethnicity (Figure 3). The risk factors for metastasis at each site are presented in Tables S1-S3.
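The homogeneous/heterogeneous classification amounts to a set intersection, which a Venn diagram visualizes. The minimal Python sketch below uses the independent risk factors listed in this section; the short factor labels are paraphrases chosen for illustration.

```python
# Independent risk factors per metastasis type, as listed in this section.
liver = {"AJCC pT stage", "lymph node metastasis", "positive CEA",
         "older age", "right/left colon", "poor histological grade"}
lung = {"AJCC pT stage", "lymph node metastasis", "positive CEA",
        "poor histological grade", "API", "African American"}
hepato_lung = {"AJCC pT stage", "lymph node metastasis", "positive CEA",
               "older age", "left colon", "African American"}

# Homogeneous factors: the central region of the Venn diagram.
homogeneous = liver & lung & hepato_lung

# Heterogeneous factors: what remains for each type outside that region.
heterogeneous = {
    "liver": liver - homogeneous,
    "lung": lung - homogeneous,
    "hepato-lung": hepato_lung - homogeneous,
}

print(sorted(homogeneous))
for site, factors in heterogeneous.items():
    print(site, sorted(factors))
```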

| DISCUSSION
Previous studies have reported an incidence of distant metastasis in EOCRC of approximately 19.9%. 9 However, research on the concurrent occurrence of liver, lung, and hepato-lung metastases in EOCRC is limited, and results for the same metastatic site may be inconsistent because of differences in sample size across studies. To our knowledge, this is the largest study of the incidence of synchronous liver, lung, and hepato-lung metastases in EOCRC. Our findings indicate that the liver is the organ most frequently affected by metastasis in EOCRC patients. Notably, clinical and pathological factors differed significantly between liver, lung, and hepato-lung metastases in EOCRC, which makes it important to identify independent risk factors for organ-specific distant metastasis. 16 Identifying such risk factors can facilitate personalized, cost-effective treatment strategies and improve prognosis. 17 Previous studies have demonstrated that distinct histological subtypes of the same tumor metastasize to different organs at different rates. 18,19
The different incidences of liver, lung, and hepato-lung metastases observed here may partly reflect the heterogeneity and homogeneity of distant metastasis in EOCRC. This study revealed both heterogeneity and homogeneity among the factors associated with distant metastasis at different sites in EOCRC. All three types of metastasis (liver, lung, and hepato-lung) were positively correlated with lymph node metastasis, AJCC pT stage, and a positive pretreatment CEA value. However, the heterogeneous risk factors identified in this study are not fully consistent with previous findings. For example, we found that age 40-49 years, right- or left-colon location, and histological grade were associated with liver metastases; API and African American ethnicity, as well as histological grade, were associated with lung metastases; and age 40-49 years, left-colon location, and African American ethnicity were associated with hepato-lung metastases in EOCRC. These findings differ from the heterogeneous and homogeneous risk factors for distant metastases of CRC reported in previous studies. 20 The heterogeneity of risk factors observed in our study may be partly attributable to differences in sample size, as we included a larger cohort (16,336 EOCRC patients) than previous studies. To the best of our knowledge, this is the first study to elucidate organ-specific heterogeneous risk factors for distant metastasis in EOCRC. These findings have potential implications for early detection, personalized treatment, and long-term prognosis in EOCRC patients. The pathophysiological and molecular mechanisms underlying the risk factors for liver, lung, or other organ metastases in EOCRC remain elusive. For instance, our findings suggest that tumors at different locations differ in their patterns of distant metastasis in EOCRC. 21,22
In this investigation, the sigmoid colon had the largest number of patients with distant metastases, followed by the rectum, cecum, rectosigmoid junction, ascending colon, appendix, descending colon, transverse colon, splenic flexure, and hepatic flexure. The risk of liver metastases was higher for tumors in the right colon than for those in the left colon and rectum. 23-25 A previous study of the molecular mechanisms distinguishing CRC tumor locations revealed that patients with right-sided primary metastatic disease have a higher mutation burden and enrichment of multiple mutation sites. 26 In contrast, left-sided tumors display distinct characteristics, including (1) enrichment of amplifications in receptor tyrosine kinase signaling genes; (2) absence of mutations or copy number variations in cell division-associated genes; (3) mutations in the APC, NRAS, and TP53 genes; and (4) potential susceptibility to fluctuations in the gut microbiome. 27 These findings suggest that left- and right-sided colorectal cancers involve distinct molecular pathways of metastasis. A higher proportion of EOCRC than of CRC in older patients appears to have a hereditary component. 28 In a study of microsatellite stability in EOCRC, the proportion of microsatellite- and chromosome-stable (MACS) tumors was significantly higher than in late-onset CRC (64% vs. 13%, p = 0.005). 29 In another study, the miR-31-5p/Dystrophin (DMD) axis was identified as a specific key regulatory pathway, and DMD expression was closely associated with TNM stage and lymph node metastasis. 30 However, few studies have reported on the molecular mechanisms of metastasis for different tumor locations in EOCRC, and further exploration is needed. 29,31
Although studies have indicated that distant metastases in EOCRC differ by primary tumor location, the reasons for these differences remain unclear. Besides primary tumor location, demographic and clinicopathological factors also differ significantly in the occurrence and development of colorectal cancer. How different risk factors shape the pathological, physiological, and molecular development of distant metastasis in EOCRC also needs to be explored further.
This study summarized heterogeneous and homogeneous risk factors for distant metastasis in EOCRC, which have not been comprehensively studied previously. These factors may help guide monitoring for different distant metastases in patients with EOCRC. To assist clinicians in identifying high-risk EOCRC patients, three prediction nomograms were developed based on risk factors associated with distant metastasis, and internal validation demonstrated favorable predictive performance of the nomograms. Routine screening and early diagnosis of clinical metastasis often require additional technical and equipment support, whereas morphology-based nomograms that use heterogeneous and homogeneous factors may offer a more cost-effective approach.
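Calibration compares predicted risks with observed event rates. The sketch below is a simplified, grouped version in Python: patients are sorted by predicted risk, split into equal-sized groups, and each group's mean predicted risk is compared with its observed metastasis rate. It does not reproduce the bootstrap-based calibration curves (1000 resamples) drawn with the R "rms" package in this study, and the toy data are invented.

```python
def grouped_calibration(predicted, observed, n_groups=5):
    """Mean predicted risk vs observed event rate within risk-ordered groups.

    Returns a list of (mean_predicted, observed_rate, group_size) tuples.
    In a well-calibrated model, mean_predicted is close to observed_rate.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    pairs = sorted(zip(predicted, observed))  # order patients by predicted risk
    n = len(pairs)
    rows = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:  # skip empty groups when n < n_groups
            continue
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        obs_rate = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_pred, obs_rate, len(chunk)))
    return rows


# Invented predicted risks and 0/1 outcomes for six patients, two risk groups.
predicted = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
observed = [0, 0, 1, 1, 1, 0]
for mean_pred, obs_rate, size in grouped_calibration(predicted, observed, n_groups=2):
    print(f"predicted {mean_pred:.2f}  observed {obs_rate:.2f}  n={size}")
```

Plotting observed against mean predicted risk for each group, together with the 45-degree line, gives a coarse calibration plot; points close to the line indicate good agreement.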
Nomograms have distinct advantages for predicting distant metastasis in patients with colorectal cancer. Previous studies have demonstrated that nomograms enable timely and informed treatment decisions for patients with CRC, thereby reducing the likelihood of emergency surgery and improving survival. 14,15 We therefore recommend using this prediction model as an initial screening tool for EOCRC patients, followed by other auxiliary examinations, such as PET/CT, to identify high-risk individuals. Moreover, for EOCRC patients without evidence of distant metastasis on laboratory or auxiliary examinations, shortening the interval between serological marker or imaging tests in high-risk individuals may be necessary to detect metastasis promptly and to develop personalized treatment.
However, this study has certain limitations. First, some risk factors that have been shown to be closely related to CRC, such as dietary habits, family history, and history of digestive system diseases, were not included because the SEER database does not record them. 32,33 Research on risk factors for distant metastases in EOCRC is limited, and the set of risk factors included in this study is also limited. Second, the database used in this study covers only the population of the United States, which limits the applicability of our predictive models to other regions and countries; the models must be validated in the relevant population before being implemented in another country. Finally, although the nomograms showed favorable discrimination and calibration, these findings should be interpreted with caution because no external data were available to assess generalizability. Additional external validation in large prospective cohort studies across diverse populations is therefore warranted.

| CONCLUSION
In this study, distant metastasis was observed in 17.90% of patients with EOCRC, and synchronous liver, lung, and hepato-lung metastases occurred in 11.90%, 2.42%, and 1.50% of patients, respectively. The occurrence of distant metastases in EOCRC varies with clinical and pathological factors such as primary tumor location, histological differentiation, and patient ethnicity. All three types of distant metastasis in EOCRC were positively correlated with a positive pretreatment CEA value, more extensive lymph node metastasis, and higher AJCC pT stage. In addition, some risk factors differed between metastasis types. Using these factors, we developed nomograms to predict distant metastasis in EOCRC patients, which showed good discrimination and calibration in internal validation. These findings may facilitate accurate prediction and personalized treatment recommendations for individuals with EOCRC.

FIGURE 2
The distribution and trend of distant metastases in EOCRC patients, classified by metastasis type and primary tumor location.

FIGURE 3
Heterogeneous and homogeneous risk factors for different types of distant metastasis in patients with EOCRC. More extensive lymph node metastasis, a positive pretreatment CEA value, and higher AJCC pT stage were homogeneous risk factors for all three types of distant metastasis. The factors in the non-intersecting regions are specific to each type of distant metastasis.

FIGURE 4
Nomograms for predicting synchronous liver metastasis (A), lung metastasis (B), and hepato-lung metastases (C) in EOCRC patients.

FIGURE 5
The calibration curves and ROC curves for evaluating the calibration and discrimination of the nomograms of development cohort in predicting synchronous liver metastasis (A, D), lung metastasis (B, E), and hepato-lung metastases (C, F).

FIGURE 6
The calibration curves and ROC curves for evaluating the calibration and discrimination of the nomograms in the validation cohort in predicting synchronous liver metastasis (A, D), lung metastasis (B, E), and hepato-lung metastases (C, F).