Evaluating overall survival and competing risks of survival in patients with early‐stage breast cancer using a comprehensive nomogram

Abstract
Background: Patients with early‐stage breast cancer (BC) live long but have competing comorbidities. This study aimed to estimate the effect of cancer and other causes of death in patients with early‐stage BC and further quantify the survival differences.
Materials and methods: Data of patients diagnosed with BC between 2010 and 2016 were collected from the Surveillance, Epidemiology, and End Results database. The cumulative incidence function for breast cancer–specific mortality (BCSM) and other cause‐specific mortality (OCSM) was estimated, and the differences were tested using the Gray test. The nomogram for estimating 3‐, 4‐, and 5‐year overall survival (OS), breast cancer–specific survival, and other cause‐specific survival was established based on Cox regression analysis and Fine and Gray competing risk analysis. The discriminative ability, calibration, and precision of the nomogram were evaluated and compared using C statistics, calibration plots, and area under the receiver operating characteristic curve.
Results: A total of 196 304 eligible patients with early‐stage BC were identified in this study. Of these, 12 417 (6.3%) patients died: 5628 (45.3%) due to BC and 6789 (54.7%) due to other causes. Five validated variables were incorporated to develop the prognostic nomogram: age, grade, tumor size, subtype, and surgery of primary site (Figure 3). Age was a strong predictive factor, which was more obvious in OCSM. The effect of surgery was more prominent in BCSM. Increased tumor size was correlated with OS and BCSM and slightly correlated with OCSM. Grade and subtype differences were more predominant in BCSM than in OCSM. The established nomogram was well calibrated and displayed good discrimination.
Conclusions: We evaluate OS and competing risks of death in patients with early‐stage BC, establishing the first comprehensive prognostic nomogram.


| INTRODUCTION
Breast cancer (BC) is the most common malignant tumor in women and the main cause of cancer-specific death, with 268 600 estimated new cases and 41 700 estimated deaths in the USA in 2019. 1 The prognosis of BC, especially early-stage BC, has been dramatically improved by multidisciplinary treatments, including radical resection, neo-/adjuvant chemotherapy, and hormone and targeted therapies. [2][3][4] In developed countries, early-stage BC has become the most frequently diagnosed invasive breast disease. However, in BC survivors, comorbidities such as cardio- and cerebrovascular diseases compete with BC as primary causes of death. Given the good prognosis of early-stage BC, the long-term benefit of treatment, particularly in the elderly population, depends on competing risks of death. Thus, competing risks must be considered when assessing prognosis. Several published studies [5][6][7][8][9] have reported the prognosis of BC, but most of them either focused on overall survival (OS) or analyzed cancer-specific mortality using the traditional Cox regression model, which does not necessarily reflect the effect on cumulative incidence. In the era of individualized treatment, evaluating OS alone is not sufficient; it is important to differentiate cancer-specific mortality from other cause-specific mortality (OCSM). When competing risks exist, the traditional Cox regression model may be inappropriate because it treats competing events as censoring, so cancer-specific mortality may be overestimated. [10][11][12] In this situation, the Fine and Gray model 11,13 is recommended. Therefore, we evaluated OS and competing risks of death (BC related and other causes related) in a large population of patients with early-stage BC and built a comprehensive nomogram to provide physicians with a quantitative tool.

| Patients
Data of patients with early-stage (stage I-II) BC were retrospectively extracted from the Surveillance, Epidemiology, and End Results (SEER) database (2010-2016) using SEER*Stat version 8.3.4. We identified a total of 446 806 patients who were pathologically diagnosed with BC. The exclusion criteria were as follows: Figure 1. Informed consent was not required because the SEER database does not contain personal information. Clinicopathological variables were selected on the basis of clinical importance and predictors identified in previous studies, 6,8,9 and included age, grade, tumor size, subtype, surgery of the primary site, and survival time. We classified age at diagnosis into seven groups: <60, 60-65, 66-70, 71-75, 76-80, 81-85, and >85 years. Tumor size was categorized into five groups (<1, <2, <3, <4, and ≥4 cm) for the OS analysis, six groups (<1, <2, <3, <4, <5, and ≥5 cm) for the breast cancer-specific mortality (BCSM) analysis, and four groups (<1, <2, <3, and ≥3 cm) for the OCSM analysis. Subsequently, the 196 304 patients with stage I-II BC were randomly divided at a ratio of 9:1 into a training cohort (N = 176 674) and a validation cohort (N = 19 630), using random numbers generated by the runif function of the R stats package. The training cohort was used to construct the nomogram, while the validation cohort was used for validation. There were no significant differences between the two groups (P > .05) (Table 1).
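The 9:1 random division can be illustrated with a short sketch. It is written in Python rather than R, and it assumes that one uniform random number is drawn per patient (analogous to runif) and that the patients with the smallest draws form the training cohort. The text does not describe the exact procedure, so the function name split_cohort, the seed, and the ranking step are illustrative assumptions.

```python
import random


def split_cohort(patient_ids, train_fraction=0.9, seed=2019):
    """Randomly divide patients into training and validation cohorts.

    Assumption: one uniform draw per patient, ranked in ascending order;
    the seed is a hypothetical choice made only for reproducibility.
    """
    rng = random.Random(seed)
    # Pair every patient with a U(0, 1) draw, then order patients by the draw.
    draws = sorted((rng.random(), pid) for pid in patient_ids)
    n_train = round(train_fraction * len(draws))
    training = [pid for _, pid in draws[:n_train]]
    validation = [pid for _, pid in draws[n_train:]]
    return training, validation


# Cohort size taken from the text; integer IDs stand in for patient records.
training, validation = split_cohort(range(196304))
print(len(training), len(validation))  # 176674 19630
```

Ranking the draws gives cohort sizes that match the reported 176 674 and 19 630 exactly, whereas thresholding each draw at 0.9 would give sizes that vary around those values.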

| Statistical analysis
Demographic and clinical characteristics were summarized using descriptive statistics. Categorical variables were reported as whole numbers and proportions, and continuous variables were reported as medians with interquartile ranges (IQRs), unless indicated otherwise. The chi-square test and Fisher's exact test for categorical variables and Student's t test for continuous variables were performed to compare baseline characteristics.
OS was defined as the time from diagnosis to death from any cause. The Kaplan-Meier method was used to estimate OS, and the log-rank test was used to examine differences in OS. The associations between relevant clinical variables and OS were analyzed using the Cox regression model.
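For readers unfamiliar with the Kaplan-Meier estimator, a minimal sketch in Python is given here. It is an illustration of the product-limit formula, not the code used in the study; the function name kaplan_meier and the toy data are assumptions.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of overall survival.

    times  : follow-up time for each patient
    events : 1 if the patient died from any cause, 0 if censored
    Returns a list of (time, survival probability) at each death time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)  # deaths plus censorings leaving the risk set
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:
            # Multiply by the conditional probability of surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical toy data (years of follow-up), for illustration only.
curve = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
for t, s in curve:
    print(t, round(s, 4))
```

Censored patients contribute to the risk set until their last follow-up and then leave it without changing the survival estimate.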
We used the cumulative incidence function (CIF) to describe cause-specific mortality and Gray's test to analyze the differences. We classified cause of death as either BC related or other causes related. BCSM and OCSM were considered two competing events. The Fine and Gray competing risk analysis (based on the subdistribution hazard ratio [SHR]) 11,13,14 was used to predict the probabilities of the two competing mortality outcomes (BCSM and OCSM). The Fine and Gray model is a multivariable time-to-event model that accounts for the fact that an individual can experience only one of the two competing events. The model also accounts for censoring among those who did not have an event during follow-up.
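The nonparametric CIF can be sketched as follows. This is an illustrative Python implementation of the standard estimator, in which each cause-specific increment is weighted by the all-cause survival just before the event time; it is not the Fine and Gray regression itself. The coding of causes (0 = censored, 1 = BC death, 2 = other-cause death) and the toy data are assumptions.

```python
from collections import Counter


def cumulative_incidence(times, causes, cause_of_interest):
    """Nonparametric cumulative incidence for one cause of death.

    causes: 0 = censored, 1 = BC death, 2 = other-cause death
            (hypothetical coding chosen for this sketch).
    Returns a list of (time, cumulative incidence) at each event time
    of the cause of interest.
    """
    d_interest = Counter(
        t for t, c in zip(times, causes) if c == cause_of_interest
    )
    d_any = Counter(t for t, c in zip(times, causes) if c != 0)
    exits = Counter(times)
    at_risk = len(times)
    overall_survival = 1.0  # all-cause survival just before the current time
    cif = 0.0
    curve = []
    for t in sorted(exits):
        if d_interest.get(t, 0) > 0:
            # Probability of being alive just before t times the
            # cause-specific hazard at t.
            cif += overall_survival * d_interest[t] / at_risk
            curve.append((t, cif))
        overall_survival *= 1.0 - d_any.get(t, 0) / at_risk
        at_risk -= exits[t]
    return curve


# Hypothetical toy data, for illustration only.
times = [1, 2, 3, 4, 5, 6]
causes = [1, 2, 0, 1, 2, 1]
bcsm = cumulative_incidence(times, causes, 1)
ocsm = cumulative_incidence(times, causes, 2)
print(round(bcsm[-1][1], 4), round(ocsm[-1][1], 4))
```

In this toy example the two cumulative incidences sum to the all-cause mortality. Treating other-cause deaths as censored and taking one minus the Kaplan-Meier estimate would instead give a BC mortality of 1.0 at the last time point, which illustrates the overestimation that motivates the competing risk approach.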
The independent risk factors identified in the multivariate analysis were incorporated into the nomogram to predict the probability of 3-, 4-, and 5-year OS, breast cancer-specific survival (BCSS), and other cause-specific survival (OCSS) in patients with early-stage BC using the rms and mstate packages in R. 10,15 The discriminative ability and calibration of the nomogram were assessed by the concordance index (C-index) and calibration curves (comparing the nomogram-predicted probability with the observed probability). 16,17 Bootstrap resampling with 1000 resamples was used to reduce overfit bias in the calibration curves. 18 Furthermore, the precision of the 3-, 4-, and 5-year OS, BCSM, and OCSM predictions was evaluated and compared using the area under the receiver operating characteristic curve (AUC). Higher C-index and AUC values indicate a better ability to distinguish patients with different survival outcomes. Finally, Kaplan-Meier curves were plotted for patients grouped by risks predicted from the nomogram to further assess calibration. 18 A two-tailed P-value < .05 was considered statistically significant. All analyses were conducted using R software (version 3.4.3; R Foundation).
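The C-index can be illustrated with a small sketch of Harrell's concordance for right-censored data. It is written in Python for illustration and is not the rms implementation; the function name concordance_index, the handling of tied times (such pairs are skipped), and the toy data are assumptions.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when patient i had the event and a
    shorter follow-up time than patient j. The pair is concordant when
    patient i also has the higher predicted risk; tied risks count 0.5.
    Pairs with tied times are skipped in this simplified sketch.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier failure
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: higher score means higher predicted risk.
c_index = concordance_index(
    times=[2, 4, 6, 8],
    events=[1, 1, 0, 1],
    risk_scores=[0.9, 0.4, 0.6, 0.1],
)
print(c_index)  # 0.8
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of predicted risk against observed survival time.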

| Patient
A total of 196 304 patients with early-stage BC from 2010 to 2016 were included in the final analysis and randomly divided into two groups at a ratio of 9:1: training cohort (N = 176 674) and validation cohort (N = 19 630). The baseline characteristics of the two groups are presented in Table 1, and there was no significant difference between them (P > .05). The median age at diagnosis was 60 years (IQR, 51-70 years). In the entire population, nearly half of the patients (47.5%) were aged <60 years. Moderate differentiation (Grade II) (44.4%) accounted for the highest proportion, followed by poor differentiation (Grade III-IV) (30.3%), and good differentiation (Grade I) (25.3%). Small tumors prevailed in patients with early-stage BC. Regarding size, 62.7% of the tumors were
In the multivariable Cox regression analysis (

| Nomogram
Five validated variables were incorporated to develop the prognostic nomogram: age, grade, tumor size, subtype, and surgery at the primary site (Figure 3). Thus, the probability of 3-, 4-, and 5-year OS, BCSS, and OCSS could be predicted by summing up the scores of each selected variable (higher total points, . Calibration plots presented high conformance between the nomogram-predicted and observed probabilities in both the training and validation cohorts (Figure 4). The discriminatory capacity of the nomogram was evaluated by calculating the AUC values (Figure 5). The AUC values for predicting 3-, 4-, and 5-year OS were 80.2%, 79.5%, and 78.7%, respectively. As for the prediction of the 3-, 4-, and 5-year BCSM, the AUC values were 83.0%, 81.7%, and 80.3%, respectively. Moreover, the AUC values were 81.3%, 80.8%, and 81.7%, respectively, for the 3-, 4-, and 5-year OCSM.
Based on the C-index and AUC values, the models predicting BCSM and OCSM using the Fine and Gray competing risk analysis were more precise than the model predicting OS.
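As an illustration of how an AUC at a fixed time horizon can be obtained, the following Python sketch computes it for all-cause death as the proportion of case-control pairs in which the case has the higher predicted risk. It is a deliberate simplification and not the method used for Figure 5: patients censored before the horizon are dropped because their status is unknown, whereas time-dependent ROC methods handle such patients with weighting. The function name auc_at_horizon and the toy data are assumptions.

```python
def auc_at_horizon(times, events, risk_scores, horizon):
    """AUC for death from any cause by a fixed horizon.

    Cases    : patients who died at or before the horizon.
    Controls : patients followed beyond the horizon.
    Patients censored before the horizon are excluded
    (a simplifying assumption of this sketch).
    """
    cases, controls = [], []
    for t, e, r in zip(times, events, risk_scores):
        if t <= horizon and e == 1:
            cases.append(r)
        elif t > horizon:
            controls.append(r)
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    # Count pairs where the case has the higher score; ties count 0.5.
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical toy data (years of follow-up), for illustration only.
auc = auc_at_horizon(
    times=[1, 2, 3, 6, 7, 8],
    events=[1, 0, 1, 0, 1, 0],
    risk_scores=[0.8, 0.5, 0.3, 0.4, 0.2, 0.1],
    horizon=5,
)
print(round(auc, 4))  # 0.8333
```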
To further evaluate the discrimination of the model, the validation cohort was stratified into three groups based on the predicted probability calculated from the nomogram: low-, middle-, and high-risk groups. Among the entire population, patients in the high-risk group had significantly lower OS rates and higher BCSM or OCSM rates than patients in the low- and middle-risk groups (5-year OS rate: 0.644 for the high-risk group, 0.860 for the middle-risk group, and 0.958 for the low-risk group; 5-year BCSM rate: 0.238 for the high-risk group, 0.111 for the middle-risk group, and 0.024 for the low-risk group; 5-year OCSM rate: 0.213 for the high-risk group, 0.031 for the middle-risk group, and 0.010 for the low-risk group) (P < .001) (Figure 6).

| DISCUSSION
In this study, we analyzed survival and mortality in patients with early-stage BC, discriminating the differences  To the best of our knowledge, this study, based on more than 190 000 patients from the SEER database, includes the largest cohort to date. It is the first study to use the Fine and Gray competing risk analysis based on the proportional SHR to model the CIF. 13,19 Unlike previous nomograms, 20,21 which provide the physician with a patient's probability of surviving the disease assuming no death from a competing cause, our nomogram is comprehensive, considers OCSM, and shows relatively good calibration and discrimination, with C-indices > 0.80 and AUC values of approximately 80%. Although the follow-up duration was short, more than half of the deaths were attributed to causes other than primary BC. Such competing risks should therefore be considered when evaluating prognosis for decision-making and patient counseling.
Age was a strong predictive factor, and its effect was more pronounced for OCSM; that is, older patients had a higher risk of OCSM. Chen et al 9 also revealed that elderly women exhibited worse OS but better BCSS than young women, although OCSM was not evaluated. These results may be due to higher frequencies of age-related comorbidities and less basic life support, leading to high OCSM. Therefore, in patients with early-stage BC, it is equally important to pay attention to both the primary breast cancer and age-related diseases. Physicians should encourage a healthy lifestyle that includes weight management, self-care, and preventive strategies to prevent OCSM.
A far-reaching impact of surgery was observed, especially on BCSM. In our study, compared with patients who underwent surgery, regardless of the surgery type, patients who did not undergo surgery had a significantly poorer prognosis. Almost 90% of women diagnosed with BC have early-stage disease and may be treated with breast-conserving surgery or mastectomy. 22,23 The long-term survival of women with early BC who were treated with breast-conserving surgery and postoperative radiotherapy was virtually identical to that of women who underwent radical mastectomy. 24 However, surgery itself may carry a series of risks and adverse effects, leading to an increase in the OCSM rate.
Although this study presents a good predictive nomogram, it has several limitations. First, because subtype information was unavailable in the SEER database before 2010, the follow-up duration (2010-2016) was short for early-stage BC. A longer follow-up duration may improve the precision and discrimination of our model. Second, a comorbidity variable is lacking. SEER does not collect data on comorbid status, which worsens with age and affects patient survival. Instead, we used age as a surrogate for comorbidity to compensate for this limitation. Finally, only internal validation was used to evaluate the model. Although it demonstrated good accuracy, external validation based on other patient cohorts is still needed.

| CONCLUSIONS
We evaluated OS and competing risks of death in patients with early-stage BC based on the Fine and Gray competing risk analysis. This is the first study to develop a comprehensive nomogram predicting 3-, 4-, and 5-year OS, BCSS, and OCSS using a large population. Additionally, the well-performing nomogram may help answer patients' consultation questions and offer prognostic assessment for individuals. However, more studies are required for further external validation.