Optimizing bladder cancer locoregional failure risk stratification after radical cystectomy using SWOG 8710
John P. Christodouleas MD, MPH,
Department of Radiation Oncology, University of Pennsylvania, Philadelphia, Pennsylvania
Corresponding author: John P. Christodouleas, MD, MPH, Department of Radiation Oncology, Perelman School of Medicine at the University of Pennsylvania, 3400 Civic Center Blvd, TRC-2 West, Philadelphia, PA 19104; Fax: (215) 349-5445; firstname.lastname@example.org
Clinical trials of radiation after radical cystectomy (RC) and chemotherapy for bladder cancer are in development, but inclusion and stratification factors have not been clearly established. In this study, the authors evaluated and refined a published risk stratification for locoregional failure (LF) by applying it to a multicenter patient cohort.
The original stratification, which was developed using a single-institution series, produced 3 subgroups with significantly different LF risk based on pathologic tumor (pT) classification and the number of lymph nodes identified. This model was then applied to patients in Southwest Oncology Group (SWOG) 8710, a randomized trial of RC with or without chemotherapy. LF was defined as any pelvic failure before or within 3 months of distant failure.
Patients in the development cohort and the SWOG cohort had significantly different baseline characteristics. The original risk model was not fully validated in the SWOG cohort, because lymph node yield was not as strongly associated with LF as in the development cohort. Regression analysis indicated that margin status could improve the model. A revised stratification using pT classification, margin status, and the number of lymph nodes identified produced 3 subgroups with significantly different LF risk in both cohorts: low risk (≤pT2), intermediate risk (≥pT3 with negative margins AND ≥10 lymph nodes identified), and high risk (≥pT3 with positive margins OR <10 lymph nodes identified) with 5-year LF rates of 8%, 20%, and 41%, respectively, in the SWOG cohort and 8%, 19%, and 41%, respectively, in the development cohort.
Patients with muscle-invasive bladder cancer who undergo radical cystectomy plus bilateral pelvic lymphadenectomy (RC) with or without the receipt of perioperative chemotherapy have an estimated 5-year overall survival rate of approximately 50%. Although considerable attention has been given to the problem of distant relapse after RC, approximately 33% of patients with ≥pT3 tumors develop a recurrence within the pelvis, either as isolated locoregional failures (LF) or synchronous with distant metastases. Several organizations are now considering clinical trials to assess the impact of radiation therapy (RT) after RC. However, criteria for the selection and stratification of patients most likely to benefit from adjuvant RT in these trials have not been clearly defined. An LF risk-stratification model derived from a single-institution experience has recently been published but not externally validated. The purpose of this study was to assess the validity of this LF stratification model within the Southwest Oncology Group (SWOG) 8710 database, a heterogeneous, multi-institutional cohort of patients who were randomized to undergo RC with or without neoadjuvant chemotherapy.[1, 2]
MATERIALS AND METHODS
Original Risk Stratification
The original LF risk-stratification model divided patients in the development cohort into 3 statistically distinct risk groups based on 2 variables: pathologic tumor classification at cystectomy (≤pT2 or ≥pT3) and the total number of benign or malignant lymph nodes identified in the RC pathology specimen (<10 or ≥10 benign or malignant lymph nodes). The model indicated that patients with ≤pT2 tumors were at low risk of LF, those with ≥pT3 tumors who had ≥10 lymph nodes identified were at intermediate risk of LF, and those with ≥pT3 tumors who had <10 lymph nodes identified were at high risk of LF.
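The two-variable rule described above can be written as a short decision function. The following is a minimal sketch, not the authors' code: the function name and the integer encoding of pT classification (substages such as pT3a collapsed to 3) are assumptions made for illustration, and patients with unknown lymph node counts are not handled.

```python
def original_lf_risk_group(pt_stage: int, nodes_identified: int) -> str:
    """Assign the original LF risk group from two pathologic variables.

    pt_stage: pathologic tumor classification at cystectomy as an integer
        0-4 (hypothetical encoding; pT3a/pT3b both map to 3).
    nodes_identified: total benign or malignant lymph nodes found in the
        RC pathology specimen.
    """
    if pt_stage <= 2:
        # <=pT2: low risk regardless of lymph node yield
        return "low"
    if nodes_identified >= 10:
        # >=pT3 with >=10 lymph nodes identified
        return "intermediate"
    # >=pT3 with <10 lymph nodes identified
    return "high"


groups = [original_lf_risk_group(2, 4),
          original_lf_risk_group(3, 14),
          original_lf_risk_group(4, 6)]
```

Note that lymph node yield only matters once the tumor is ≥pT3; a ≤pT2 tumor with few lymph nodes identified remains in the low-risk group.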
The development cohort included 486 consecutive patients who underwent RC with or without perioperative chemotherapy at the Hospital of the University of Pennsylvania between 1990 and 2008. Of these, 44 patients were excluded because they lacked elements of urothelial carcinoma (37 patients) or had received radiation (7 patients), leaving 442 patients (91%) for analysis. Details of the evaluation, surgery, and pathologic review of these patients have been previously described. After surgery, patients were evaluated every 4 months for 2 years, every 6 months until year 5, and then annually with routine chest x-rays and biannual computed tomography (CT) scans or magnetic resonance imaging (MRI) of the abdomen and pelvis.
SWOG 8710 was a randomized trial that compared RC alone versus 3 cycles of neoadjuvant methotrexate, vinblastine, doxorubicin, and cisplatin (MVAC) followed by RC for patients with clinical T2 through T4a transitional cell carcinoma with or without squamous differentiation. In total, 317 patients were accrued between 1987 and 1998. Of these, 53 were excluded because they did not undergo RC (35 patients) or their records were not available for review (18 patients), leaving 264 patients (83%) for analysis. Study patients were operated on by 106 different surgeons at 109 different institutions. The workup, surgery, and pathologic review have been described previously in detail.[1, 2] The recommended evaluation after surgery was every 3 months for the first year, every 6 months for the second year, and yearly thereafter with chest x-rays. Abdominal or pelvic imaging was not required according to the protocol.
Endpoints and Statistical Analyses
The primary endpoint of this analysis was LF, but overall survival (OS) and isolated distant metastasis (DM) also were recorded. These endpoints were scored based on a review of original medical records at the Hospital of the University of Pennsylvania and case report forms and source documentation in the SWOG 8710 database. LF was defined as imaging evidence of recurrence in the pelvic soft tissues or lymph nodes below the aortic bifurcation before or within 3 months after DM. Recurrence within the inguinal lymph nodes was classified as DM. The time to LF, OS, and isolated DM were calculated from the date of surgery with censoring at the date of last follow-up. LF was analyzed using a cumulative incidence function in which isolated DM, death, and second primary malignancies (including ureteral or urethral malignancies) other than prostate or skin cancers were considered competing events. Chi-square tests were used to compare baseline patient characteristics between the development cohort and the SWOG cohort. Log-rank tests were used to compare OS. The goal of the external validation was to demonstrate that the original LF stratification produced significantly different subgroups when applied to the SWOG cohort. Fine and Gray regression was used to compare the cumulative incidences of LF or isolated DM between subgroups and to determine whether the explanatory power of the original risk stratification could be improved with additional covariates. Alternative risk-stratification rubrics were compared using Harrell c-indices and log likelihoods. To ensure a fair comparison between models, the number of patients in each model was fixed by excluding those with unknown margin status or an unknown number of lymph nodes identified.
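To illustrate how a cumulative incidence function treats competing events, the sketch below implements the standard nonparametric estimator in plain Python. It is an illustration under stated assumptions rather than the analysis code used in this study: the function name, the event coding (0 = censored, 1 = LF, 2 = any competing event), and the ten-patient data set are all hypothetical.

```python
from collections import Counter


def cumulative_incidence(times, events, cause, horizon):
    """Cumulative incidence of `cause` by `horizon` with competing risks.

    times: follow-up time per patient (hypothetical units: months).
    events: 0 = censored, 1 = LF, 2 = competing event (assumed coding).
    """
    # Tally the events (and censorings) observed at each distinct time.
    by_time = {}
    for t, e in zip(times, events):
        by_time.setdefault(t, Counter())[e] += 1

    at_risk = len(times)
    event_free = 1.0  # probability of no event of any type so far
    cif = 0.0
    for t in sorted(by_time):
        if t > horizon:
            break
        counts = by_time[t]
        d_cause = counts[cause]
        d_all = sum(n for e, n in counts.items() if e != 0)
        # A patient can only fail from `cause` at t if still event-free.
        cif += event_free * d_cause / at_risk
        event_free *= 1.0 - d_all / at_risk
        at_risk -= sum(counts.values())
    return cif


# Hypothetical toy data, not drawn from either cohort.
toy_times = [2, 4, 4, 6, 9, 12, 15, 20, 30, 60]
toy_events = [1, 2, 0, 1, 2, 0, 1, 0, 2, 0]
lf_by_60 = cumulative_incidence(toy_times, toy_events, cause=1, horizon=60)
competing_by_60 = cumulative_incidence(toy_times, toy_events, cause=2, horizon=60)
```

Because a patient who experiences a competing event leaves the event-free pool without being treated as censored, the cumulative incidences of all causes plus the event-free probability sum to 1. Treating competing events as censored observations would instead overestimate the incidence of LF.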
RESULTS
Differences Between the Development and SWOG Cohorts
The cohorts differed significantly with respect to age, pathologic tumor classification, lymph node positivity, surgical soft-tissue margins, the number of lymph nodes identified, and receipt of neoadjuvant and adjuvant chemotherapy (P < .01 for all comparisons) but not sex or histology (Table 1). There was a borderline significant difference with respect to OS (log-rank P = .05) with 5-year OS estimates of 42% (95% confidence interval [CI], 37%-47%) and 49% (95% CI, 43%-55%) for the development and SWOG cohorts, respectively. There were 80 and 40 LF events in the development and SWOG cohorts, respectively. There were no significant differences with respect to overall LF (Gray P = .13) or overall isolated DM (Gray P = .22). In the development and SWOG cohorts, the overall 5-year LF estimates were 18% (95% CI, 14%-22%) and 15% (95% CI, 11%-20%), respectively, and the 5-year isolated DM estimates were 17% (95% CI, 14%-21%) and 20% (95% CI, 16%-26%), respectively.
Table 1. Differences in Patient Characteristics Between the Development Cohort and the Southwest Oncology Group Cohort
Evaluating the Original Risk Stratification Within the SWOG Cohort
Applying the original risk stratification to the SWOG cohort, the 5-year LF estimates were 8%, 29%, and 36% for the low-risk, intermediate-risk, and high-risk groups, respectively (Fig. 1). The risk of LF differed significantly between the low-risk and intermediate-risk groups (subhazard ratio [SHR], 3.93; 95% CI, 1.87-8.26; P < .01) and between the low-risk and high-risk groups (SHR, 5.63; 95% CI, 2.64-11.98; P < .01). However, the original risk stratification was not fully validated in the SWOG cohort, because the risk of LF was not significantly different between the intermediate-risk and high-risk groups (SHR, 1.43; 95% CI, 0.64-3.17; P = .38) (Fig. 1). Fine and Gray regression was used to determine whether LF risk stratification in the SWOG cohort could be improved by the addition of 1 or more patient characteristics within the model. All characteristics that were significantly different between the 2 cohorts were tested (Table 1). Of these variables, only margin status was associated significantly with LF when controlling for the original risk stratification (Table 2).
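The relation between a reported subhazard ratio, its confidence interval, and its P value can be checked with a short calculation. The sketch below assumes that the interval is a Wald interval, symmetric on the log scale, which the text does not state explicitly; the function name is hypothetical, and because the published values are rounded the result is only approximate.

```python
import math


def p_from_ratio_ci(ratio, lower, upper, z_crit=1.959964):
    """Approximate two-sided P value from a ratio and its 95% CI.

    Assumes a Wald interval on the log scale:
    log(upper) - log(lower) = 2 * z_crit * SE.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(ratio) / se
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# Intermediate-risk versus high-risk comparison, original stratification.
p_int_vs_high = p_from_ratio_ci(1.43, 0.64, 3.17)
```

Under these assumptions, the intermediate-risk versus high-risk comparison (SHR, 1.43; 95% CI, 0.64-3.17) gives a P value that rounds to .38, in agreement with the reported value. An interval that spans 1.0 corresponds to P > .05 under the same assumption.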
Table 2. The Association of Characteristics in the Southwest Oncology Group Cohort With Locoregional Failure Controlling for the Original Risk Stratification
Abbreviations: CI, confidence interval; LF, locoregional failure; NA, not applicable (because the variable was included in the original risk stratification); NR, 5-year endpoint not reached; Ref, reference variable; SHR, subhazard ratio.
Univariate P values.
Adjusted P values.
Values indicate the number of benign or malignant lymph nodes identified.
Revising the LF Risk Stratification to Include Margin Status
Because margin status appeared to have a stronger association with LF than the number of lymph nodes identified in the SWOG cohort, we considered 2 modifications to the original risk-stratification model. The first alternative included 2 variables, pT classification and margin status, as follows: low risk, ≤pT2 tumor; intermediate risk, ≥pT3 tumor AND negative margins; and high risk, ≥pT3 tumor AND positive margins. The second alternative included 3 variables, pT classification, the number of lymph nodes identified, and margin status, as follows: low risk, ≤pT2 tumor; intermediate risk, ≥pT3 tumor with ≥10 lymph nodes identified AND negative margins; and high risk, ≥pT3 tumor with <10 lymph nodes identified OR positive margins. The second alternative was selected for further evaluation because it was associated with the highest Harrell c-index (0.709) and log-likelihood (−649) in a data set that combined the SWOG and development cohorts (Table 3).
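The second alternative, which uses all 3 variables, can be expressed as a small decision function. As with any such sketch, the function name, the integer pT encoding, and the Boolean margin flag are illustrative assumptions; patients with unknown margin status or unknown lymph node counts, who were excluded from model comparison, are not handled here.

```python
def revised_lf_risk_group(pt_stage: int, nodes_identified: int,
                          positive_margin: bool) -> str:
    """Assign the revised (3-variable) LF risk group.

    pt_stage: pathologic tumor classification as an integer 0-4
        (hypothetical encoding; substages collapse to the integer).
    nodes_identified: benign or malignant lymph nodes identified.
    positive_margin: True if the surgical margin was positive.
    """
    if pt_stage <= 2:
        return "low"
    # For >=pT3 tumors, either adverse feature is enough for high risk.
    if positive_margin or nodes_identified < 10:
        return "high"
    # >=pT3 with negative margins AND >=10 lymph nodes identified
    return "intermediate"


example = revised_lf_risk_group(3, 15, positive_margin=True)
```

The OR in the high-risk definition means that a ≥pT3 tumor with an adequate lymph node count but a positive margin is classified as high risk, which is the main difference from the original 2-variable rule.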
Table 3. C-Indices for the Fine and Gray Regressions of 3 Risk Stratifications in the Southwest Oncology Group Cohort, the Development Cohort, and Both Cohorts
The number of patients in each model was fixed by excluding those with unknown margin status and those who had an unknown number of lymph nodes.
Values indicate the number of benign or malignant lymph nodes identified.
Original stratification: low, ≤pT2; intermediate, ≥pT3 and ≥10 lymph nodes; high, ≥pT3 and <10 lymph nodes
First alternative: low, ≤pT2; intermediate, ≥pT3 and negative margins (NM); high, ≥pT3 and positive margins (PM)
Second alternative: low, ≤pT2; intermediate, ≥pT3 with NM and ≥10 lymph nodes; high, ≥pT3 with PM or <10 lymph nodes
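For readers unfamiliar with the c-index, the sketch below computes Harrell's concordance for a single event type on hypothetical data. It is a simplified illustration only: how competing events were handled when c-indices were computed for the Fine and Gray models is not described in the text, and the function name, event coding, and toy values are assumptions.

```python
def harrell_c(times, events, scores):
    """Harrell's c-index for one event type (simplified sketch).

    times: follow-up times; events: 1 = event, 0 = censored;
    scores: higher score means higher predicted risk
        (for example 0 = low, 1 = intermediate, 2 = high).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable when patient i had the event strictly
            # before patient j's last observed time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5  # tied predictions count half
    return concordant / comparable


# Hypothetical toy data, not drawn from either cohort.
c = harrell_c(times=[5, 8, 12, 20, 30],
              events=[1, 1, 0, 1, 0],
              scores=[2, 1, 2, 1, 0])
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect ranking. With only 3 risk levels, many pairs have tied predictions, which limits how high the index can reach for any 3-group stratification.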
Evaluating the Revised LF Risk Stratification
By using the revised risk stratification, the 5-year LF estimates were 8%, 20%, and 41% for the low-risk, intermediate-risk, and high-risk groups, respectively, in the SWOG cohort (Fig. 2A) and 8%, 19%, and 41% for the low-risk, intermediate-risk, and high-risk groups, respectively, in the development cohort (Fig. 2B). Within the SWOG cohort, the risk of LF was significantly different between the low-risk and intermediate-risk groups (SHR, 2.60; 95% CI, 1.01-6.65; P = .04) and between the low-risk and high-risk groups (SHR, 6.32; 95% CI, 3.20-12.49; P < .01). The risk of LF differed with borderline significance between the intermediate-risk and high-risk groups (SHR, 2.44; 95% CI, 0.96-6.17; P = .06). Within the development cohort, the risk of LF was significantly different between the low-risk and intermediate-risk groups (SHR, 2.13; 95% CI, 1.18-3.83; P = .01) and between the low-risk and high-risk groups (SHR, 5.70; 95% CI, 3.38-9.62; P < .01). The risk of LF also differed significantly between the intermediate-risk and high-risk groups (SHR, 2.68; 95% CI, 1.55-4.63; P < .01).
To better characterize the behavior of the revised risk stratification, we evaluated the secondary endpoints of OS and isolated DM. Within each cohort, the OS of each risk group was significantly different from the OS for the other risk groups (log-rank P < .01 for all pair-wise comparisons). The 5-year OS estimates were 62%, 39%, and 7% for the low-risk, intermediate-risk, and high-risk groups, respectively, in the SWOG cohort (Fig. 3A) and 60%, 31%, and 10% for the low-risk, intermediate-risk, and high-risk groups, respectively, in the development cohort (Fig. 3B). In contrast, the 3 risk groups were not consistently stratified with respect to the risk of isolated DM in either the SWOG cohort or the development cohort. The 5-year isolated DM estimates were 15%, 30%, and 38% for the low-risk, intermediate-risk, and high-risk groups, respectively, in the SWOG cohort (Fig. 3C) and 14%, 22%, and 17% for the low-risk, intermediate-risk, and high-risk groups, respectively, in the development cohort (Fig. 3D).
DISCUSSION
A revised risk model that included surgical margin status, pathologic tumor classification, and the number of lymph nodes identified stratified LF outcomes in 2 different RC cohorts. The revised model identified subgroups that also had significantly different OS rates but not significantly different isolated DM rates.
LF Is an Emerging Target for Clinical Trials
Patients with medically operable, muscle-invasive urothelial carcinoma of the bladder have a poor prognosis, with a 5-year OS rate of approximately 50%. Because of the high rate of distant metastatic disease, much of the federally funded clinical research over the past several decades has focused on the use of systemic chemotherapy, and small but important gains have been confirmed with neoadjuvant chemotherapy.
The problem of LF has received less attention, largely because LF rates reported in large cystectomy series have underestimated the risk. Few series used routine postoperative surveillance imaging of the pelvis to detect LF, and most series reported LF only if it was the first and only site of recurrence.[4-8] More recently, LF after RC has been recognized increasingly as a significant problem in patients who present with locally advanced disease. In the University of Ulm series, in which surveillance pelvic CT scanning was used, the rate of LF as the first evidence of recurrent disease among patients with extravesical disease or positive lymph nodes can be estimated from the published data at approximately 31%. The University of Texas MD Anderson Cancer Center experience in clinically staged patients, most of whom received chemotherapy, revealed 5-year LF rates after RC of 29% and 44%, respectively, for patients with clinical T3b (cT3b) and cT4 disease.[10, 11] In an international trial of neoadjuvant chemotherapy for muscle-invasive bladder cancer, the subset of patients who had RC had a locoregional recurrence rate of approximately 40%.
The hypothesis that reducing locoregional recurrences may improve disease-free survival by eliminating a potential source of DM is supported by several studies. In the report from The University of Texas MD Anderson Cancer Center, pelvic failures typically preceded the emergence of DM and very uncommonly occurred after the development of metastases, suggesting that reseeding of the pelvis from distant disease is unusual and that some distant sites may be seeded from locally recurrent disease. That same study indicated that locoregional recurrence was an independent variable predicting DM, a finding replicated by others. The observation that survival is enhanced with more extensive lymph node dissections, even in the absence of lymph node metastasis, suggests that the eradication of unrecognized microscopic lymph node disease in the pelvis may improve survival by decreasing distant as well as local failure. The study of adjuvant locoregional therapy is also encouraged by results from an older randomized trial of adjuvant radiation from the National Cancer Institute of Egypt. Although that trial mainly involved squamous cell carcinoma, urothelial carcinoma represented 20% of cases, and there appeared to be similar benefits to adjuvant RT regardless of histology. Postoperative radiation also was identified as an independent predictor of improved cancer-free survival in a small Italian study. Consequently, several institutions, including the University of Pennsylvania,[17, 18] Emory University (Ashesh Jani and Joseph Shelton, personal communication, June 22, 2013), and the Radiation Therapy Oncology Group (RTOG) (Libni Eapen, personal communication, March 9, 2013), are developing or have developed clinical trials of adjuvant RT for subsets of higher risk patients with urothelial carcinoma. These efforts would be improved by a rigorous understanding of which subsets of patients are most likely to benefit and how these patients should be stratified.
Margin Status Improves LF Risk Stratification
To our knowledge, this is the first study attempting to externally validate a model of LF risk after RC. The original model, which was developed in a large and heterogeneous single-institution database, included 2 variables: pT classification at cystectomy and the number of lymph nodes identified. This risk stratification was not fully validated in the SWOG 8710 cohort, because LF in the intermediate-risk and high-risk groups was not significantly different (Fig. 1). Regression analysis suggested that the inclusion of surgical margin status along with the original stratification variables could significantly improve LF risk modeling within the SWOG patient population. Within the SWOG cohort, margin status was a stronger independent predictor of LF than the number of lymph nodes identified. However, when margin status was initially considered as a risk factor in the development cohort, it was not an independent predictor of outcome in models that included the number of lymph nodes identified.
There are at least 2 reasons why the relative importance of margin status and the number of lymph nodes identified may have differed between the 2 groups. First, in the development cohort, in which a majority of patients had ≥10 lymph nodes identified (76%), a more limited lymph node dissection may have been used for patients whose operative goals were palliative. In these patients, achieving a negative margin also may have been less of a surgical priority. The co-occurrence of these 2 outcomes in a small subset of palliative patients may have exacerbated the problem of collinearity in the regression analysis and obscured the independent association of margin status and LF in the development cohort. Information on surgical intent was not available in the development database, but it likely included some palliative patients, because the database contained all of the RCs performed on nonmetastatic patients at a single institution over an 18-year period. In contrast, the SWOG cohort explicitly did not include patients who were treated with palliative intent, thus minimizing the potential of a confounded relation between margin status and the number of lymph nodes identified.
Second, variability in the anatomic extent of lymphadenectomy and the way lymph nodes were pathologically assessed also may have influenced the relative importance of lymph node yield and margin status. There were >100 institutions represented in the SWOG data set. Variability in the pelvic lymph node dissection and the approach to counting lymph nodes across these institutions would tend to decrease this outcome's explanatory power. In the development data set, however, all patients were treated and evaluated at a single institution by subspecialized urologists and pathologists according to standardized protocols, which perhaps better preserved a correlation between the number of lymph nodes identified and LF.
The revised LF model identifies subgroups that have significantly different OS, confirming its clinical relevance. It is noteworthy that these subgroups do not have significantly different rates of isolated DM, which suggests that the stratification is not simply a marker of metastatic potential and provides information complementary to existing models that predict any recurrence risk or disease-free survival.
Notably absent from the risk-stratification model is an association between receipt of chemotherapy and LF. In the SWOG data set, in which patients were randomized to receive neoadjuvant MVAC, chemotherapy was not significantly associated with LF on univariate analysis, and adjusting for other predictors of LF decreased the likelihood of any potential correlation (Table 2). It is possible that patients who are at risk for LF may simply have too great a burden of disease within the pelvis to be meaningfully impacted by the chemotherapy regimens used for these patients.
Another interesting finding is the absence of pathologic lymph node status as a predictor of LF. This variable has been used to select patients for adjuvant RT in the past,[15, 20] but its inclusion failed to improve our model in either the development cohort or the SWOG cohort. This suggests that positive lymph node status is a less significant predictor of pelvic failure than pathologic tumor classification, margin status, and the extent of lymph node dissection, probably because the competing risk of early DM in lymph node-positive patients is so high.
This study has several important limitations. First, because the revised LF risk rubric was developed using both cohorts, external validation in additional data sets is warranted. Moreover, both data sets used in this study included only patients who received treatment in the United States, so it is important to validate the model in a regionally distinct cohort in which surgical techniques and pathologic assessment protocols may differ. Second, although this study was completed in part to inform the design of clinical trials, the LF estimates reported here, and in any retrospective cohort study, may be higher than the LF estimates in the observation arm of a prospective study of adjuvant therapy. A proportion of these patients (especially those in the high-risk group) develop LF rapidly after RC (Fig. 1) and are unlikely to be eligible long enough to be enrolled in a prospective protocol. This effect should be accounted for when powering trials.
A revised risk model that included surgical margin status, pathologic tumor classification, and the number of lymph nodes identified stratified LF outcomes in 2 significantly different RC cohorts. This model may represent an important step toward developing rigorous clinical trials of adjuvant locoregional therapy for bladder cancer.
Mr. Tucker reports grants from Howard Hughes Medical Institute via Swarthmore College during the conduct of the study.
CONFLICT OF INTEREST DISCLOSURES
Dr. Christodouleas reports employee status at Elekta, AB.