Principal component analysis based pre-cystectomy model to predict pathological stage in patients with clinical organ-confined bladder cancer
Siamak Daneshmand, USC Institute of Urology, Norris Comprehensive Cancer Center, 1441 Eastlake Avenue, Suite 7416, Los Angeles, CA 90089-2211, USA. e-mail: email@example.com
What's known on the subject? and What does the study add?
Clinical stage is an integral part of outlining treatment strategy and counselling patients with bladder cancer. The discordance rate between clinical stage and pathological stage, however, is currently high with more than 40% of patients with presumed organ-confined disease being upstaged after surgery. It accounts for the major part of overall pathological upstaging in patients with bladder cancer and is strongly associated with poor prognosis. There is an absolute need for additional methods to improve the accuracy of pre-surgical staging to predict post-surgical pathological stage of invasive bladder cancer for more accurate risk stratification.
This study presents an internally validated pre-cystectomy principal component analysis staging model involving clinicopathological information on a large cohort of patients to predict post-surgical stage of bladder cancer. This model greatly reduces pathological upstaging to extravesical disease compared with clinical staging alone.
OBJECTIVE
- To develop a model that integrates the clinical and pathological information prior to radical cystectomy to increase the accuracy of current clinical stage in prediction of pathological stage in patients with bladder cancer (BC) using a modelling approach called principal component analysis (PCA).
PATIENTS AND METHODS
- In a single-centre retrospective study, demographic and clinicopathological information of 1186 patients with clinically organ-confined (OC) BC was reviewed.
- Putative predictors of post-cystectomy pathological stage were identified using a stepwise logistic regression model.
- Patients were randomly divided into training data set (two-thirds of the study population, 790 patients) and test data set (one-third of the study population, 396 patients).
- The PCA method was used to develop the model in the training data set and the cut-off point (PCA score) to differentiate pathological OC disease from extravesical disease was determined. The model was then applied to the test data set without recalculation.
RESULTS
- In all, 685 patients (57.7%) had pathological OC disease. Age, clinical stage, number of intravesical treatments, lymphovascular invasion, multiplicity of tumours, hydronephrosis and palpable mass were incorporated into the PCA model as predictors of pathological stage.
- The sensitivity and specificity of the PCA model in the test data set were 62.8% (95% CI 55.6%–68.1%) and 68.9% (95% CI 60.8%–76.0%), respectively. The positive and negative predictive values were 75.8% (95% CI 69.0%–81.6%) and 51.5% (95% CI 44.4%–58.5%), respectively.
CONCLUSIONS
- The pre-cystectomy PCA model improved the ability to differentiate OC disease from extravesical BC and especially decreased the under-staging rate.
- The pre-cystectomy PCA model represented a user-friendly staging aid without the need for sophisticated statistical interpretation.
ABBREVIATIONS
PCA, principal component analysis
IVnum, number of intravesical therapies
CIS, carcinoma in situ
PPV, positive predictive value
NPV, negative predictive value
LNP, lymph node positive
INTRODUCTION
Bladder cancer (BC) is one of the most common cancers of the genitourinary tract, with 73 510 newly diagnosed patients expected in 2012. Radical cystectomy (RC) is considered the mainstay of treatment for patients with treatment-refractory non-muscle-invasive and muscle-invasive BC [2,3]. Nevertheless, the course of the disease differs significantly between organ-confined (OC) and extravesical (EV) disease. The 5-year recurrence-free survival for OC disease is 89%, and it drops sharply to 62%, 50% and 35% for pT3b, pT4 and node-positive tumours, respectively. It has been shown that the subset of patients with EV disease would benefit most from neoadjuvant chemotherapy (NAC). Conventional clinical staging of BC using bimanual examination, transurethral resection (TUR) of the tumour and contrast-enhanced CT, however, is not accurate in differentiating OC tumour from EV disease. The reported rate of post-surgical upstaging to EV disease is as high as 43%, and there is a need for additional methods to improve the accuracy of clinical staging.
Decision aids, such as risk groupings, probability tables and nomograms, have greatly improved our ability to predict endpoints and provide patients with the data they require to make informed medical decisions. The nomogram has been the most popular analytical model to predict recurrence, mortality and survival rates in BC [8–10]. Karakiewicz et al. developed two nomograms to predict post-surgical T and N stages and showed an improvement, albeit suboptimal, in the accuracy of clinical stage to predict pathological stage. Principal component analysis (PCA) is a non-parametric statistical method of extracting relevant information from large data sets involving multiple variables. It is a well-established analytical method for prognostic stratification of patients with prostate cancer. With this method, several correlated variables can be condensed into a single composite parameter, which facilitates prediction of dichotomous endpoints.
This study evaluates the feasibility of the PCA method to develop a pre-cystectomy staging model incorporating demographic, clinical and biopsy-driven information from a large cohort of patients with clinical OC BC to predict post-surgical pathological stage.
PATIENTS AND METHODS
In a single-centre retrospective study, patients with clinical lymph node negative OC BC (clinical (c) ≤T2bN0) and histology of urothelial carcinoma who underwent RC with intent to cure at the Institute of Urology, University of Southern California, between 1990 and 2008 were identified through our BC database. Demographic and clinical information including age, gender, clinical stage, number of intravesical therapies (IVnum), time between first diagnosis of muscle-invasive BC and RC, delivery of NAC, hydronephrosis and palpable mass was collected. In addition, TUR-driven information including grade, carcinoma in situ (CIS), lymphovascular invasion (LVI) and the number of tumours was recorded. Post-cystectomy pathological stage was converted to a dichotomous parameter, OC disease (≤pT2N0) vs EV disease (≥pT3 and/or pTanyN1–3), in all patients. Patients with a palpable mass fixed to surrounding structures were considered clinical T4 stage and were excluded from the analysis.
To generate the PCA model, logistic regression analysis was first performed to select the putative predictors of pathological stage, using all the above-mentioned variables as covariates and post-surgical pathological stage as the endpoint variable. Patients were then randomly assigned to the training data set (two-thirds of the study population), used to generate the PCA model, or to the test data set (one-third of the study population), used to internally validate the model. PCA was run on the training data set, with the putative predictors of pathological stage assigned weights in proportion to their relative contribution to the overall variance of the training data set (i.e. the larger the contribution of a particular variable to the overall variance, the higher its assigned weight). The following formula was generated to calculate the cut-off point, termed the PCA score, used to discriminate pathological OC from pathological EV disease:
in which P1 … Pn are the variables incorporated into the model and W1 … Wn are the weights assigned by the algorithm. Values below the PCA score indicated pathological OC disease and values equal to or above that score indicated pathological EV disease. The formula was then internally validated in the test data set. Sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) for both training and test data sets were determined using a web calculator. The C-index value derived from the PCA model represented the predictive accuracy. To make the calculated predictive accuracy more representative of the real clinical setting, we set 65% as the lowest clinically acceptable sensitivity of the model when generating the C-index value. The institutional review board at our institute approved the study protocol and all patients signed and dated a consent form. Statistical analysis was performed using SAS (SAS Institute, Cary, NC, USA) and P < 0.05 was considered statistically significant.
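The equation itself is not reproduced in this text. Reading the definitions above literally, one plausible general form is a weighted linear combination of the variables followed by a threshold rule. This is only a sketch of that assumed form: the symbols S (a patient's computed value) and c (the cut-off that the text calls the PCA score) are notation introduced here, and any centring or scaling of the variables is not stated in the source.

```latex
\[
S \;=\; \sum_{i=1}^{n} W_i P_i \;=\; W_1 P_1 + W_2 P_2 + \dots + W_n P_n ,
\qquad
\text{predicted stage} \;=\;
\begin{cases}
\text{OC}, & S < c,\\[2pt]
\text{EV}, & S \ge c .
\end{cases}
\]
```

Under this reading, a single number per patient is compared with one fixed cut-off, which is why no statistical software is needed at the point of use.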
RESULTS
A total of 1186 patients were included in the study. The characteristics of the study population are summarized in Table 1. There was no significant difference in clinicodemographic characteristics and TUR-related features between patients in the training data set (790 patients) and in the test data set (396 patients). On surgical pathology, 685 patients (57.7%) had OC disease and 501 patients (42.1%) had EV disease including 237 patients (19.9%) with ≥pT3aN0 and 264 patients (22.2%) with nodal disease (Table 2). Age, clinical stage, IVnum, LVI, multiple tumours, hydronephrosis and palpable mass, identified as putative independent predictors of pathological stage (Table 3), were incorporated into the PCA model and the following formula was generated:
where clinical stage is cT0/cTa/cTis = 0, cT1 = 1, cT2 = 2; age is chronological age in years rounded to the nearest whole number; for LVI, absent = 0, present = 1; for hydronephrosis, absent = 0, present = 1; for tumour count, single = 0, multiple = 1; and for palpable mass, absent = 0, present = 1.
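The coding scheme above can be written down as a small lookup step. The following is a minimal sketch, not the authors' implementation: the class name `Findings`, the function `encode` and the dictionary keys are hypothetical, age is rounded half-up to a whole year as one reading of the rounding rule, and IVnum is passed through as a raw count because the text gives no separate coding for it.

```python
import math
from dataclasses import dataclass

# Numeric codes for clinical stage, as listed in the text.
CLINICAL_STAGE_CODES = {"cT0": 0, "cTa": 0, "cTis": 0, "cT1": 1, "cT2": 2}


@dataclass
class Findings:
    """Hypothetical container for one patient's pre-cystectomy findings."""
    clinical_stage: str      # one of the keys of CLINICAL_STAGE_CODES
    age_years: float         # chronological age in years
    ivnum: int               # number of intravesical therapies
    lvi: bool                # lymphovascular invasion on TUR
    hydronephrosis: bool
    multiple_tumours: bool
    palpable_mass: bool


def encode(f: Findings) -> dict:
    """Translate clinical findings into the numeric codes described above."""
    return {
        "clinical_stage": CLINICAL_STAGE_CODES[f.clinical_stage],
        "age": math.floor(f.age_years + 0.5),   # assumption: round half up
        "ivnum": f.ivnum,                       # assumption: used as a raw count
        "lvi": int(f.lvi),                      # absent = 0, present = 1
        "hydronephrosis": int(f.hydronephrosis),
        "multiple_tumours": int(f.multiple_tumours),  # single = 0, multiple = 1
        "palpable_mass": int(f.palpable_mass),
    }


example = encode(Findings("cT2", 66.6, 0, True, False, True, False))
```

Keeping the coding in one place makes it harder to mix up the direction of a binary code (for example, entering 1 for a single tumour) when values are later combined with the weights.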
Table 1. Baseline characteristics of study population
|Variable||All patients (n = 1186)||Training data set (n = 790)||Test data set (n = 396)||P value|
|Age||66.9 ± 10.5†||67.1 ± 10.5||66.5 ± 10.6||0.5|
|IV treatment|| || || || |
| Yes||376 (31.7)||235 (29.7)||141 (35.6)||0.07|
| No||810 (68.3)||555 (70.3)||255 (64.4)|
|IVnum||0.6 ± 2.2||0.7 ± 2.5||0.6 ± 1.5||0.08|
|Number of TURBT||2.0 ± 1.7||2 ± 1.7||2.1 ± 2||0.2|
|TC time (months)||61 ± 65||63.3 ± 54.3||56.6 ± 50.3||0.8|
|NAC|| || || || |
| Yes||50 (4.2)‡||33 (4.1)||17 (4.3)||0.9|
| No||1136 (95.8)||757 (95.9)||379 (95.7)|
|Clinical stage|| || || || |
| T0||17 (1.4)||12 (1.6)||5 (1.2)||0.08|
| TIS/Ta||163 (13.7)||101 (12.8)||62 (15.6)|
| T1||357 (30.1)||241 (30.5)||116 (29.3)|
| T2||649 (54.7)||436 (55.1)||214 (53.9)|
|Hydronephrosis|| || || || |
| Yes||216 (18.2)||144 (18.2)||72 (18.1)||0.9|
| No||970 (81.8)||646 (81.8)||324 (81.9)|
|Palpable mass|| || || || |
| Yes||45 (3.8)||26 (3.3)||19 (4.8)||0.2|
| No||1141 (96.2)||764 (96.7)||377 (95.2)|
|CIS|| || || || |
| Yes||358 (30.2)||236 (29.8)||122 (30.8)||0.7|
| No||828 (69.8)||554 (70.2)||274 (69.2)|
|Multiple tumours|| || || || |
| Yes||269 (22.6)||177 (22.4)||92 (23.2)||0.7|
| No||917 (77.3)||613 (77.6)||304 (76.8)|
|LVI|| || || || |
| Yes||191 (16.1)||139 (17.6)||52 (13.1)||0.09|
| No||995 (83.9)||651 (82.4)||344 (86.9)|
|Tumour grade|| || || || |
| High||1038 (87.5)||705 (89.3)||333 (84)||0.1|
| Intermediate/low||148 (12.4)||85 (10.7)||63 (16)|
|Pathological stage|| || || || |
| T0||95 (8.0)||55 (7.0)||38 (9.5)||0.6|
| TIS/Ta||199 (16.8)||127 (16.1)||68 (17.1)|
| T1||201 (17.0)||117 (14.8)||67 (16.9)|
| T2||266 (22.4)||141 (17.8)||72 (18.2)|
| T3||341 (28.7)||132 (16.7)||57 (14.5)|
| T4||84 (7.1)||33 (4.2)||15 (3.9)|
| N+||264 (22.2)||185 (23.4)||79 (19.9)|
Table 2. Preoperative clinical stage breakdown stratified according to postoperative pathological stage
Table 3. Predictive variables selected through stepwise logistic regression analysis and related weight calculated by PCA method
|Variable||Odds ratio (95% CI)||P value||PCA weight|
|Clinical stage||0.371 (0.312–0.441)||<0.001||0.586802|
|Multiple tumours||1.363 (1.065–1.743)||0.01||−0.257238|
|Palpable mass||0.457 (0.291–0.717)||0.007||0.236816|
Accordingly, age and clinical stage contributed most to the final score, and multiple tumours and palpable mass contributed least. A cut-off value of 0.14 was calculated to discriminate OC BC from EV disease. Sensitivity, specificity, PPV and NPV as well as the C-index value of the PCA formula in the training and test data sets are presented in Table 4. The formula correctly categorized 65.5% and 62.8% of patients with pathological OC disease as OC BC (sensitivity), and the probability of having pathological OC disease when the PCA score indicated OC disease (PPV) was 73.7% and 75.8% in the training and test data sets, respectively. Regarding pathological EV disease, the formula correctly categorized 70.6% and 68.9% of patients with pathological EV disease as EV BC (specificity), and the probability of having pathological EV BC when the PCA score indicated EV disease (NPV) was 61.9% and 51.5% in the training and test data sets, respectively.
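The decision rule attached to the cut-off can be sketched in a few lines. This is an illustration only: the function name `predicted_stage` and the example values are hypothetical, and the patient's value is assumed to have been computed already from the published formula, which is not reproduced in this text.

```python
CUTOFF = 0.14  # cut-off value reported in the text


def predicted_stage(value: float, cutoff: float = CUTOFF) -> str:
    """Apply the stated rule: below the cut-off is organ-confined (OC),
    equal to or above the cut-off is extravesical (EV)."""
    return "OC" if value < cutoff else "EV"


# Hypothetical computed values for four patients (not study data).
values = [-0.80, 0.10, 0.14, 1.25]
labels = [predicted_stage(v) for v in values]
```

Note that a value exactly equal to the cut-off is assigned to EV, following the "equal to or above" wording in the methods.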
Table 4. Sensitivity, specificity, predictive values and area under the curve score of the PCA model to predict pathological OC BC in the training and test data sets
|Measure||Training data set, % (95% CI)||Test data set, % (95% CI)|
|Sensitivity||65.5 (60.8–69.9)||62.8 (55.6–68.1)|
|Specificity||70.6 (65.4–75.2)||68.9 (60.8–76.0)|
|PPV||73.7 (68.9–77.9)||75.8 (69.0–81.6)|
|NPV||61.9 (56.9–66.7)||51.5 (44.4–58.5)|
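The four measures in Table 4 all derive from one 2 × 2 table of predicted vs pathological stage. The sketch below shows the arithmetic with pathological OC disease as the positive class, as the table caption suggests. The function name and the counts are hypothetical and are not the study's data; confidence intervals are omitted because the text does not say how the web calculator derived them.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity, PPV and NPV from 2 x 2 counts.

    tp: predicted OC, pathological OC    fp: predicted OC, pathological EV
    fn: predicted EV, pathological OC    tn: predicted EV, pathological EV
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of pathological OC labelled OC
        "specificity": tn / (tn + fp),  # share of pathological EV labelled EV
        "ppv": tp / (tp + fp),          # share of predicted OC that are truly OC
        "npv": tn / (tn + fn),          # share of predicted EV that are truly EV
    }


# Hypothetical counts chosen for easy arithmetic (not from the study).
metrics = diagnostic_metrics(tp=60, fp=20, fn=40, tn=80)
```

With these made-up counts, 20 of the 80 patients labelled OC are in fact EV, which is the kind of residual under-staging the PPV captures.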
DISCUSSION
Clinical stage is an integral part of outlining treatment strategy and counselling patients with BC. The discordance rate between clinical stage and surgical stage, however, is currently undesirably high. The agreement rates between clinical stage and post-surgical pathological stage vary widely from 20% to 80% and, surprisingly, there has been a gradual increase in the rate of upstaging over time. Shariat et al. reported only 35.7% agreement between TUR stage and surgical stage in patients with BC. They reported pathological upstaging in 42% and pathological downstaging in 22% of their patients. The 5-year recurrence-free survival for OC disease is 89%, and it drops sharply to 62%, 50% and 35% for pT3b, pT4 and node-positive tumours, respectively. In our experience, pathological upstaging to EV disease at cystectomy in patients with clinical OC BC was strongly associated with poor prognosis. The upstaging rate from OC to EV disease is 36%–43% [6,14]. A recent European multi-institutional study showed that upstaging from OC to EV disease accounted for the major part (80%) of overall pathological upstaging in their study. Data from US studies also showed 32% upstaging from TUR stage T2 or less to pT3–4 and/or pN1–3 stages at cystectomy. In line with previous studies, the rate of upstaging from OC to EV and/or lymph node positive (LNP) BC was approximately 42% in our study, which underscores the need for additional staging aids such as statistical predictive models. According to the results of the clinical trials performed on the role of NAC in muscle-invasive BC, all patients with local and advanced muscle-invasive BC (T2–T4) benefit from NAC. However, another common strategy is to reserve NAC for more advanced stages (≥T3) and to treat patients with OC muscle-invasive BC, especially those with low-risk disease without LVI or hydronephrosis, with RC and lymph node dissection, since they show an excellent response to surgery alone.
An important feature of our PCA model was the marked reduction in the clinical under-staging rate (less than 25% pathological EV disease among patients classified as OC in the test set) compared with the rates reported for routine clinical staging alone (42.1% in our cohort and 36%–43% in the literature). This may also help in selecting candidates for NAC among patients with clinical OC BC but pathologically advanced BC.
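The "less than 25%" figure can be recovered from Table 4, assuming the PPV reported there is the share of patients labelled OC by the model who were pathologically OC. Under that reading, the residual under-staging rate among model-labelled OC patients is the complement of the PPV:

```latex
\[
\text{under-staging rate} \;=\; 100\% - \text{PPV}:
\qquad
100\% - 75.8\% = 24.2\% \ \ (\text{test set}),
\qquad
100\% - 73.7\% = 26.3\% \ \ (\text{training set}).
\]
```

Set against the 42.1% upstaging observed with clinical staging alone in this cohort, the test-set figure corresponds to an absolute difference of about 18 percentage points (42.1% − 24.2% = 17.9%).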
Several clinicopathological factors are reportedly associated with post-surgical pathological stage. Turker et al. reported that clinical T2 stage, high grade tumour, LVI, female gender and histological variants were associated with a risk of upstaging from OC to EV disease. In addition, patients with preoperative hydronephrosis have 2.01 and 1.94 times higher risk of EV and LNP disease, respectively. Using a multivariate model, we also showed that hydronephrosis was the most important predictor of pathological upstaging in invasive BC. Karakiewicz et al. performed the first modelling study on 726 patients and developed two nomograms to predict pathological EV and LNP disease. The T stage nomogram, which consisted of age, TUR grade and CIS at TUR, provided a 4.0%–4.3% gain over the predictive accuracy of TUR stage alone (75.7% vs 71.4%), and the N stage nomogram, which involved TUR stage and TUR grade, provided a 2.1%–2.3% gain over the predictive accuracy of TUR stage alone (63.1% vs 61.0%). In the present report, we give the results of the largest study of a modelling approach to predict post-cystectomy pathological stage using clinical and biopsy-driven information. In comparison with previous studies, a large homogeneous group of patients as well as extensive pre-cystectomy variables were used to develop the model. We demonstrated that the PCA model including age, IVnum, LVI, hydronephrosis, multiplicity of tumours and palpable mass improved the staging accuracy of routine clinical stage.
The use of prognostic and decision aids in urology, e.g. nomograms, look-up tables, risk grouping and artificial neural networks, has grown rapidly in the last decade. We used a different multivariate statistical technique, termed PCA, to create our model in this study. PCA involves a mathematical variable reduction procedure that transforms several correlated variables into a smaller number of variables or into a single composite parameter (PCA score). The PCA algorithm is a well-established analytical method in gene clustering, image analysis and prognostic stratification of patients with prostate cancer. We considered a relatively large number of pre-cystectomy variables (more than 14 variables) to develop our model and some of those factors are at least partially correlated in a clinical setting (e.g. IVnum with time from diagnosis to cystectomy; clinical stage with NAC and with palpable mass; CIS and LVI with history of intravesical chemotherapy); thus PCA appears to be a suitable statistical technique for developing a predictive model in this setting. The PCA formula provides a suitable, coherent tool to predict post-surgical pathological stage without the need for statistical software for interpretation/prediction.
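To make the variable-reduction idea concrete, here is a minimal sketch of how several correlated variables can be collapsed into one composite value using the first principal component. It is not the procedure used in the study, which was run in SAS and whose settings are not described here. The sketch assumes PCA on standardized variables (the correlation matrix) and finds the leading component by power iteration; the function names and the toy data are hypothetical.

```python
import math


def standardize(rows):
    """Centre each column and scale it to unit sample standard deviation.
    A constant column (zero standard deviation) is not handled in this sketch."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(k)]
    return [[(r[j] - means[j]) / sds[j] for j in range(k)] for r in rows]


def first_component(rows, iterations=500):
    """Loadings of the first principal component via power iteration."""
    z = standardize(rows)
    n, k = len(z), len(z[0])
    # Correlation matrix of the standardized columns.
    corr = [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(k)]
            for a in range(k)]
    # Non-uniform start vector; a start exactly orthogonal to the leading
    # component would fail, which is acceptable for a sketch.
    v = [float(j + 1) for j in range(k)]
    for _ in range(iterations):
        w = [sum(corr[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def composite_scores(rows, loadings):
    """One composite value per row: weighted sum of standardized variables."""
    return [sum(w * x for w, x in zip(loadings, r)) for r in standardize(rows)]


# Toy data: two strongly correlated made-up variables (not study data).
toy = [[1, 2.1], [2, 3.9], [3, 6.2], [4, 7.8], [5, 10.1]]
loadings = first_component(toy)
scores = composite_scores(toy, loadings)
```

For two positively correlated standardized variables the loadings come out equal, so the composite value is simply a scaled sum of the two, which is the sense in which correlated inputs are condensed into a single parameter.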
Despite the substantial decrease in pathological upstaging rate with the PCA model, the sensitivity and NPV were only moderate. Predictive accuracy as indicated by the C-index value in this study was also moderate and at least 32% of patients were not accurately staged by the PCA formula. Similarly, Karakiewicz et al. stated that 24.3% and 36.7% of patients would still be misclassified with pT3–4 and pN1–3 nomograms, respectively. External validation of those nomograms in a multi-institutional study on 2477 patients also showed low staging accuracy of 67.5% for pathological T staging and 54.5% for pathological N staging, and both nomograms underestimated the real incidence of locally advanced BC. Recent studies have shown that biological markers would significantly improve the predictive ability of staging models. Margel et al. [23,24] developed a PCA pre-cystectomy staging model using clinical information together with blood biomarkers including cancer antigen 125, carcinoembryonic antigen and carbohydrate antigen 19-9 and reported 85% accuracy in predicting EV BC. Lotan et al. reported that the accuracy of the nomogram designed to predict recurrence in patients with non-muscle-invasive BC increased from 75% to 81% when nuclear matrix protein 22 was added to the model. Accordingly, we believe the current simple, user-friendly PCA formula adds clinically important value to the predictive accuracy of clinical staging and is a useful decision aid to predict post-surgical pathological stage, especially pathological EV disease. However, like the two proposed nomograms, it has suboptimal accuracy, and tissue-based or blood-based markers appear to be needed to optimize the predictive accuracy of the analytical model regardless of the quality and/or quantity of the involved clinical variables or the statistical method used to develop the model.
The results of this study should be interpreted in the context of some limitations. As we are a tertiary referral centre, most of our patients had been referred from different community-based hospitals and urologists in private practice, and most patients had their primary TUR performed at an outside hospital. The different quality of TURs reportedly leads to variation in clinical staging from 5% to 70% and may adversely affect the adequacy of biopsy specimens and the reproducibility of the current PCA staging model in other institutes. In addition, differences among pathologists in the interpretation of histological findings on TUR specimens may affect the accuracy of TUR-related variables such as LVI. We tried to minimize this effect by performing re-TUR for more recent patients and by having the TUR pathology slides re-read by a dedicated genitourinary pathologist when the procedure was performed at an outside hospital. Palpable bladder mass is a relatively subjective finding which depends on several factors such as a patient's body habitus. However, palpable mass had the lowest contribution to the PCA formula compared with other integrated variables and did not significantly affect the final PCA score.
In conclusion, we generated and internally validated a pre-cystectomy PCA staging model involving clinicopathological information of a large cohort of patients and we were able to improve the ability of clinical stage to predict post-surgical pathological stage of BC. More importantly, this model enables urologists to avoid clinical under-staging. It is simple to use and does not need sophisticated statistical software for interpretation. The formula needs to be externally validated before routine application in clinical settings and incorporation of tissue-based and blood-based biomarkers seems essential in order to enhance the accuracy of predictive staging models.
ACKNOWLEDGEMENTS
We thank Dr Eila Skinner for her intellectual contribution to the manuscript and Gus Miranda, project manager at USC Urology, for his collaboration on the study.
CONFLICT OF INTEREST