Construction and validation of a prognostic model across several studies, with an application in superficial bladder cancer



Many models for clinical prediction (prognosis or diagnosis) are published in the medical literature every year but few such models find their way into clinical practice. A likely reason is that most models have not been validated in independent data and so lack generality, credibility or both. In this paper we consider the situation in which several compatible, independent data sets relating to a given disease with a time-to-event endpoint are available for analysis. The aim is to construct and evaluate a single prognostic model. Building a multivariable model from the available prognostic factors is accomplished within the Cox proportional hazards framework, stratifying by study. Non-linear relationships with continuous predictors are modelled by using fractional polynomials. To assess the discrimination or separation of a survival model, we use the D statistic of Royston and Sauerbrei. D may be interpreted as the separation (log hazard ratio) between the survival distributions for two independent prognostic groups. To evaluate the generality of a prognostic model across the data sets, we propose ‘internal–external cross-validation’ on D: each study is omitted in turn, the model parameters are estimated from the remaining studies and D is evaluated in the omitted study. Because the linear predictor of a survival model tells only part of the story, we also suggest a method for investigating heterogeneity in the baseline distribution function across studies which involves fitting completely specified, flexible parametric survival models (Royston and Parmar). Our final models combine the prognostic index (obtained with stratification by study) with the pooled baseline survival distribution (estimated parametrically). By applying this methodology, we construct two prognostic scores in superficial bladder cancer. The simpler of the two scores is more suited to clinical application.
We show that a three-group prognostic classification scheme based on either score produces well-separated survival curves for each of the data sets, despite identifiable heterogeneity among the baseline distribution functions and to a lesser extent among the prognostic indexes for the individual studies. Copyright © 2004 John Wiley & Sons, Ltd.
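
The internal–external cross-validation loop described above can be sketched in a few lines of Python. This is only a minimal illustration of the leave-one-study-out structure, not the authors' implementation: the function name `internal_external_cv` and the callables `study_of`, `fit` and `score` are hypothetical placeholders. In the setting of this paper, `fit` would estimate the study-stratified Cox model from the retained studies and `score` would compute D in the omitted study; neither is implemented here.

```python
from typing import Callable, Dict, Hashable, List, Sequence, TypeVar

Row = TypeVar("Row")      # one patient record, in whatever form the caller uses
Model = TypeVar("Model")  # whatever object the caller's `fit` returns


def internal_external_cv(
    rows: Sequence[Row],
    study_of: Callable[[Row], Hashable],
    fit: Callable[[List[Row]], Model],
    score: Callable[[Model, List[Row]], float],
) -> Dict[Hashable, float]:
    """Omit each study in turn, fit on the rest, score on the omitted study.

    `study_of`, `fit` and `score` are caller-supplied (hypothetical) callables:
    `study_of` returns the study label of a record, `fit` estimates model
    parameters from the retained records, and `score` evaluates a performance
    measure (for example a discrimination statistic) in the omitted records.
    """
    # Study labels are sorted only to give a reproducible order of evaluation.
    studies = sorted({study_of(r) for r in rows})
    if len(studies) < 2:
        raise ValueError("need at least two studies to leave one out")

    results: Dict[Hashable, float] = {}
    for held_out in studies:
        # Estimation sample: every study except the one being omitted.
        train = [r for r in rows if study_of(r) != held_out]
        # Validation sample: the omitted study only.
        test = [r for r in rows if study_of(r) == held_out]
        model = fit(train)
        results[held_out] = score(model, test)
    return results
```

The function returns one value per omitted study, so variation in those values gives a direct view of how well the model transfers to data that played no part in estimating it.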