Recalibrating survival prediction among patients receiving trans‐arterial chemoembolization for hepatocellular carcinoma

The Pre‐TACE‐Predict model was devised to assess prognosis of patients treated with trans‐arterial chemoembolization (TACE) for hepatocellular carcinoma (HCC). However, before entering clinical practice, a model should demonstrate that it performs a useful role.


| INTRODUCTION
The prognostic classification of patients with hepatocellular carcinoma (HCC) is problematic since overall survival is influenced by tumour burden, residual liver function, and patient performance status.
Over the last few years, several systems were developed to predict the prognosis of patients undergoing trans-arterial chemoembolization (TACE). 1-9 Among those proposed, the Pre-TACE-Predict model is the largest and most comprehensive, and it is applicable before treatment, which meets decision-making needs. 9 The Pre-TACE-Predict model includes tumour features [number and largest diameter, presence/absence of macrovascular invasion, and alpha-fetoprotein (AFP) values], liver tests (albumin and bilirubin) and, uniquely among the available prognostic models, aetiology of liver disease (hepatitis B virus, hepatitis C virus, and alcohol). This feature is important considering that the Pre-TACE-Predict model was generated from both Eastern and Western patients, thus potentially accounting for the variability related to the worldwide epidemiologic features of HCC. However, one model may not necessarily fit every population. Indeed, in its external validation, the Pre-TACE-Predict model showed better accuracy in Eastern than in Western patients. Given the potential use of this clinical decision aid, its generalized use in clinical practice should follow the demonstration of a reliable prognostic capability in independent series, especially in Western populations. 10-12 In the present study, we performed an independent external validation applying the most severe level of stringency, that is, in a cohort that differs in terms of investigators, location and recruitment calendar period from the one that produced the original model. 11,12

2 | METHODS

| Study population
Data were derived from the Italian Liver Cancer (ITA.LI.CA) registry, which prospectively collected data of HCC patients diagnosed and treated in 24

| METHODOLOGY
In prognostic models, patients and clinicians are interested in the future risk of the disease rather than the probability of a positive test. 15,16 Discrimination serves to separate patients with and without the outcome, and the discriminatory ability is important in diagnostic settings, where a separation between people with or without the disease according to the test result or model score is needed.
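For survival data, discrimination of the kind described above is commonly summarised with a concordance index such as Harrell's C. The following is a minimal sketch of that index, not the authors' Stata implementation: the function name `harrell_c`, the handling of tied risk scores (counted as half-concordant) and the exclusion of pairs with tied follow-up times are assumptions made for illustration.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's C: fraction of usable pairs in which the patient with the
    higher risk score is the one who experiences the event first.

    times       -- follow-up time of each patient
    events      -- 1 if the event was observed, 0 if censored
    risk_scores -- model score (higher means worse predicted prognosis)
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is only usable if the earlier time is an event
        for j in range(n):
            # Patient j must have been followed strictly longer than patient i
            # (assumption: pairs with tied times are skipped).
            if times[j] > times[i]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    if usable == 0:
        raise ValueError("no usable pairs: every earlier time is censored")
    return concordant / usable
```

A value of 0.5 corresponds to chance-level ordering and 1.0 to perfect ordering. Because only pairs anchored on an observed event are usable, the index depends on the censoring pattern of the dataset.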
Calibration concerns the agreement between the observed and the predicted risk, and good calibration is very important in prognostic settings, where a prediction of the future risk of a specific patient group is desirable. Models that are adequately calibrated across predicted risk strata provide useful information for medical decision making. 17

Against this background, our primary endpoint was to assess calibration of the Pre-TACE-Predict model in a large, independent, external validation cohort. To do so, the predicted 6-, 12-, 24- and 36-month survival rates were computed for all patients, and the median predicted values within each risk category were compared with the observed survival rates from Kaplan-Meier curves. Relationships between median predicted and observed values were graphically inspected and measured through R² calculation. Any need for recalibration was addressed through 'calibration-in-the-large', which leaves the linear predictor unmodified and refits the baseline survival rates to the present data, as described by Royston and Demler. 11,18

The second aim of the study was to verify the discrimination ability of the Pre-TACE-Predict model. To this end, the discrimination of the non-stratified linear predictor was measured with the common Harrell's C index and Gönen & Heller's K index. In particular, the latter is considered a more robust concordance index than Harrell's C index, since it is less sensitive to the proportion of censored cases. 19 When the proportion of censored observations is high, Harrell's C index can be spuriously inflated. 20

Regression coefficients for one or more covariates in the Pre-TACE-Predict model may differ between the original derivation dataset and the present external validation dataset. This was formally tested by running a first Cox regression on the covariates forming the Pre-TACE-Predict model in the current dataset (using clustering to account for the multi-institutional origin of data) and comparing the obtained coefficients with those previously published. Subsequently, a second, identical Cox regression was run while 'offsetting' the original prognostic index (PI, that is, the linear predictor) evaluated in the validation dataset, so that the coefficient of PI was constrained to equal 1. If the original model fitted the validation data perfectly, the covariate coefficients in this second regression would all be 0. 11,18

All analyses were conducted in Stata 16.1 (StataCorp LLC).

KEY POINTS
Trans-arterial chemoembolization (TACE) is a palliative treatment commonly used in clinical practice for primary liver cancer. The prognosis of patients undergoing TACE is quite heterogeneous owing to a number of factors, mainly related to both patient and tumour characteristics. Pre-TACE-Predict is a prognostic model that has been shown to predict prognosis accurately in a large series of patients undergoing TACE. However, before entering clinical practice, a model should demonstrate that it performs a useful role. In this study, we performed an independent external validation of the Pre-TACE-Predict model in a cohort that differs in setting and time period from the one that generated the original model. We observed that a recalibration of the Pre-TACE-Predict model improved the estimation of survival probabilities of HCC patients treated with TACE, and that, in comparison to other available models, it is the best prognostic tool currently available for these patients.
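The 'calibration-in-the-large' step described in the Methodology keeps the published linear predictor fixed and re-estimates only the baseline survival in the new cohort. The sketch below illustrates that idea with a Breslow-type estimator of the baseline cumulative hazard. This estimator is swapped in as an assumption, since the text does not specify the exact procedure beyond citing Royston; the function names and the proportional-hazards form S(t | PI) = S0(t)^exp(PI) are likewise illustrative.

```python
import math


def recalibrated_baseline_survival(times, events, linear_predictor, horizons):
    """Breslow-type estimate of the baseline survival S0(t) at each horizon,
    with the linear predictor held fixed (it acts as an offset; no
    coefficient is re-estimated)."""
    relative_risk = [math.exp(lp) for lp in linear_predictor]
    event_times = sorted({t for t, e in zip(times, events) if e})

    cumulative_hazard = 0.0
    steps = []  # (event time, cumulative baseline hazard up to and including it)
    for t in event_times:
        deaths = sum(1 for ti, e in zip(times, events) if e and ti == t)
        # Risk set: everyone still under follow-up at time t.
        at_risk = sum(r for ti, r in zip(times, relative_risk) if ti >= t)
        cumulative_hazard += deaths / at_risk
        steps.append((t, cumulative_hazard))

    baseline = {}
    for horizon in horizons:
        hazard_at_horizon = 0.0
        for t, h in steps:
            if t <= horizon:
                hazard_at_horizon = h
        baseline[horizon] = math.exp(-hazard_at_horizon)
    return baseline


def predicted_survival(baseline_survival, lp):
    """Patient-level survival under proportional hazards: S0(t) ** exp(lp)."""
    return baseline_survival ** math.exp(lp)
```

Calling `recalibrated_baseline_survival(times, events, pi, [6, 12, 24, 36])` would return one refitted baseline value per horizon (in months), which can then be combined with each patient's unchanged linear predictor through `predicted_survival`.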

| RESULTS
Baseline characteristics of the 826 patients included in the study are reported in Table 1. The main differences in our series as compared

| Calibration
Calibration of the Pre-TACE-Predict model is depicted in Figure 2A.
The R² calculated between the observed survival rates and those predicted by the Pre-TACE-Predict model was 0.667. Notably, observed survival rates were always higher than the predicted ones, suggesting a systematic underestimation by the prediction model. Predicted versus observed survival rates by risk groups are reported in Figure S1.
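The two summaries used here, the R² between predicted and observed survival rates and the direction of the calibration error, can be sketched as follows. The text does not state how R² was computed, so taking it as the squared Pearson correlation is an assumption, as are the function names. Any values passed to these functions in an example would be hypothetical and are not the study's data.

```python
def r_squared(predicted, observed):
    """Squared Pearson correlation between predicted and observed rates
    (assumption: the source does not specify the exact R-squared definition)."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    covariance = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return covariance ** 2 / (var_p * var_o)


def mean_calibration_gap(predicted, observed):
    """Average of observed minus predicted survival; a positive value means
    the model underestimates survival on average."""
    return sum(o - p for p, o in zip(predicted, observed)) / len(predicted)
```

A high R² together with a consistently positive gap would indicate that the model ranks risk groups well but is shifted, which is the situation that recalibration of the baseline survival is meant to address.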
Recalibration through 'calibration-in-the-large' was applied by refitting the baseline survival for the present data and the following

| Discrimination
In the present study population, the Harrell's C index of the linear  Table 2, together with coefficients obtained when the linear predictor was also included and constrained to 1.

| Comparison with other scores
To evaluate if other available models discriminate better than the  (Table 3).  (Table S1).

| DISCUSSION
The present study has some limitations to be acknowledged. First, data on direct-acting antiviral (DAA) administration and achievement of sustained virological response in patients with chronic hepatitis C were not available. Therefore, the favourable impact of such treatment on the survival of our patients remains speculative. Second, we considered only the patient status before the first TACE, as this was the indispensable prerequisite for testing the Pre-TACE-Predict model. Had data on radiological response been available, the Post-TACE-Predict model could also have been evaluated. However, such radiological data were not available for most of the patients, so this task is left to a possible future study. Last, we excluded patients who underwent liver transplantation after TACE used as a 'bridge' to surgery. This selection, necessary to fulfil the Pre-TACE-Predict inclusion criteria, may have altered the characteristics of patients who potentially benefit from TACE. However, it can be confidently assumed that the proportion of patients within the registry who were transplanted after TACE was minimal. 13

In conclusion, our study provided an external validation of the Pre-TACE-Predict model, which is the most recently proposed and most comprehensive prognostic model for HCC patients treated with TACE. However, owing to the improved outcome of TACE treatment observed in recent years, a recalibration of the baseline survival function was needed to optimize the estimation of survival probabilities. As a result, the highest discriminatory ability of the Pre-TACE-Predict model in comparison to the other models, together with its risk stratification and recalibration, makes it the best prognostic instrument we currently have to predict the TACE outcome in HCC patients.

SUPPORTING INFORMATION
Additional supporting information may be found online in the Supporting Information section.