Systematic review and validation of prognostic models in liver transplantation

Authors

  • Matthew Jacob,

    1. Health Services Research Unit, London School of Hygiene and Tropical Medicine, London, UK
    2. Clinical Effectiveness Unit, The Royal College of Surgeons of England, London, UK
  • James D. Lewsey,

    1. Health Services Research Unit, London School of Hygiene and Tropical Medicine, London, UK
    2. Clinical Effectiveness Unit, The Royal College of Surgeons of England, London, UK
  • Carlos Sharpin,

    1. Clinical Effectiveness Unit, The Royal College of Surgeons of England, London, UK
  • Alexander Gimson,

    1. Liver Transplant Unit, Addenbrooke's Hospital, Cambridge, UK
  • Mohammed Rela,

    1. Institute of Liver Studies, King's College Hospital, London, UK
  • Jan H.P. van der Meulen

    Corresponding author
    1. Health Services Research Unit, London School of Hygiene and Tropical Medicine, London, UK
    2. Clinical Effectiveness Unit, The Royal College of Surgeons of England, London, UK
    • Clinical Effectiveness Unit, The Royal College of Surgeons of England, 35-43 Lincoln's Inn Fields, London, WC2A 3PE
    • Telephone: +44 20 7869 6601; Fax: +44 20 7869 6644


Abstract

A model that can accurately predict post–liver transplant mortality would be useful for clinical decision making, would help to provide patients with prognostic information, and would facilitate fair comparisons of surgical performance between transplant units. A systematic review of the literature was carried out to assess the quality of the studies that developed and validated prognostic models for mortality after liver transplantation and to validate existing models in a large data set of patients transplanted in the United Kingdom (UK) and Ireland between March 1994 and September 2003. Five prognostic model papers were identified. The quality of the development and validation of all prognostic models was suboptimal according to an explicit assessment tool covering internal, external, and statistical validity, model evaluation, and practicality. The discriminatory ability of the identified models in the UK and Ireland data set was poor (area under the receiver operating characteristic curve always smaller than 0.7 for adult populations). Because of the poor quality of the reporting, the methodology used to develop the models could not always be determined. In conclusion, these findings demonstrate that currently available prognostic models of mortality after liver transplantation can have only a limited role in clinical practice, audit, and research. (Liver Transpl 2005;11:814–825.)

Developing a prognostic model that can accurately predict mortality after liver transplantation has been the focus of much research. Such models are important for several reasons. First, they can be used to identify patients with end-stage liver disease who may benefit from liver transplantation. Second, prediction of mortality after liver transplantation can provide patients with information about their prognosis that will support informed decision making about their treatment. Third, prognostic information is an essential component of risk-adjusted comparisons of outcomes between transplant units.

It has been shown that the model for end-stage liver disease (MELD) score is an accurate predictor of survival of patients without transplantation.1 However, the MELD score is a poor predictor of survival after transplantation.2 It has been claimed that prognostic models that specifically aim to predict post–liver transplant mortality on the basis of clinical information available before transplantation will always fail to perform adequately. The reasoning is that outcomes after liver transplantation may depend on unpredictable events that occur during the perioperative period rather than on the severity of the liver disease and comorbid conditions. A first step toward verifying this claim is to evaluate existing prognostic models for mortality after liver transplantation. In light of this, a systematic review of the literature was carried out (1) to assess the quality of the studies that developed and described prognostic models, (2) to validate existing models on a United Kingdom (UK) and Ireland data set, and (3) to establish whether a satisfactory predictive model already exists.

Abbreviations:

MELD, model for end-stage liver disease; UK, United Kingdom; ROC, receiver operating characteristic.

Methods

Inclusion Criteria

Recent studies that aimed to develop a non–disease-specific prognostic model for survival after liver transplantation were included. Many studies have demonstrated an association between prognostic factors and post–liver transplant survival, but their aim has not been to produce a definitive prognostic model. For example, Nair et al. presented a multivariate Cox regression model, but their aim was to show that race is associated with post–liver transplant outcome even after adjustment for other variables.3 This study was not considered in our review, because the steps taken in the modeling stage are likely to differ from those in studies whose principal aim is to develop a prognostic model. In such studies, the variables to adjust for are selected by considering which variables may confound the relationship between an “exposure” and the “outcome.” Variables selected in this way are not necessarily the same as those that would have been selected if the aim had been to develop a prognostic model. Studies that considered prognostic factors that are not available before transplantation were also excluded from the review, because our focus was on prognostic predictions that can be made before transplantation. For this reason, a study by Doyle et al. was excluded because, although pretransplantation variables were considered, the final model contains only posttransplantation variables.4

Studies that produced prognostic models that could be validated on the adult UK and Ireland cohort were included. For this reason, studies with only pediatric patients were excluded. Studies were included irrespective of the follow-up period over which survival was investigated. Only models developed in the last 10 years (since 1994) were included, as older models are unlikely to be applicable to current liver transplantation practice because of advances in technology and changes in transplantation populations. Studies that developed prognostic models using artificial neural network methods could not be included, because it was impossible to reproduce these models from the results reported in the literature.

Search Strategy

Relevant studies were identified by searching Medline and Embase from 1994 to week 4 of 2004, using a search strategy that is outlined in the Appendix. Two authors (M.J. and J.D.L.) reviewed the titles and, if clarification was needed, abstracts to identify potentially relevant articles. Studies that appeared to meet the inclusion criteria were read in full. Given the limitations of electronic searching, reference lists of relevant studies were also searched. Moreover, personal files of studies relating to liver transplantation were checked and experts in the field were consulted.

Quality Assessment

For each study, the internal, external, and statistical validity, the rigour of model evaluation, and practicality of the model were assessed. The assessment tool used (Table 1) is a modified version of published instruments.5, 6

Table 1. Tool for Quality Assessment of Prognostic Models
Internal validity
1) Was an inception cohort established (e.g. all patients entered into the study at the time of their first liver transplantation)?
2) Were more than 80% of patients in the inception cohort followed up in order to minimise bias?
3) Were baseline data collected prospectively?
4) Were the candidate prognostic factors clearly defined?
External validity
1) Was the model generated on a multi-centre population?
2) Was there an adequate description (e.g. age, sex, diagnosis, year of transplantation, urgency of transplantation, first/subsequent transplantation) of the cohort of patients on which the models were developed?
Statistical validity
1) Were candidate prognostic factors which were continuous variables dealt with appropriately prior to model building (e.g. categorised using clinical consensus, linear relationship with outcome variable checked)?
2) Was the sample size adequate as defined by an event per degree of freedom ratio of 10 or more?
3) Was collinearity between the candidate prognostic factors assessed?
4) Were missing values for candidate prognostic factors dealt with appropriately?
Evaluation of the model
1) Were the assumptions of the final model tested appropriately?
2) Was the sensitivity of the final model to influential observations considered?
3) Was the model validated using the same source of data but different to that used to develop the final model?
4) Was the model validated using a source of data external to that used to develop the final model?
Practicality of the model
1) Are the final model prognostic factors routinely available in clinical practice?
2) Was the final model described in a complete enough manner so that it could be fitted to other data sets?
3) Was the precision of the final model predictions considered?
4) Does the model have the potential to have wide generalisability?

Internal validity refers to the prognostic value of the model in patients from the same source population.7 It was ascertained whether an inception cohort had been established,8 whether loss to follow-up before the end of the study period was minimal, whether data had been collected prospectively, and whether candidate prognostic factors were clearly defined. The disadvantage of not having an inception cohort is that patients may be included at different points in their disease process; this is the case if a study included a mixture of patients receiving their first, second, or later transplantation. A large proportion of patients lost to follow-up reduces power and may also bias the estimated effects of prognostic factors.

External validity refers to the prognostic value of the model in patients from outside the source population.7 It was assessed whether a model was developed on a multicenter population and whether there was an adequate description of the population. A multicenter model is likely to have better generalizability than a single-center model. Without adequate description of the population, judgements about its applicability to other populations cannot be made.

Treatment of continuous candidate prognostic factors, sample size, collinearity, and the handling of missing values comprised the assessment criteria for statistical validity. If continuous prognostic factors were categorized, the cutoff values needed to be based on clinical arguments. If prognostic factors were entered as continuous variables in the model, the assumed nature of the relationship with the outcome variable (linear in most cases) needed to have been checked. Sample size was deemed adequate if a study had at least 10 outcome events for each degree of freedom in the prognostic model.9, 10 Collinearity needed to be addressed because it can affect which factors are selected into the prognostic model. For missing values, unless the percentage of “missingness” was small, some form of imputation was deemed necessary (see Results for further details).
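Collinearity can be screened informally before model building. The following minimal Python sketch, not taken from any of the reviewed papers, computes the Pearson correlation between each pair of continuous candidate factors and the corresponding two-predictor variance inflation factor, VIF = 1/(1 - r²). The factor names, values, and the 0.8 flagging threshold are hypothetical choices for illustration; a full VIF analysis would regress each factor on all of the others.

```python
# Minimal sketch: pairwise collinearity screen for continuous candidate factors.
# Factor names, values, and the 0.8 threshold are hypothetical.
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def collinearity_report(factors, r_threshold=0.8):
    """Return (name1, name2, r, two-predictor VIF, flagged) for every pair."""
    report = []
    for (n1, x), (n2, y) in combinations(factors.items(), 2):
        r = pearson(x, y)
        vif = float("inf") if abs(r) == 1 else 1 / (1 - r ** 2)
        report.append((n1, n2, round(r, 3), round(vif, 2), abs(r) >= r_threshold))
    return report


# Hypothetical values for five patients (not study data).
factors = {
    "bilirubin": [20, 45, 80, 150, 300],
    "inr": [1.1, 1.3, 1.6, 2.0, 2.9],
    "albumin": [38, 35, 30, 28, 36],
}
for row in collinearity_report(factors):
    print(row)
```

A flagged pair would not automatically be dropped; the choice between correlated factors is usually made on clinical grounds.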

The model evaluation section of the quality assessment tool checked whether the assumptions of the final model were tested, determined whether the sensitivity of the final model to influential observations was considered, and inquired how model validation was carried out. If model assumptions and sensitivity were not checked, it is possible that the coefficients associated with the prognostic factors were biased and that important prognostic factors were overlooked in the model development process. A key assumption in Cox regression is proportional hazards. A check of this assumption is important especially for studies with long follow-up.

Finally, the practicality of the model was assessed by determining whether the prognostic factors in the proposed models are available in clinical practice, the model had been completely described, the precision of model predictions had been considered, and the model could be generalized according to place and time. Confidence intervals around prediction estimates are useful to provide users of the model an indication of the precision of the predictions.

Assessment of study quality using the criteria described above and summarized in Table 1 was made by 2 authors (M.J. and J.D.L.). Disagreements were resolved by involving a third author (J.H.P.vdM.).

Validation of Models

Data from the UK and Ireland Liver Transplant Audit were used to validate the identified prognostic models. This data set included patients who had received a first liver transplantation in either country between March 1994 and September 2003. The validation of the identified models was carried out using 2 different approaches. First, for each patient a prognostic index score was calculated on the basis of the model coefficients presented in the original paper (original prognostic index). Second, another prognostic index score was calculated on the basis of model coefficients that were reestimated by developing a prognostic model in the UK and Ireland data that considered only the prognostic factors included in the original index (derived prognostic index). If the coefficients that make up the original and the derived prognostic index were similar, these 2 approaches were expected to provide comparable measures of discriminatory ability.
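A prognostic index is a weighted sum of a patient's prognostic factor values, PI = Σ βᵢxᵢ; the original and derived indices differ only in where the coefficients βᵢ come from. The minimal Python sketch below shows that calculation. The factor names and both coefficient sets are hypothetical placeholders, not values from any reviewed paper, and the re-estimation step itself (fitting a logistic or Cox model in the validation data) would be done with standard statistical software and is not shown.

```python
# Minimal sketch: prognostic index PI = sum of beta_i * x_i.
# All factor names and coefficients are hypothetical placeholders.


def prognostic_index(patient, coefficients):
    """Linear predictor for one patient; factors not in the model are ignored."""
    return sum(beta * patient[name] for name, beta in coefficients.items())


# "Original" index: coefficients as printed in a (hypothetical) source paper.
original = {"age_per_10y": 0.15, "retransplant": 0.60, "log_bilirubin": 0.30}
# "Derived" index: same factors, coefficients re-estimated in the validation data.
derived = {"age_per_10y": 0.10, "retransplant": 0.45, "log_bilirubin": 0.42}

patient = {"age_per_10y": 5.2, "retransplant": 0, "log_bilirubin": 4.1, "sex_male": 1}
print(round(prognostic_index(patient, original), 3))  # 2.01
print(round(prognostic_index(patient, derived), 3))   # 2.242
```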

The discriminatory ability of the model was assessed using the area under the receiver operating characteristic (ROC) curve.11 The ROC area can be interpreted as the chance that the predicted probability of death is larger for a randomly chosen patient who died than for a randomly chosen patient who survived. A model that discriminates perfectly has an ROC area of 1, whereas a model which discriminates no better than chance has an ROC area of 0.5. The calibration of the models was assessed using plots of observed against predicted mortality.
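The interpretation of the ROC area given above can be computed directly by comparing every patient who died with every patient who survived. The following is a minimal Python sketch of that pairwise (Mann-Whitney) calculation, not the software used in this study; the function name `roc_area` and the data are hypothetical. Ties in predicted risk count as one half.

```python
# Minimal sketch: ROC area as the probability that a randomly chosen patient who
# died has a higher predicted risk than a randomly chosen survivor (ties = 1/2).
# The pairwise O(n_died * n_survived) form is for illustration; data are hypothetical.


def roc_area(predicted, died):
    """Concordance (Mann-Whitney) estimate of the area under the ROC curve."""
    cases = [p for p, d in zip(predicted, died) if d]
    controls = [p for p, d in zip(predicted, died) if not d]
    if not cases or not controls:
        raise ValueError("need at least one death and one survivor")
    score = 0.0
    for c in cases:
        for s in controls:
            if c > s:
                score += 1.0
            elif c == s:
                score += 0.5
    return score / (len(cases) * len(controls))


risk = [0.05, 0.10, 0.20, 0.20, 0.40, 0.60]
died = [0, 0, 1, 0, 0, 1]
print(roc_area(risk, died))  # 0.8125
```

A value of 1.0 requires every death to have a higher predicted risk than every survivor; a value of 0.5 corresponds to ordering patients at random.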

One of the identified prognostic models was developed in a data set of patients transplanted in a number of European countries, of whom 19% were also included in the UK and Ireland data set.12 Given that more than 80% of patients came from other countries, validating this model in the UK and Ireland data set was considered justified.

The UK and Ireland data set contained missing information for some of the candidate prognostic factors. Replacement values were obtained by imputing means for continuous variables and assigning missing categorical variable information to the category with the highest frequency.
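The replacement rule described above (means for continuous variables, the most frequent category for categorical variables) can be sketched as follows. This is a minimal illustration under stated assumptions: records are Python dictionaries, `None` marks a missing value, and the field names and values are hypothetical.

```python
# Minimal sketch: single imputation by mean (continuous) and mode (categorical).
# Records and field names are hypothetical; None marks a missing value.
from collections import Counter


def impute(records, continuous, categorical):
    """Return new records with missing values filled; inputs are not modified."""
    fills = {}
    for field in continuous:
        observed = [r[field] for r in records if r[field] is not None]
        fills[field] = sum(observed) / len(observed)
    for field in categorical:
        observed = [r[field] for r in records if r[field] is not None]
        fills[field] = Counter(observed).most_common(1)[0][0]
    return [
        {k: (fills[k] if v is None and k in fills else v) for k, v in r.items()}
        for r in records
    ]


patients = [
    {"age": 50, "diagnosis": "PBC"},
    {"age": None, "diagnosis": "ALD"},
    {"age": 60, "diagnosis": None},
    {"age": 40, "diagnosis": "ALD"},
]
completed = impute(patients, continuous=["age"], categorical=["diagnosis"])
print(completed[1]["age"], completed[2]["diagnosis"])  # 50.0 ALD
```

Single imputation of this kind keeps every patient in the analysis but treats imputed values as if they had been observed, so it tends to understate uncertainty; multiple imputation is a common alternative.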

Results

Systematic Review

The search strategy identified 2,422 papers. Review of the titles and abstracts of these papers suggested that 45 papers could be relevant. After these papers were read in full, only 5 met the inclusion criteria; these are summarized in Table 2.12–16 In 3 of these papers, the same source of data (the United Network for Organ Sharing database) was used.13, 15, 16 Although the same prognostic factors frequently appeared in different models, they often differed in format or definition. For example, cold ischemia time is treated as a binary variable (with a 12-hour cutoff) in Adam et al.12 but as a continuous variable in Ghobrial et al.13 Recipient age and recipient gender were the only factors considered as candidate prognostic factors in all 5 papers. The prognostic factors most frequently included in the final models were recipient age (4 papers12, 13, 15, 16), retransplantation (3 papers12, 13, 16), and bilirubin (3 papers13, 14, 16).

Table 2. Summary of the Papers Describing Prognostic Models Identified in Systematic Review
 | Adam et al.12 (2000) | Ghobrial et al.13 (2002) | Bilbao et al.14 (2003) | Thuluvath et al.15 (2003) | Desai et al.16 (2004)
Note: ● = prognostic factor in final model, ○ = prognostic factor considered but not in final model, × = prognostic factor not considered (except in Adam et al.12, where this could not be determined because only candidate prognostic factors found to be significant after univariate analyses were presented in this study).
#1 Child-Pugh score.
#2 Prothrombin time/international normalized ratio.
#3 Cross clamp and veno-venous bypass time.

Number of patients | 13,345 | 25,772 | 190 | 23,668 | 2,565
Single-/multi-centre | Multi-centre | Multi-centre | Single-centre | Multi-centre | Multi-centre
Duration of study | 1988-1997 | 1990-2000 | 1991-1997 | 1987-2001 | 1998-2001
Statistical analysis | Cox regression | Cox regression | Logistic regression | Logistic regression | Cox regression
Recipient factors | Adults and children | Adults only | Adults only | Adults only | Adults only
 age
 sex
 race×××××
 BMI/Malnutrition×××
 Retransplantation××
 Diagnosis××
 UNOS status 1/Acute hepatic failure×
 Ventilation×××××
 Renal support××××
 Bilirubin/CP#1××
 PT/INR/CP#2××
 Albumin/CP×××
 Creatinine××
Operative factors      
 Warm ischemia time×××××
 CC + VV-BP time#3×××××
Donor factors      
 Age××
 Sex××
 Preservation fluid××××
 Blood group compatibility××××
 Cold ischemia time××
Centre factors      
 Number of procedures per year××××
 Number of split transplants per year××××

The assessment of the quality of the prognostic models on the basis of the tool described in the Methods section is summarised in Table 3. In 2 of the papers, it was possible to determine that an inception cohort had been established, as all included patients had received a first transplantation.14, 15 In the other papers, patients who had received a retransplantation were also included (11%, 12%, and 18% in the papers by Adam et al.,12 Ghobrial et al.,13 and Desai et al.,16 respectively). In these 3 papers, retransplantation was also used as a candidate prognostic factor, but it was unclear whether this factor related to the status of patients before transplantation or whether it was used as a description of events that occurred after transplantation.

Table 3. Quality Assessment of the Prognostic Models According to Tool Presented in Table 1
Criterion | Adam et al.12 (2000) | Ghobrial et al.13 (2002) | Bilbao et al.14 (2003) | Thuluvath et al.15 (2003) | Desai et al.16 (2004)
Internal validity
1) Inception cohort? | NR# | NR | Yes | Yes | NR
2) Follow-up >80%? | NR | NR | NR | NR | NR
3) Prospective data? | Yes | Yes | No | Yes | Yes
4) Clear definition of prognostic factors? | Yes | Yes | Yes | Yes | Yes
External validity
1) Multi-centre population? | Yes | Yes | No | Yes | Yes
2) Adequate description of cohort? | Yes | Yes | Yes | Yes | Yes
Statistical validity
1) Appropriate handling of continuous variables? | Yes | No | Yes | Yes | NR
2) Adequate sample size? | Yes | Yes | No | Yes | Yes
3) Assessment of collinearity? | NR | NR | NR | NR | NR
4) Appropriate handling of missing values? | No | No | NR | No | No
Evaluation of the model
1) Model assumptions tested? | NR | NR | NR | NR | NR
2) Sensitivity to influential observations considered? | NR | NR | NR | NR | NR
3) Model validated in different dataset but from same source? | No | No | Yes | No | Yes
4) Model validated in different dataset but from external source? | No | No | No | No | No
Practicality of the model
1) Prognostic factors routinely available? | Yes | Yes | Yes | Yes | Yes
2) Complete description of risk model? | Yes | Yes | Yes | Yes | Yes
3) Consideration of precision of model predictions? | Yes | No | No | No | No
4) Potential to have generalisability? | Yes | Yes | Yes | Yes | Yes
# NR = not reported.

None of the papers reported the loss to follow-up before the end of the study period. However, the Bilbao et al.14 study seemed to have achieved 100% follow-up at their study endpoint of 90 days. It is also possible that the Adam et al.12 study achieved excellent follow-up, because a mechanism was in place to update follow-up information at regular intervals. However, this study excluded 464 patients for whom no information on outcome was available.

The external validity of all models could be assessed, as all provide adequate descriptions of the patient characteristics. Thus, it is clear for which patient groups the prognostic model will be appropriate. Only 1 of the models was not developed on a multicenter population.14

One model14 was developed using a data set that was too small: only 29 events occurred in this study. Under the rule that a study should have at least 10 events for each degree of freedom in a prognostic model, a model with at most 2 degrees of freedom (29/10 = 2.9) should have been considered, whereas the final model had 4 degrees of freedom. A larger data set from the same population would have had greater power and may have resulted in a final model with different prognostic variables.
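The sample size rule amounts to simple arithmetic on events per degree of freedom (EPV). The minimal Python sketch below applies it to the figures quoted above (29 events, 4 degrees of freedom); rounding down to whole degrees of freedom is our strict reading of the rule, and the function names are hypothetical.

```python
# Minimal sketch: the events-per-degree-of-freedom (EPV) rule of at least 10.
# Figures are those quoted in the text; function names are hypothetical.

MIN_EPV = 10


def events_per_df(events, degrees_of_freedom):
    return events / degrees_of_freedom


def max_degrees_of_freedom(events, min_epv=MIN_EPV):
    """Largest whole number of degrees of freedom that keeps EPV >= min_epv."""
    return events // min_epv


print(events_per_df(29, 4))        # 7.25, below the threshold of 10
print(max_degrees_of_freedom(29))  # 2 under a strict reading of the rule
```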

Missing values were not dealt with appropriately in 4 of the papers,12, 13, 15, 16 and 1 paper14 did not report on this issue. The 4 papers used complete case analysis so that patients with missing information for any of the candidate prognostic factors were excluded. Following Harrell's guidelines,17 such an analysis is appropriate only if the percentage of missingness is less than 5% (where missingness is defined as missing data points in at least 1 candidate prognostic factor). In Adam et al.12 and Thuluvath et al.,15 the percentage of missingness was 40% and 14%, respectively (not possible to calculate percentage of missingness for Ghobrial et al.13 and Desai et al.16).
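Missingness, as defined above, is the percentage of patients with a missing value in at least 1 candidate prognostic factor. A minimal Python sketch of that calculation and of the 5% threshold follows; the records and field names are hypothetical, and a factor absent from a record is treated as missing.

```python
# Minimal sketch: percentage of patients with >= 1 missing candidate factor,
# compared with the 5% threshold for complete case analysis.
# Records are hypothetical; None (or an absent key) marks a missing value.


def percent_missingness(records, candidate_factors):
    incomplete = sum(
        1 for r in records if any(r.get(f) is None for f in candidate_factors)
    )
    return 100.0 * incomplete / len(records)


def complete_case_acceptable(records, candidate_factors, threshold=5.0):
    return percent_missingness(records, candidate_factors) < threshold


records = [
    {"age": 52, "bilirubin": 40, "creatinine": 90},
    {"age": 61, "bilirubin": None, "creatinine": 110},
    {"age": 45, "bilirubin": 25, "creatinine": None},
    {"age": 38, "bilirubin": 300, "creatinine": 150},
    {"age": 57, "bilirubin": 60, "creatinine": 80},
]
factors = ["age", "bilirubin", "creatinine"]
print(percent_missingness(records, factors))       # 40.0 (2 of 5 incomplete)
print(complete_case_acceptable(records, factors))  # False
```

Note that missingness counted this way can be much larger than the percentage missing for any single factor, because a patient is lost from a complete case analysis if any one factor is missing.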

The results of checking the model assumptions and the sensitivity to influential observations were not reported in any of the papers. One of the factors in the prognostic model of Adam et al.12 was a binary variable indicating whether or not the patient had cancer. This study had a long follow-up time (>10 years). Cancer patients have relatively good outcomes compared with noncancer patients in the short term, but the opposite is true for long-term outcomes (unpublished survival analysis of the UK and Ireland data set). It is therefore unlikely that the assumption of proportional hazards, a key assumption of the Cox regression used by Adam et al.,12 is reasonable for this factor. None of the models were externally validated.
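An effect that reverses over time, as described for cancer above, shows up as a rate ratio that changes between early and late follow-up. The minimal Python sketch below is an informal check of that kind, not a formal test such as one based on Schoenfeld residuals; the 1-year split, the group labels, and all follow-up data are hypothetical and are not the UK and Ireland data.

```python
# Minimal sketch: informal proportional hazards check comparing crude event rates
# (events per person-year) between two groups in an early and a late window.
# Under proportional hazards the rate ratio should be roughly constant.
# The 1-year split and all data are hypothetical.


def window_rate(times, events, start, end):
    """Events per person-year in the follow-up window (start, end]."""
    person_time, n_events = 0.0, 0
    for t, died in zip(times, events):
        if t <= start:
            continue  # follow-up ended before the window opened
        person_time += min(t, end) - start
        if died and t <= end:
            n_events += 1
    return n_events / person_time  # assumes some person-time in the window


def rate_ratio(group_a, group_b, start, end):
    return window_rate(*group_a, start, end) / window_rate(*group_b, start, end)


# (follow-up in years, death indicator) for two hypothetical groups
cancer = ([0.5, 2.0, 3.0, 4.0, 6.0, 8.0], [1, 1, 1, 1, 1, 0])
non_cancer = ([0.1, 0.2, 0.4, 0.6, 9.0, 10.0], [1, 1, 1, 0, 1, 0])
print(round(rate_ratio(cancer, non_cancer, 0.0, 1.0), 2))   # 0.2 early
print(round(rate_ratio(cancer, non_cancer, 1.0, 10.0), 2))  # 3.78 late
```

A rate ratio below 1 early and above 1 late, as in these made-up data, is the pattern that a single constant hazard ratio cannot represent.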

The prognostic factors in all models are routinely available in clinical practice and were described in a complete enough manner so that they could be validated by others or used as prognostic tools in clinical practice.

Validation of Models

The validation results are detailed in Table 4. The discriminatory ability of all models is poor. No model, for an adult population, achieved an ROC area larger than 0.7. For models that have investigated outcomes at different time points,12, 13, 15 the ROC area was greater for time points closer to the time of transplantation than for time points in the more distant future. This indicates, as expected, that it is easier to predict outcomes early after transplantation than later on.

Table 4. Discriminatory Ability (Area Under the ROC Curve With 95% Confidence Intervals) of Prognostic Models as Reported in the Papers and in the UK & Ireland Data Set With Original Prognostic Index and With Derived Prognostic Index (See Text for Further Explanation)
Study and outcome | Reported validation | UK & Ireland validation, original prognostic index | UK & Ireland validation, derived prognostic index
Adam et al.12 (2000) | Not reported | |
Adults:
 90 day outcome | | 0.63 (0.61-0.66) | 0.67 (0.65-0.69)
 1 year outcome | | 0.62 (0.60-0.64) | 0.64 (0.63-0.66)
Children:
 90 day outcome | | 0.64 (0.57-0.70) | 0.70 (0.64-0.76)
 1 year outcome | | 0.67 (0.61-0.73) | 0.73 (0.68-0.78)
Ghobrial et al.13 (2002) | #1 | |
 90 day outcome | 0.69 | 0.57 (0.55-0.60) | 0.68 (0.66-0.70)
 6 month outcome | 0.68 | 0.57 (0.55-0.59) | 0.67 (0.65-0.69)
 1 year outcome | 0.67 | 0.56 (0.54-0.58) | 0.65 (0.64-0.67)
Bilbao et al.14 (2003) | #2 | #3 |
 90 day outcome, sensitivity | 0.80 | 0.59 | 0.58
 90 day outcome, specificity | 0.89 | 0.61 | 0.62
 90 day outcome, PPV#4 | 0.62 | 0.17 | 0.17
 90 day outcome, NPV#5 | 0.95 | 0.92 | 0.92
Thuluvath et al.15 (2003)
 30 day outcome | 0.70 | 0.67 (0.64-0.69) | 0.67 (0.64-0.70)
 1 year outcome | 0.70 | 0.60 (0.57-0.62) | 0.63 (0.61-0.65)
Desai et al.16 (2004)
 90 day outcome | 0.60 (0.58-0.63) | 0.61 (0.58-0.64) | 0.61 (0.58-0.64)
#1 Validation assessed using same data set that prognostic model developed on.
#2 A prediction of 0.13 or greater from the logistic regression was a prediction of death.
#3 Body mass index < 18 kg/m2 used as a proxy for malnutrition.
#4 Positive predictive value.
#5 Negative predictive value.
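For the one model evaluated with a risk threshold, sensitivity, specificity, PPV, and NPV follow from classifying a predicted probability of 0.13 or greater as a predicted death. The minimal Python sketch below shows that calculation; the predictions and outcomes are hypothetical. Unlike sensitivity and specificity, PPV and NPV also depend on the proportion of deaths in the data set.

```python
# Minimal sketch: threshold-based accuracy measures, with a predicted
# probability >= 0.13 counted as a predicted death. Data are hypothetical;
# assumes each row and column of the 2x2 table is non-empty.


def threshold_metrics(predicted, died, cutoff=0.13):
    tp = sum(1 for p, d in zip(predicted, died) if p >= cutoff and d)
    fp = sum(1 for p, d in zip(predicted, died) if p >= cutoff and not d)
    fn = sum(1 for p, d in zip(predicted, died) if p < cutoff and d)
    tn = sum(1 for p, d in zip(predicted, died) if p < cutoff and not d)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


pred = [0.05, 0.08, 0.10, 0.14, 0.20, 0.30, 0.12, 0.50]
died = [0, 0, 0, 0, 1, 0, 1, 1]
print(threshold_metrics(pred, died))
```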

The discriminatory ability of the models was poorer in the UK and Ireland data set than reported in the papers, except for the model developed by Desai et al.16 where the reported discriminatory ability was approximately equal to that observed in the UK and Ireland data set.

The derived prognostic indices had ROC areas in the UK and Ireland data set that were greater than or equal to those of the original prognostic indices. One reason for this is that the ROC areas for the derived indices are likely to be overoptimistic, because they were validated in the same data set in which they were developed. The large discrepancy seen for Ghobrial et al.13 is most likely due to the prognostic factors in that model having markedly different strengths of association in the development and validation data sets. To illustrate this point, the distributions of the original and the derived prognostic index are shown in Figures 1A and 1B, separately for patients who died and patients who survived in the UK and Ireland data set. An increase in either prognostic index represents an increase in the predicted probability of death, but the scales of the 2 indices in these figures cannot be compared directly. The distributions of the original prognostic index for patients who died and those who survived were almost identical, which explains why this index discriminated only marginally better than chance. The distributions of the derived prognostic index were less alike, with a more pronounced shift to the right for patients who died. Consequently, the derived prognostic index discriminated better than the original prognostic index. Note that perfect discrimination (an ROC area of 1) would require the 2 distributions not to overlap at all.

Figure 1.

(A) Distribution of original prognostic index of Ghobrial et al.13 in UK and Ireland data set. (B) Distribution of derived prognostic index of Ghobrial et al.13 in UK and Ireland data set. (See text for further explanation.)

It was possible to test the calibration of only two models,12, 14 because the other papers did not report the relevant information (i.e., the intercept of the logistic regression model or the baseline survival of the Cox regression model). Figure 2 shows the plots of observed mortality against predicted mortality for the 2 models. The calibration for the Adam et al.12 model is clearly superior to the calibration for the Bilbao et al.14 model.
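Calibration needs absolute predicted probabilities, which is why the intercept or the baseline survival is required in addition to the coefficients. The minimal Python sketch below converts a prognostic index (PI) into a predicted probability of death for each model type, using the standard formulas P = 1/(1 + exp(-(α + PI))) for logistic regression and P(death by t) = 1 - S₀(t)^exp(PI) for Cox regression. The intercept, baseline survival, and PI values are hypothetical.

```python
# Minimal sketch: predicted probability of death from a prognostic index (PI).
# Intercept, baseline survival, and PI values are hypothetical.
from math import exp


def logistic_risk(intercept, pi):
    """P(death) = 1 / (1 + exp(-(intercept + PI)))."""
    return 1.0 / (1.0 + exp(-(intercept + pi)))


def cox_risk(baseline_survival_t, pi):
    """P(death by t) = 1 - S0(t) ** exp(PI).

    PI must be on the same (e.g. centred) scale that S0(t) refers to.
    """
    return 1.0 - baseline_survival_t ** exp(pi)


print(round(logistic_risk(-2.0, 0.0), 4))  # 0.1192
print(round(cox_risk(0.90, 0.0), 4))       # 0.1
```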

Figure 2.

Plot of observed mortality against predicted mortality calculated on UK & Ireland data set on the basis of (A) Adam's original prognostic index12 for 1-year outcome, and on the basis of (B) Bilbao's original prognostic index14 for 90-day outcome. The 4 points on each graph represent quartiles of the predicted mortalities.
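The points in such plots can be produced by sorting patients by predicted mortality, splitting them into 4 equal-sized groups, and comparing mean predicted with observed mortality in each group. The following is a minimal Python sketch of that grouping under stated assumptions: the data are hypothetical and the function name `calibration_by_group` is illustrative.

```python
# Minimal sketch: mean predicted vs observed mortality within quartiles of
# predicted risk. Data are hypothetical; assumes at least n_groups patients.


def calibration_by_group(predicted, died, n_groups=4):
    pairs = sorted(zip(predicted, died))
    n = len(pairs)
    table = []
    for g in range(n_groups):
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        mean_pred = sum(p for p, _ in group) / len(group)
        observed = sum(d for _, d in group) / len(group)
        table.append((round(mean_pred, 3), round(observed, 3)))
    return table


pred = [0.02, 0.04, 0.05, 0.08, 0.10, 0.12, 0.15, 0.20]
obs = [0, 0, 0, 0, 0, 1, 0, 1]
for mean_pred, observed in calibration_by_group(pred, obs):
    print(mean_pred, observed)
```

For a well-calibrated model, each pair lies close to the line of equality between predicted and observed mortality.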

Discussion

There is room for improvement in the development of prognostic models that specifically aim to predict mortality after liver transplantation. The areas where most improvement can be made are the use of inception cohorts, the completeness of follow-up, the handling of missing values, the testing of model assumptions, and the evaluation of model performance. The discriminatory ability of all identified prognostic models was poor, especially when they were externally validated in the data set of the UK and Ireland Liver Transplant Audit. We found that, at best, these models had only marginally better discriminatory ability than the MELD score for predicting survival at 90 days after transplantation.2

Limitations

Our results are determined not only by the quality of the models but also by the quality of the reporting. For example, if no mention was made of tests of model assumptions, it could not be determined whether these had been carried out. It is important to note that the quality of the reporting does not necessarily reflect the quality of the research itself. However, only complete reporting can give a model the credibility necessary for its application in other populations.

There were differences in the definitions of a small number of prognostic variables between the original studies and the UK and Ireland data set. For example, Bilbao et al.14 used the Pikul method18 to represent malnutrition, whereas the UK and Ireland data set used a body mass index of less than 18 kg/m2. Another example is that the UK and Ireland data set contains information about only the presence or absence of ascites and not about its severity, which is needed to calculate Child-Turcotte-Pugh scores. Therefore, proxies for mild and severe ascites were calculated on the basis of information on diuretic therapy for ascites. It is unlikely that these differences in definitions introduced appreciable bias, as they were rare and reasonable proxy measures were always available.

This review did not assess the quality of the procedures used to select the prognostic factors that were included in the final model. All prognostic models described in this review used statistical selection criteria, most frequently with a stepwise procedure. Although refinements in variable selection approaches have been developed,19 there is as yet no consensus on what is the “best” approach. For this reason, we did not assess the quality of this component of statistical validity.

Finally, models that were developed using artificial neural network methods were not considered. However, when such methods have been compared to regression approaches, there has been little difference in their performance.20

Dealing With Patients Undergoing Retransplantation

A methodological consideration in developing prognostic models in liver transplantation is how to deal with patients who have had more than 1 transplantation. In 3 of the papers,12, 13, 16 retransplantation was used as a prognostic variable, but it was not clear how this was handled in the analysis. One approach is to include retransplantation in the model as a time-dependent variable that changes status when a subsequent transplantation occurs. However, the use of time-dependent variables to predict posttransplant outcomes for individual patients is problematic, as only prognostic information known at the time of transplantation can readily be used. Another approach is to add a second entry for a patient when a subsequent transplantation occurs. The difficulty with this approach is that the same patient appears in the data set more than once, violating the statistical assumption of independent observations.
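The time-dependent approach is usually implemented by splitting each patient's follow-up into (start, stop] intervals, a counting-process layout commonly accepted by survival analysis software, with the retransplantation indicator switching from 0 to 1 at the time of the second graft. The minimal Python sketch below builds such rows for one patient; the function name, field order, and times are hypothetical. It also shows why the approach is awkward for prediction: the second row depends on an event that is not known at the time of the first transplantation.

```python
# Minimal sketch: counting-process rows for a time-dependent retransplantation
# indicator. Function name, field order, and times are hypothetical.


def split_at_retransplant(patient_id, follow_up, died, retransplant_time=None):
    """Return rows (id, start, stop, retransplanted, event)."""
    if retransplant_time is None or retransplant_time >= follow_up:
        return [(patient_id, 0.0, follow_up, 0, died)]
    return [
        (patient_id, 0.0, retransplant_time, 0, 0),  # first graft, no death here
        (patient_id, retransplant_time, follow_up, 1, died),
    ]


for row in split_at_retransplant("P1", follow_up=3.0, died=1, retransplant_time=0.4):
    print(row)
print(split_at_retransplant("P2", follow_up=5.0, died=0))
```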

Comparison With Other Studies

Systematic reviews of prognostic models have been carried out in the areas of stroke,6, 7 cardiac surgery,21 terminal cancer,22 shoulder disorders,23 and whiplash-associated disorders.24 The quality of the reviewed models ranged from good7 to poor6 but was mostly average.21–24 Like this review, others often found that inception cohorts were not established,6, 23, 24 that completeness of follow-up was not reported,22–24 and that missing data were handled inappropriately or that the way they were handled was not reported.21

In this review, 1 of the 5 included models was developed in a data set that was too small (i.e., with fewer than 10 outcome events for each degree of freedom in the model). Other reviews found that studies were too small in 62%,6 20%,7 44%,21 14%,22 and 31%23 of all identified studies. Unlike this review, other reviews were able to assess the extent of loss to follow-up in prognostic model studies. In one review, 22% of the studies identified had a loss to follow-up greater than 10%.6 In 2 other reviews, 50%23 and 33%24 of the studies identified had losses to follow-up greater than 20%. Whereas none of the papers in this review reported testing of model assumptions, the review of prognostic models in terminal cancer22 found that only 47% of studies had explicitly stated that they had verified the assumptions underlying the regression models (e.g., proportional hazards).

Prognostic models have been developed for other areas of surgery. In cardiac surgery, particularly, such models have been the focus of much research, and well-established models exist such as EuroSCORE25 and STS.26 The discriminatory ability of prognostic models in surgery is variable, both within and between surgical specialties. For example, reported ROC areas have been 0.6827 and 0.80 (both gastro-oesophageal cancer surgery),28 0.78 (colorectal cancer surgery),29 0.69-0.79 (infrarenal aortic aneurysm surgery),30 0.81 (pediatric open-heart surgery),31 0.68 (thoracic aorta surgery),32 and 0.71 and 0.84 (coronary bypass surgery).33 Two of these papers had carried out external validation of existing prognostic models,30, 33 and 4 papers included internal validation only.27–29, 32 One paper had neither internal nor external validation.31 The discriminatory ability of the prognostic models in liver transplantation appears to lie at the lower end of the range observed in other surgical specialties.
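The ROC areas compared here have a direct interpretation: the ROC area is the probability that a randomly chosen patient who died was assigned a higher predicted risk than a randomly chosen patient who survived, so 0.5 is no better than chance and 1.0 is perfect discrimination. A minimal pairwise sketch follows. The risk values are hypothetical, and for large data sets a rank-based calculation or a statistical library routine would be faster than comparing every pair.

```python
from itertools import product
from typing import Sequence


def roc_area(risk_died: Sequence[float], risk_survived: Sequence[float]) -> float:
    """ROC area as a concordance probability: the share of (died, survived)
    pairs in which the patient who died had the higher predicted risk,
    with tied predictions counted as half a concordant pair."""
    if not risk_died or not risk_survived:
        raise ValueError("need at least one patient in each outcome group")
    score = 0.0
    for d, s in product(risk_died, risk_survived):
        if d > s:
            score += 1.0
        elif d == s:
            score += 0.5
    return score / (len(risk_died) * len(risk_survived))


# Hypothetical predicted mortality risks, for illustration only.
print(roc_area([0.8, 0.4], [0.3, 0.5, 0.2]))  # 5 of 6 pairs concordant
```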

Implications

This systematic review did not find a prognostic model that discriminates well between patients who died and those who survived in the early period after liver transplantation. The implication is that, at present, prognostic models of mortality after liver transplantation can have only a limited role in supporting clinical decision making, informing patients of their posttransplantation prognosis, or making fair comparisons of transplant units.

The application of prognostic models is further hampered by the poor quality of the reporting of their development. This points to a need for a standard reporting protocol for prognostic models, comparable to the CONSORT statement for randomized controlled trials34 and the STARD statement for studies of diagnostic accuracy.35

Apart from addressing the flaws in model development and reporting identified in this review, there are several ways to further improve the performance of prognostic models for mortality after liver transplantation. First, the prognostic models developed so far have included only conventional prognostic factors that are closely linked to direct measurements and manifestations of organ dysfunction. However, we have recently shown that the functional status of patients immediately before liver transplantation (i.e., the ability to carry out activities of daily living) is a prognostic factor for posttransplant mortality over and above the conventional prognostic factors.36 Model performance may therefore improve if more general measures of a patient's health and functional status are considered.

Second, all 5 prognostic models included in this review used measures obtained at a single point in time. It can be argued that the rate of change in prognostic factors may provide a better indication of the worsening of a patient's condition and, in turn, a more accurate reflection of prognosis. For example, Merion et al. demonstrated that the change in MELD score over a 30-day period was a stronger prognostic factor for death on the waiting list than the current MELD score.37 However, the importance of changes in prognostic factors was not confirmed by other studies of waiting-list mortality38 and mortality after transplantation.39
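One way to turn "change in a prognostic factor" into a model input is to subtract an earlier value from the current one. The sketch below does this for serial scores recorded on known days. The rule of comparing against the latest reading taken at least 30 days before the current one, the function name, and the example scores are all assumptions for illustration; this is not necessarily the definition used by Merion et al.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def change_over_window(readings: List[Tuple[int, float]],
                       window_days: int = 30) -> Optional[float]:
    """Current score minus the score from the latest reading taken at least
    `window_days` before the current one. `readings` holds (day, score)
    pairs sorted by day; returns None if no reading is old enough."""
    if not readings:
        return None
    days = [day for day, _ in readings]
    current_day, current_score = readings[-1]
    # Index of the last reading on or before (current_day - window_days).
    idx = bisect_right(days, current_day - window_days) - 1
    if idx < 0:
        return None
    return current_score - readings[idx][1]


# Hypothetical serial MELD scores as (day, score) pairs.
print(change_over_window([(0, 15), (20, 18), (35, 24)]))  # 9 (24 - 15)
print(change_over_window([(10, 15), (30, 20)]))           # None
```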

Third, the effect of prognostic factors may depend on the underlying liver disease. For that reason, disease-specific models, or models that explicitly allow the effect of prognostic factors to vary between patients with different liver disease diagnoses, may need to be considered.
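As an illustration of the second option, a single model can let the effect of each prognostic factor differ by diagnosis through interaction terms. Assume, for illustration, a linear predictor $\eta_i$ for patient $i$ (the log-odds in a logistic model, or the log relative hazard in a proportional hazards model, where $\beta_0$ is absorbed into the baseline hazard), prognostic factors $x_{i1}, \dots, x_{ip}$, and indicators $D_{ik}$ for diagnostic group $k = 2, \dots, K$, with group 1 as the reference. The symbols are illustrative and are not taken from the reviewed models.

$$
\eta_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + \sum_{k=2}^{K} \gamma_k D_{ik} + \sum_{k=2}^{K} \sum_{j=1}^{p} \delta_{jk} D_{ik} x_{ij}
$$

The effect of factor $j$ is then $\beta_j$ in the reference group and $\beta_j + \delta_{jk}$ in group $k$. The full set of interactions adds $p(K-1)$ degrees of freedom, which raises the number of deaths needed under the events-per-degree-of-freedom criterion. Fully separate disease-specific models go one step further, estimating every coefficient (and, in a proportional hazards model, the baseline hazard) separately within each diagnostic group.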

Despite all of these opportunities to improve the discriminatory ability of prognostic models for survival after liver transplantation, it is likely that only relatively small improvements can be achieved, given the apparently limited predictability of outcomes after major surgical procedures on the basis of prognostic information available before surgery.

Acknowledgements

The authors thank the following members of the UK & Ireland Liver Transplant Audit and their departments: Derek Manas and Liesl Smith, Freeman Hospital, Newcastle, UK; Steve Pollard and Christine Sutton, St. James' Hospital, Leeds, UK; Neville Jamieson and Claire Jenkins, Addenbrooke's Hospital, Cambridge, UK; Keith Rolles and Dr. Nancy Rolando, Royal Free Hospital, London, UK; Nigel Heaton and Susan Landymore, King's College Hospital, London, UK; David Mayer and Bridget Gunson, Queen Elizabeth Hospital, Birmingham, UK; Oscar Traynor and Mashood Ahmed, St. Vincent's Hospital, Dublin, Republic of Ireland; John Forsythe and Maureen Cunningham, The Royal Infirmary at Edinburgh, Edinburgh, UK; and Kerri Barber, UK Transplant, Bristol, UK.

Appendix

Appendix Table. Medline and Embase Combined Database Search
Note: steps 1 to 18 deal mostly with Embase and 19 to 40 with Medline.

Liver Transplantation terms: (Embase)
 1. ((liver adj3 (transplant$ or graft$)) or (hepat$ adj3 (transplant$ or graft$))).tw.
 2. liver transplantation/ or liver graft/
 3. 1 or 2
Study methodology terms: (Embase)
 4. statistical model/
 5. exp statistical analysis/
 6. Multivariate.tw.
 7. (multiple adj5 regression).tw.
 8. model$.tw.
 9. or/4-8
Study design and prognosis terms: (Embase)
10. Clinical study/ or Case control study/ or Family study/ or Longitudinal study/ or Retrospective study/ or Prospective study/ or Cohort analysis/ or (Cohort adj (study or studies)).mp. or ((Case control adj (study or studies)) or (follow up adj (study or studies)) or (observational adj (study or studies)) or (epidemiologic$ adj (study or studies)) or (cross sectional adj (study or studies))).tw.
11. epidemiology/ or life table/ or survival rate/ or morbidity/ or mortality/ or surgical mortality/
12. risk/ or risk assessment/ or risk factor/
13. prognosis/
14. exp treatment outcome/
15. (prognos$ or outcome or predict$ or death or survival or mortality or follow-up or follow up).tw.
16. or/10-15
Excluding animal studies: (Embase)
17. (exp Animal/ or Nonhuman/ or exp Animal Experiment/) not ((exp Animal/ or Nonhuman/ or exp Animal Experiment/) and exp Human/)
18. (3 and 9 and 16) not 17
Liver Transplantation terms: (Medline)
19. ((liver adj3 (transplant$ or graft$)) or (hepat$ adj3 (transplant$ or graft$))).tw.
20. liver transplantation/ or liver graft/
21. 19 or 20
Study methodology terms: (Medline)
22. multivariate analysis/
23. probability/
24. models, statistical/
25. regression analysis/
26. proportional hazards models/
27. survival analysis/
28. multivariate.tw.
29. logistic models/
30. (multiple adj5 regression).tw.
31. model$.tw.
32. or/22-31
Study design and prognosis terms: (Medline)
33. Epidemiologic studies/ or retrospective studies/ or exp cohort studies/ or (cohort adj (study or studies)).tw. or Cohort analy$.tw. or (Follow up adj (study or studies)).tw. or (observational adj (study or studies)).tw. or Longitudinal.tw. or Retrospective.tw. or Cross-sectional studies/
34. risk/ or risk assessment/ or risk factors/
35. epidemiology/ or life table/ or survival rate/ or morbidity/ or mortality/ or surgical mortality/
36. prognosis/
37. exp treatment outcome/
38. (prognos$ or outcome or predict$ or death or survival or mortality or follow-up or follow up).tw.
39. or/33-38
Excluding animal studies: (Medline)
40. Animal/ not (Animal/ and Human/)
41. (21 and 32 and 39) not 40
Combined Embase and Medline results:
42. 18 or 41
43. remove duplicates from 42
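The numbered search lines combine as Boolean operations on the sets of records each line retrieves: `or/4-8` is the union of lines 4 to 8, `and` is an intersection, and `not` removes records. The toy sketch below replays lines 9 and 18 with made-up record IDs. The duplicate-matching rule (comparing normalised titles) is a simplifying assumption for illustration only; the database interface applies its own matching rules.

```python
from typing import Dict, List, Set, Tuple


def or_range(lines: Dict[int, Set[str]], first: int, last: int) -> Set[str]:
    """Mimic 'or/first-last': the union of the record sets of lines first..last."""
    combined: Set[str] = set()
    for n in range(first, last + 1):
        combined |= lines[n]
    return combined


def remove_duplicates(records: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep the first (source, title) record for each normalised title.
    Matching on titles is an illustrative assumption only."""
    seen: Set[str] = set()
    unique: List[Tuple[str, str]] = []
    for source, title in records:
        key = " ".join(title.lower().split())  # lower-case, collapse whitespace
        if key not in seen:
            seen.add(key)
            unique.append((source, title))
    return unique


# Toy Embase result sets keyed by search line number (hypothetical record IDs).
embase: Dict[int, Set[str]] = {
    3: {"e1", "e2", "e3", "e4"},                                 # liver transplantation terms
    4: {"e1"}, 5: {"e2"}, 6: set(), 7: {"e3"}, 8: {"e1", "e5"},  # methodology terms
    16: {"e1", "e2", "e3", "e5"},  # stands in for or/10-15 (study design, prognosis)
    17: {"e3"},                    # animal-only records
}
embase[9] = or_range(embase, 4, 8)                              # line 9
embase[18] = (embase[3] & embase[9] & embase[16]) - embase[17]  # line 18
print(sorted(embase[18]))  # ['e1', 'e2']

combined = [("Embase", "Survival after liver transplantation"),
            ("Medline", "Survival  after Liver Transplantation"),
            ("Medline", "Outcome of retransplantation")]
print(len(remove_duplicates(combined)))  # 2
```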
