Trials & Tribulations of Liver Transplantation: Are Trials Now Prohibitive Without Surrogate Endpoints?

During the past 5 decades, liver transplantation has moved from its pioneering days where success was measured in days to a point where it is viewed as a routine part of medical care. Despite this progress, there are still significant unmet needs and outstanding questions that need addressing in clinical trials to improve outcomes for patients. The traditional endpoint for trials in liver transplantation has been 1‐year patient survival, but with rates now approaching 95%, this endpoint now poses a number of significant financial and logistical barriers to conducting trials because of the large numbers of participants required to demonstrate only an incremental improvement. Here, we suggest the following solutions to this challenge: adoption of validated surrogate endpoints; bigger and better collaborative multiarm, multiphase studies; recognition by funders and institutions that work on larger collaborative research projects is potentially more important than smaller, self‐led bodies of work; ringfenced areas of research within trial frameworks where individuals can take a lead; and fair funding structures using both industry and public sector money across national and international borders.

During the past 5 decades, liver transplantation has moved from its pioneering days, when success was measured in days, to a point where it is viewed as routine, with a median survival of more than 22 years for liver transplant recipients. (1) The step-change in 1-year patient survival seen in the 1980s and 1990s has plateaued, (2) although this plateau still reflects progress because it coincides with the increased use of marginal organs. (3) Furthermore, the increasing focus on transplant benefit means that outcomes after transplant may show slower rates of improvement.
Despite this success, there are numerous unmet needs in the field of liver transplantation, including the development of immunosuppressive regimens with fewer and less severe long-term adverse events (4); safe expansion of the donor pool in the face of declining donor organ quality; increased demand from older, less-fit recipients; and the call to expand into other indications. (5) Tackling these issues will require careful study in clinical trial settings.
Traditionally the endpoint of choice for liver transplantation trials was 1-year patient survival; however, with rates in the United States, United Kingdom, and elsewhere approaching 95%, (6,7) at best we can only be looking at incremental improvement. This poses a challenge to conducting trials in terms of the numbers of participants required to demonstrate relatively small improvements in patient survival. To move the field forward, we need to borrow the approaches adopted in oncology and other fields, where validated surrogate markers and novel trial designs are used to ensure that innovations can be rapidly adopted if they show efficacy or discarded in a timely fashion if they do not. (8) Another consideration arises from public and patient involvement in research, where it becomes clear that it is not only the duration of survival that matters but also the quality of that survival; that is, patients do not just want to be alive, they want to be physically, emotionally, psychologically, and functionally well and alive. This means that survival data, which are relatively easy to collect and meaningful to a clinician or governmental body, are not necessarily the outcome measures in which patients are interested.

Using Survival as an Endpoint for Powering Trials
Given such excellent outcomes, the scope to make any meaningful improvement in 1-year patient survival is relatively limited, so the ability to demonstrate this robustly within a trial would require huge numbers of patients. In Table 1, we illustrate the sorts of numbers required for a trial powered off improvement in 1-year patient survival, where we assume a survival of only 90% to better demonstrate the numbers required for incremental and step-changes in outcome.
Even these numbers underestimate the total number of patients required to be entered into a clinical study; they are simply the minimum number needed in the trial to demonstrate statistical significance at the anticipated efficacy. Depending on the trial intervention, the duration of the proposed study, and its acceptability to patients, the number of patients who would drop out of the trial or be lost to follow-up must also be factored in (typically another 10%-30%).
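To make the scale of the problem concrete, the following is a minimal sketch of a per-arm sample size calculation for a difference in 1-year survival. It assumes a simple two-sided normal-approximation formula for comparing two proportions, a significance level of 0.05, and 80% power; these settings, the function name `patients_per_arm`, and the target survival rates of 92% and 95% are illustrative assumptions and are not intended to reproduce Table 1.

```python
import math
from statistics import NormalDist


def patients_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Per-arm sample size for comparing two proportions (normal approximation).

    Illustrative helper, not the method used for Table 1.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    n = (z_alpha + z_beta) ** 2 * variance / (p_control - p_treatment) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    baseline = 0.90  # 1-year survival assumed in the text
    for improved in (0.92, 0.95):  # hypothetical targets, chosen for illustration
        per_arm = patients_per_arm(baseline, improved)
        print(f"{baseline:.0%} -> {improved:.0%}: {per_arm} per arm, {2 * per_arm} in total")
```

Under these assumptions, a 5-percentage-point gain needs a few hundred patients per arm, whereas a 2-point gain needs several thousand, before any allowance for dropout.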
An additional consideration is how many patients would have to be approached to successfully recruit sufficient numbers. This will vary depending on the nature of the trial and the motivation of both the patients and transplant centers to participate. (9) Research infrastructure is also a significant factor in the ability to recruit and deliver the trial 24 hours a day, 7 days a week. This means that recruitment rates may be only 20% to 30% of total transplant activity. Hence, a trial looking for an incremental improvement in survival that requires 4000 patients for statistical significance would, assuming a 10% dropout rate, require 4400 recruited patients. Recruitment and delivery rates must then be considered: assuming the ability to recruit, enroll, and deliver a trial intervention in 1 in 3 actual transplants across all centers, the trial as a whole realistically needs access to 13,200 patients. Given that in the United Kingdom there are only about 1000 liver transplantations per year, (6) a trial powered on incremental improvement in 1-year patient survival would require all centers and their recipients to participate and would be predicted to take more than […] (10) a trial that will at best bring a small benefit to 1-year patient survival would require a huge financial investment to provide a global research infrastructure and unprecedented collaboration; this is likely to be unappealing to funders, patients, or clinicians.
Given these numerical and logistical barriers to performing a trial powered on patient survival, we propose the following suggestions:
1. Lengthen the study period to 5-year or 10-year survival.
• Patient survival inevitably decreases with time, which allows a greater scope for significant improvement to be demonstrated within a trial framework. However, even 5-year survival is currently 82.7% in the United Kingdom (6) and almost 80% in the United States. (11)
• It may seem attractive to use historical cohorts as a comparator, and this may be legitimate in some studies, but care has to be taken to interpret these results in the context of changing practices, organ allocation policy, and donor and/or recipient characteristics as well as general changes in life expectancy, because any observed differences in survival may be independent of the studied intervention. For instance, we know that outcomes after donation after circulatory death (DCD) liver transplantation have been improving during the past decade, (12) so comparing a contemporary cohort receiving a given intervention with a cohort distributed during the preceding 10 years would not be appropriate. We suggest that the use of historical controls should be strongly discouraged and that we should look prospectively with randomized and controlled studies only.
• Longer studies have significant cost implications that will impact their attractiveness to funders based on probable cost-return analyses. However, in kidney transplantation it has been demonstrated that this can be mitigated by novel trial design coupled with registry-reported outcomes. (13)
• The other consideration is how one can rapidly translate and adopt effective management strategies into clinical practice in the context of a trial that requires multiple years of setup and recruitment followed by 3 to 10 years of follow-up without early readouts such as surrogate markers.
2. Change the emphasis on trial outcomes from patient survival to the following:
• Graft or transplant survival.
a. Patient survival is defined as the period of time from transplantation to death, graft survival as the time from transplantation to graft failure censored for death with a functioning graft, and transplant survival as the time from transplantation to patient death or liver retransplantation. (6)
b. The advantage of outcomes that include graft failure is that interventions that target graft survival can start to be elucidated more robustly without the confounder of retransplantation. This is likely to be of particular use in high-risk grafts, such as those from DCD donors, where there are higher rates of early graft loss. (12)
• Postlisting survival.
a. Patients are focused on their overall survival from the time they join the waiting list rather than their survival after transplantation. Given the significant waitlist mortality and 1-year and 5-year postlisting survival rates of 84.1% and 71.9% in the United Kingdom (6) and about 90% and 78% in the United States, respectively, (14) there is still significant scope for a step-change improvement in postlisting mortality, meaning certain trial interventions (such as trials in recipient optimization) may be best viewed in the light of postlisting rather than posttransplant survival.
b. Although survival from waitlisting is open to changes in recipient and donor demographics together with changes in listing practices and local and national allocation systems, the greater impact may be that increased scrutiny of postlisting survival may lead to patients being declined for transplantation and/or the "upgrading" of an individual's likelihood of developing complications as has been seen in other surgical fields. (15)
3. Identify and study subgroups with particularly poor short-term survival.
• This would have the advantage of allowing us to look for step-change improvement but must be balanced by how much that restricts the potential pool of participants and what effect that will have on recruitment rates and trial feasibility as well as the validity of extrapolation to less-sick cohorts.
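The feasibility arithmetic that motivates these suggestions (4000 patients required, 4400 after dropout, 13,200 transplants needed) can be sketched as a short calculation. The function name `recruitment_funnel` and its parameters are hypothetical; the dropout inflation is applied multiplicatively because that is how the figures in the text are derived, and exact fractions are used to avoid floating-point rounding.

```python
import math
from fractions import Fraction


def recruitment_funnel(n_required, dropout, recruit_fraction, transplants_per_year):
    """Scale a required sample size up to the transplant activity a trial needs.

    Hypothetical helper that mirrors the arithmetic in the text.
    """
    # Inflate for dropout as in the text: 4000 -> 4400 at 10%.
    n_recruited = math.ceil(n_required * (1 + dropout))
    # Only a fraction of transplants can be recruited and have the trial delivered.
    transplants_needed = math.ceil(n_recruited / recruit_fraction)
    # Years of activity if every transplant were available to the trial.
    years = float(Fraction(transplants_needed, transplants_per_year))
    return n_recruited, transplants_needed, years


if __name__ == "__main__":
    recruited, needed, years = recruitment_funnel(
        n_required=4000,
        dropout=Fraction(1, 10),          # 10% dropout, from the text
        recruit_fraction=Fraction(1, 3),  # 1 in 3 transplants, from the text
        transplants_per_year=1000,        # approximate UK activity, from the text
    )
    print(recruited, needed, years)
```

Taking the figures in the text at face value, this corresponds to roughly 13 years of UK activity even if every center and every recipient took part.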

Liver Transplantation: A Victim of Its Own Success?
The numbers of patients required to demonstrate a meaningful improvement in survival mean that liver transplantation trials powered off 1-year patient survival as an endpoint are becoming, or have already become, unfeasible and unaffordable. This means that the liver transplantation community will have to become more collaborative, prioritize studies deemed (inter)nationally to be of importance over those of personal/local interest, adopt surrogate endpoints that reflect survival and quality of life, and be open to novel trial designs. These considerations and approaches have been pioneered successfully in other fields (eg, oncology, cardiology), which routinely perform large-scale, multicenter studies. Within that, there needs to be greater flexibility and understanding from publishers, institutions, and funders about recognition for individual investigators in these large studies rather than an expectation that researchers will have been chief investigator in their own portfolio of smaller studies. Furthermore, entering patients into clinical studies should be the norm, not the exception. The scope of all of those considerations is huge, and we focus on the adoption of surrogates in this review.

Conducting Future Trials in Liver Transplantation
Surrogate endpoints must be clinically meaningful replacements for conventional endpoints such as 1-year patient survival and may be used individually or as part of a composite. The selected markers will depend on the nature of the intervention of the proposed trial. We suggest that future trials need to focus on what is important for patients (both the quality and duration of life from the point of the onset of disease/symptoms, ie, not only after transplantation) and/or the commissioning health care system (use of health care resources and cost). This means that noninferiority studies based on expensive interventions are probably no longer justifiable; that is, the next generation of trials needs to be designed in such a way as to show significant efficacy and/or improvements in terms of cost-effectiveness or resource use. Safety is of course a prime concern, but we suggest that it is addressed in preliminary studies and not in noninferiority studies.
Currently, the mean cost of a pivotal trial across all specialties with placebo or active comparators is $35.1 million. (16) Although trials have historically been largely funded by industry, changes in outcome measures and designs to provide the sort of data that are desirable to patients, the health care system, and clinicians may not align with commercial interests. It may be that commissioning and licensing bodies will need to mandate this sort of analysis and trial design or the public sector may need to contribute more funding to this research.

The Ideal Surrogate Endpoint for Liver Transplantation Trials
The ideal surrogate endpoint has a clear mechanistic basis and should have the following characteristics:
• Be sufficiently discriminatory to be useable within a clinical trial setting.
• Be robustly validated and predictive of the "clinically meaningful" endpoint.
• Be widely available, acceptable, and reproducible such that meta-analyses and comparisons of different trials are both possible and reliable.
• Have an outcome with the potential to reflect both the hepatocellular and biliary compartments as well as quality of life.
• Have a high specificity and sensitivity.
In choosing the ideal surrogate, one has to be aware of the adage "A correlate does not a surrogate make," that is, surrogates that are not in the causal pathway of the disease may give misleading information about clinical efficacy if they are based solely on observation. (17) A classic example of this comes from the early human immunodeficiency virus clinical trials, where the CD4 count was incorrectly used to power studies based on the assumption that this would subsequently have an impact on overall survival. (17) Similarly, in liver transplantation we have to be careful of assuming that beneficial changes in surrogate markers correlate with outcome; for example, note the following:
• A large multicenter trial powered to demonstrate a significant drop in hepatocellular injury as measured by serum aspartate transaminase (AST) within the first 7 days after transplantation did so, but this was found to have no impact on patient or graft survival rates. (7) Although this can be attributed to the fact that, despite including 220 patients, it was underpowered to show a difference (with both arms having a 1-year patient survival of ~95%), it is likely that differences in early AST do not correlate with a worse 1-year survival rate.
• Although the downregulation of lysyl oxidase-like 2 (LOXL2) may correlate with a reduction in fibrosis, (18) we cannot justify powering a trial purely on the assumption that a reduction in LOXL2 levels will then translate to a reduction in cirrhosis or improved survival.
Trials with surrogate endpoints are 2.5-fold to 3-fold cheaper than those using a clinical endpoint. (16) This is likely to be particularly important in the field of liver transplantation, where the relatively small number of patients affected makes it less commercially attractive for corporations compared with other fields such as diabetes mellitus or hypertension.

Hierarchy of Surrogates
There is no overarching ideal surrogate to cover all of the potential trials in liver transplantation (Fig. 1), so surrogates and/or combinations of surrogates will need to be selected according to the aims of the specific trials.
Although there is significant overlap, these surrogates can be broadly categorized as biomarker, clinical, patient-reported, and health care system outcomes (Fig. 2). Although some are robustly validated and correlated against their "clinically meaningful" endpoint, the remainder will need to be so in the coming years using retrospective and prospective data and samples.
Neither ethics review boards nor funders will sanction large studies without any form of interim analysis or proof of principle for continued large-scale recruitment of patients within liver transplantation trials. Given the large numbers of patients required for studies powered on patient survival, this will require the adoption of biologically plausible and validated surrogates from which to power the pilot study. This leaves the question of which metrics could be used to meaningfully power a pilot study to demonstrate efficacy of a proposed intervention.
Whatever the chosen metric, it should have biological plausibility for the proposed intervention and allow timely completion of a study. Here we consider a hypothetical intervention that is believed to have an impact on 1-year post-liver transplantation patient survival and discuss a selection of potential options and approaches for powering a pilot study.
• The selection of a routinely collected parameter such as AST is an attractive option from which to power a study as it is widely available and provides a continuous measure and spread of data. However, as discussed previously, significant changes in AST do not necessarily correlate with 1-year survival.
• The next level of complexity of individual markers is to collect them as part of a "score." The combination of serum alanine transaminase or AST, bilirubin, and international normalized ratio has been used to define early allograft dysfunction. (19-21) In particular, the Model for Early Allograft Function is appealing from which to power a study because it provides a continuous rather than a binary outcome and has been validated against 1-year patient survival in donation after brain death and DCD transplantation. (20,22,23) The recently published 7-day Liver Graft Assessment Following Transplantation is an alternative continuous measure, which is a strong candidate to emerge as the surrogate endpoint of choice because it is highly predictive of 3-month liver allograft failure. (24)
• Another approach would be to use a scoring system that looks at global markers of physiology such as the Acute Physiology and Chronic Health Evaluation IV score. This is appealing because it is cheap, provides a spread of data, and is predictive of 1-year patient survival in liver transplantation recipients. (25)
• If the benefit of the proposed intervention on patient survival was thought to be through the reduction of a specific complication such as ischemic cholangiopathy, then another option would be to power the study off the incidence of cholangiopathy detected on protocoled imaging. Although this approach will give a binary outcome, which is less appealing than a continuous score, the incidence of the proposed endpoint (in this case cholangiopathy) is higher than that of patient death and therefore allows a study to be piloted without requiring huge numbers.
• A scoring system looking at complications in a less binary fashion, such as the Comprehensive Complication Index, is potentially more appealing because of the spread of the data that it provides and because it is predictive of 1-year patient survival after liver transplantation. (19)
• Finally, it is worth reconsidering whether the prime benefit of the proposed intervention is actually on survival or whether the trial should instead be considered in terms of its impact on other domains, such as health care resources (eg, length of stay) or patient-reported outcomes (eg, quality of life).
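The preference expressed above for more frequent binary endpoints and for continuous scores can be illustrated with the standard normal-approximation sample size formulas. This is a sketch under stated assumptions rather than a recommendation: the event rates and the standardized effect size in the comments are hypothetical and are not taken from any of the cited studies.

```latex
% Per-arm sample size, two-sided significance level \alpha, power 1-\beta.
\begin{align}
  n_{\text{binary}} &=
    \frac{\left(z_{1-\alpha/2} + z_{1-\beta}\right)^{2}
          \left[\,p_1(1-p_1) + p_2(1-p_2)\,\right]}{(p_1 - p_2)^{2}} \\
  n_{\text{continuous}} &=
    \frac{2\left(z_{1-\alpha/2} + z_{1-\beta}\right)^{2}\sigma^{2}}{\Delta^{2}}
\end{align}
% With \alpha = 0.05 and 80% power, (z_{1-\alpha/2} + z_{1-\beta})^2 \approx 7.85.
% Hypothetical examples, each halving the event rate:
%   mortality 10% -> 5%:           n \approx 7.85 \times 0.1375 / 0.0025 \approx 432 per arm
%   complication rate 20% -> 10%:  n \approx 7.85 \times 0.25 / 0.01 \approx 197 per arm
% Hypothetical continuous score with \Delta / \sigma = 0.5:
%   n \approx 2 \times 7.85 / 0.25 \approx 63 per arm
```

In these illustrative cases, the same proportional reduction needs fewer than half as many patients when the event is twice as common, and a moderate shift in a continuous score needs fewer still. This holds only if the surrogate is itself validated against the clinically meaningful endpoint.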

Standardization of Endpoints
We have reviewed all the primary and secondary endpoints listed in registered trials in the 3 main clinical trial registries (International Standard Randomised Controlled Trial Number, European Union Clinical Trials Registry, and ClinicalTrials.gov) under "liver transplant" or "liver transplantation." We found that there was significant heterogeneity in the ways that these terms were applied and the definitions that were used. In Supplemental Table 1, we have attempted to offer a standard definition and suggested timepoints for measuring them so that future trials allow better comparison between studies (meta-analysis/systematic review). Given the increasing difficulty in conducting trials that are suitably powered to capture changes in survival, these meta-analyses will be increasingly important.
Where biologically appropriate, we also discuss any validation in the literature with regard to the endpoints in terms of patient survival and highlight any limitations we identify, particularly in the use (or abuse) of composite endpoints borrowed from other fields that may not have the same implications in liver transplantation. For example, although "time to composite of biopsy-proven rejection, graft loss, or death" is relevant to immunosuppression trials looking at kidney transplantation, the impact of episodes of rejection on long-term survival rates after liver transplantation is less clear, so this composite may be less valid as an endpoint in this context. The relative weighting of these endpoints should be considered carefully because not all endpoints are equal; for instance, acute rejection and death could be regarded as equal in this example when clearly they are not. The relative weighting of these endpoints should be tested in a public and patient involvement exercise prior to commencing the study to ensure that interventions address what is important to patients.
One of the difficulties of using surrogates is deciding how to validate them, particularly in the context of longer term outcome measures such as 5-year survival, where the ever-changing landscape of patient care and survival brings into question their validity. (26) To an extent, we may need to accept that early survival is a surrogate of longer term survival. This may not be the case for all studies, such as those looking at the reduction of cardiovascular deaths or at the development of de novo malignancy, which will still require a longer process of validation.
Part of that process of validation can also come from ongoing prospective data collection following the cessation of a trial powered off the success of a surrogate marker. However, to make this cost-effective, the data collection will require better integration of information technology into trials, allowing "automatic" rather than "manual" data capture from hospital/primary care/national registries using a streamlined data set of routinely collected variables. The validation of biomarkers specifically can be facilitated by the use of biobanks linked to improved registry data to provide rapid access not only to the source material but also to the clinical endpoints required for validation.
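The weighting problem with time-to-first-event composites can be shown with a minimal sketch. The `Recipient` record and `composite_outcome` function are hypothetical illustrations, not taken from any trial protocol; the point is that an unweighted composite records a treated episode of rejection and a death on the same day as the same outcome.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Recipient:
    """Days from transplantation to each event; None means it did not occur."""
    followup_days: int
    rejection_day: Optional[int] = None   # biopsy-proven rejection
    graft_loss_day: Optional[int] = None
    death_day: Optional[int] = None


def composite_outcome(r: Recipient):
    """Return (time, event_observed, component) for a time-to-first-event composite."""
    events = {
        "rejection": r.rejection_day,
        "graft loss": r.graft_loss_day,
        "death": r.death_day,
    }
    observed = {name: day for name, day in events.items() if day is not None}
    if not observed:
        return r.followup_days, False, None  # censored at last follow-up
    first = min(observed, key=observed.get)  # earliest component wins
    return observed[first], True, first


if __name__ == "__main__":
    rejection_only = Recipient(followup_days=365, rejection_day=30)
    died = Recipient(followup_days=30, death_day=30)
    for recipient in (rejection_only, died):
        # The (time, event) pair that enters the analysis is identical for both.
        print(composite_outcome(recipient))
```

Any weighting of the components would have to be specified separately, for example after the public and patient involvement exercise suggested above.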

Conclusions
We are in an era when conducting trials in liver transplantation is more challenging and requires innovative design; increased collaboration; and the adoption of standardized, validated surrogate endpoints. This concept has already been adopted in kidney transplantation with the iBox collaborative study, which, it is hoped, may inform the design of better clinical trials and validate other surrogate endpoints. (27) Our aim should be that every transplant recipient is enrolled in a multiarm, multiphase clinical trial, where ineffective interventions are removed early and promising interventions are interrogated more rigorously. There then also needs to be storage of samples for the purposes of rapidly validating future surrogate biological outcomes and answering subsequent research questions.
Registries need to collect better and more accurate data that are interpreted with a great level of insight powered by artificial intelligence to allow more accurate characterization of donors, recipients, grafts, and the process of transplantation so that we can attempt to standardize fundamental upstream definitions, such as "What are extended criteria donors?" This process will require international consensus about what the important questions are, how we are going to answer these questions, and how we are going to define a successful trial. Part of this will require the consistent use of validated endpoints that are measured according to a protocoled, consistent definition, making them open to meta-analysis. It is also essential that outcome switching of surrogate markers from the registered protocol is viewed with the appropriate level of skepticism by publishing journals and other bodies.
In summary, clinicians, academics, funders, professional bodies, patient groups, industry, and regulatory bodies need to come together to do the following:
• Agree and adhere to acceptable validated and clearly defined endpoints that are important to both patients and the health care system. We have attempted to provide an initial framework for this (Supporting Table 1), but this will need updating regularly as new surrogates emerge and others are validated.
• Recognize the contributions of individuals who work on larger collaborative research projects to incentivize their active participation in larger definitive trials rather than encouraging numerous individuals all leading their own smaller study powered off a less clinically meaningful endpoint.
• Ensure that lead investigators ringfence areas within trial frameworks where individuals can take a lead (Studies Within A Trial [SWAT]) and pursue their individual academic interests.
• Integrate information technology better into trials to allow automatic data capture from patients/hospitals/primary care to improve the data in national registries and facilitate ongoing, cost-effective prospective data capture to allow studies to validate the utilized surrogate endpoint with longer term survival data; that is, the study should not necessarily close when the trial finishes.
• Encourage bigger, better conducted collaborative multiarm, multiphase studies, with every patient offered the opportunity to participate in a trial.
These aims will require fair funding structures using both industry and public sector money across international borders to reflect both commercial and public interests/priorities. These actions are required to ensure that liver transplantation does not become a victim of its own success and to allow meaningful clinical trials to continue with outcomes that demonstrate genuine benefit to patients.