Comments from the Editors
The need for better clinical trials†
Article first published online: 16 APR 2008
Copyright © 2008 American Association for the Study of Liver Diseases
Volume 48, Issue 1, pages 1–3, July 2008
How to Cite
Kamath, P. S. (2008), The need for better clinical trials. Hepatology, 48: 1–3. doi: 10.1002/hep.22373
Potential conflict of interest: Nothing to report.
- Issue published online: 20 JUN 2008
- Accepted manuscript online: 16 APR 2008 12:00AM EST
New treatments in clinical medicine are accepted if they are based on the results of well-conducted randomized controlled trials (RCTs). Unfortunately, there are few such trials that help us determine the optimal treatment of complications of cirrhosis such as hepatic encephalopathy or hepatorenal syndrome. As hepatologists, we should strive to reach a point where every intervention that we carry out in patients with cirrhosis is supported by high-quality evidence that answers questions that matter to patients.
Most valuable are RCTs that evaluate interventions that have potentially important treatment effects. Effects are important when patients consider them worth the burdens and potential harms involved. In relative terms, such treatments usually reduce the risk of death, severe events, or large declines in quality of life by 25%-50%. Perhaps most relevant are the reductions in absolute terms defined by a time frame: a reduction in the risk of death of 10% in 1 year means more to patients than the same reduction over 10 years.
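The distinction between relative and absolute reductions, and the role of the time frame, can be sketched with a small calculation. The function name `risk_reduction`, the mortality figures (40% versus 30%), and the per-year averaging are illustrative assumptions, not values from any trial.

```python
def risk_reduction(control_risk, treated_risk, years=1.0):
    """Summarize a treatment effect in absolute and relative terms.

    Risks are proportions (0-1) observed over `years` of follow-up.
    """
    if not 0 < control_risk <= 1 or not 0 <= treated_risk <= 1 or years <= 0:
        raise ValueError("risks must be proportions and years must be positive")
    arr = control_risk - treated_risk  # absolute risk reduction
    return {
        "arr": arr,
        "rrr": arr / control_risk,                    # relative risk reduction
        "arr_per_year": arr / years,                  # crude average, for comparison only
        "nnt": 1 / arr if arr > 0 else float("inf"),  # number needed to treat
    }


# Hypothetical example: mortality falls from 40% to 30%.
short_horizon = risk_reduction(0.40, 0.30, years=1)
long_horizon = risk_reduction(0.40, 0.30, years=10)
print(short_horizon)
print(long_horizon)
```

Both calls report the same absolute and relative reductions, but the averaged yearly benefit is ten times smaller when the same reduction takes 10 years to accrue, which is the sense in which the time frame matters to patients.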
Because patients and their clinicians will need to balance the potential benefits with the burdens (cost, inconvenience) and potential harms, RCTs should also provide evidence related to the safety of the intervention. An early review of results can demonstrate efficacy, but infrequent side effects and those that result from accumulating exposure to the intervention may only be demonstrated by a longer follow-up. Therefore, studies should not only be adequately powered a priori to detect a statistically significant benefit but should also run long enough for important side effects of the treatment to become apparent.1
Why are most clinical trials too small? Although there may be differences in expectations about the treatment effect between enthusiastic and skeptical investigators,2 the most common reason for small trials is economics. There are limits to the resources (money, personnel, available eligible patients) available for the conduct of large trials, so researchers settle for small ones. However, Institutional Review Boards and funding agencies find it difficult to accept this reality (and the need to support large multicenter trials or meta-analyses of multiple small ones), which compels investigators to postulate unbelievably large treatment effects in order to justify conducting a small trial.
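The link between the postulated treatment effect and the size of the trial can be made concrete with the standard normal-approximation formula for comparing two proportions. This is a minimal sketch: the function name `patients_per_arm`, the 40% control event rate, and the defaults (two-sided alpha of 0.05, 80% power) are assumptions chosen for illustration.

```python
from math import ceil
from statistics import NormalDist


def patients_per_arm(control_risk, relative_reduction, alpha=0.05, power=0.80):
    """Approximate patients needed per arm to detect a given relative risk reduction.

    Uses the normal approximation for two independent proportions,
    with a two-sided significance level `alpha`.
    """
    treated_risk = control_risk * (1 - relative_reduction)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = control_risk * (1 - control_risk) + treated_risk * (1 - treated_risk)
    delta = control_risk - treated_risk
    return ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)


# Hypothetical control event rate of 40%, with a modest and a large postulated effect.
modest = patients_per_arm(0.40, 0.25)
large = patients_per_arm(0.40, 0.50)
print(modest, large)
```

Under these assumptions, doubling the postulated relative reduction from 25% to 50% cuts the requirement from roughly 350 patients per arm to roughly 80, which illustrates why an optimistic effect size is a convenient way to justify a small trial.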
So what do investigators do to justify these large effects? One common approach is to decrease the number of patients required by using composite endpoints.3 The use of composite endpoints increases the event rate (patients qualify as having an event by having any of the component outcomes of the composite endpoint) at the expense of the interpretation of the results. When a large number of cardiovascular studies with composite endpoints were scrutinized, the endpoints of moderate and minor importance to patients had higher event rates, whereas event rates for fatal or critical outcomes were low. These studies were interpreted as showing a benefit of the treatment across both important and less important outcomes when, in fact, there were no meaningful differences in the important endpoints. Some of the component endpoints may be biological variables that patients might find clinically unimportant. For example, a trial of a new treatment for hepatic encephalopathy may have as composite endpoints: mortality, reduction in hospitalization episodes, costs and quality of life, improvement in psychometric tests, and blood ammonia reduction. The improvement in psychometric tests and reduction in blood ammonia may be large enough to demonstrate a statistically significant effect of the new treatment even if there is no reduction in mortality, hospitalization episodes, or costs and quality of life. The conclusion of the study, however, would be that the new treatment showed significant benefit in composite endpoints that include mortality, hospitalization, and costs and quality of life. The conclusion, while statistically accurate, is misleading because the new treatment is not effective in the eyes of the clinician or the patient. Thus, it is essential that composite endpoints be weighted. In patients with cirrhosis, death would be the strongest endpoint, and minor changes in laboratory tests would be the weakest endpoint (Table 1). Without such weighting, misinterpretation of composite endpoints could lead clinicians to overestimate the benefit of treatment.
| Complication | I Death | II Critical | III Major | IV Moderate | V Minor |
| --- | --- | --- | --- | --- | --- |
| Varices | Mortality | Reduction in bleeding episodes | Reduction in resources (blood products, endoscopy); quality of life | HVPG | Endoscopic signs |
| Ascites | Mortality | Reduction in requirement for paracentesis | Costs and quality of life | Renal function | Renin activity |
| Hepatic encephalopathy | Mortality | Reduction in hospitalization for hepatic encephalopathy | Costs and quality of life | Psychometric tests, asterixis | Ammonia, EEG |
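The hepatic encephalopathy example can be sketched numerically. The patient-level data below are invented purely for illustration, and the helper names (`composite_rate`, `component_rate`, `benefit_by_tier`) are hypothetical; only the tier ordering is taken from Table 1.

```python
# Importance tiers for hepatic encephalopathy components, following Table 1
# (I = death, II = critical, IV = moderate, V = minor).
TIER = {"death": 1, "hospitalization": 2, "psychometric": 4, "ammonia": 5}

# Each patient is the set of component events observed (invented data, 10 per arm).
control = [
    {"death"}, {"death", "hospitalization"}, {"hospitalization", "ammonia"},
    {"hospitalization", "psychometric", "ammonia"}, {"psychometric", "ammonia"},
    {"psychometric", "ammonia"}, {"psychometric"}, {"ammonia"},
    {"psychometric", "ammonia"}, set(),
]
treated = [
    {"death"}, {"death", "hospitalization"}, {"hospitalization"},
    {"hospitalization", "ammonia"}, {"psychometric"},
    set(), set(), set(), set(), set(),
]


def composite_rate(patients):
    """Fraction of patients with at least one component event."""
    return sum(1 for events in patients if events) / len(patients)


def component_rate(patients, component):
    """Fraction of patients with one specific component event."""
    return sum(1 for events in patients if component in events) / len(patients)


def benefit_by_tier(control_arm, treated_arm):
    """Absolute difference per component, listed from most to least important."""
    return [
        (name, component_rate(control_arm, name) - component_rate(treated_arm, name))
        for name in sorted(TIER, key=TIER.get)
    ]


print("composite:", composite_rate(control), "vs", composite_rate(treated))
for name, difference in benefit_by_tier(control, treated):
    print(f"tier {TIER[name]} {name}: {difference:+.2f}")
```

In this invented dataset the composite rate falls from 0.9 to 0.5, yet the difference is zero for death and hospitalization; the whole apparent benefit comes from the tier IV and V components. Reporting components in tier order is one simple way to expose that pattern; it is not a substitute for a formally weighted analysis.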
The duration of the clinical trial is equally critical. Many RCTs that significantly affect the way we practice were terminated prematurely because of apparent benefit.4 Typically, RCTs that are stopped early for benefit are industry-funded and are, unfortunately, becoming increasingly common. More importantly, such trials show implausibly large treatment effects because the number of “events” is small. Clinicians should, therefore, review the results of RCTs that are stopped early with considerable skepticism, especially when the new treatment is expensive. There are also important ethical issues about stopping RCTs early.5 Because these studies demonstrate large treatment benefits that may not be accurate, patients come to want a treatment that may not be effective. Moreover, studies that are stopped early yield limited data on outcomes that are important to patients, such as survival, quality of life, and adverse effects of treatment. Another disadvantage of a truncated trial is that the large benefits demonstrated make it much more difficult for investigators to enroll patients to readdress the efficacy of the treatment in a larger study.
An argument often used in support of stopping a study early is that the larger patient community can benefit from the new treatment. In truth, this is not usually the case because dissemination of reports of the study is usually delayed. In fact, if the study is a two-treatment trial, continuing the study ensures that at least 50% of the patients in the trial are likely to receive the experimental (and supposedly beneficial) treatment. However, if the trial is stopped early, the number of patients who will receive the new treatment because information has been disseminated rapidly is typically considerably less than the 50% of patients in the study.5
RCTs are truncated because stopping a study early benefits several parties. Because early termination demonstrates a large treatment effect, albeit an implausibly large one, reports of such studies are likely to be published in the more prestigious journals. Such publications considerably enhance the careers of the investigators. Pharmaceutical companies benefit because the cost of the study is decreased and their market share increases.3–5
Studies to determine optimal treatments in patients with chronic liver disease have often been small. For example, treatment of hepatorenal syndrome, which possibly affects the majority of the 40,000 patients who die as a result of cirrhosis each year within the United States, is based on studies of 20 or fewer patients. Treatment of hepatic encephalopathy with lactulose and/or rifaximin is based on even smaller and poorly conducted studies. There are several reasons for only small studies being carried out. Investigation of complications of chronic liver disease is difficult and expensive. Patients tend to be very sick with multiple complications which require the treating physician to have knowledge not only of hepatology but also nephrology (hepatorenal syndrome), infectious diseases (sepsis), endocrinology (adrenal insufficiency), and critical care (shock, hypoxemia, etc.). Such investigation also requires the investigator to spend long hours outside regular working hours during the week, as well as on weekends. Clearly, research in end-stage liver disease requires sacrifice of quality time to a greater extent than would be required to carry out treatment trials, for example in patients with hepatitis C. Thus, it is not surprising that there is a dearth of young investigators with interest in the field of end-stage liver disease. Further, conducting such trials requires a large number of patients, and single centers are not likely to be able to enter such numbers.
Moreover, industry is not typically supportive of large studies because of the expense involved. In studies addressing variceal bleeding, high-quality trials, adequately powered and with appropriate endpoints, have been supported by the National Institutes of Health.6 In contrast, industry-supported studies on somatostatin analogs in the control of variceal bleeding have been terminated when efficacy of the investigational drug was not demonstrated. Therefore, there is a need for large studies in patients with cirrhosis, and such studies should ideally be supported by not-for-profit bodies whose only goals are determining what is in the best interest of patients and furthering the science.
A proposal for future studies in patients with cirrhosis, therefore, is that authors clearly report the sample size calculation and identify the primary outcome measure. The outcome measure should be important to patients (if it were the only outcome that changed, would patients consider taking the intervention?), and it should be specified whether the differences between the two treatment arms being studied are absolute or relative. The statistical basis for the power calculation should also be reported. The committee that approves the protocol should include both enthusiasts and skeptics of the proposed newer treatment. This will ensure a realistic sample size and a clearer demonstration of treatment efficacy.
If a study's treatment benefit far outweighs the adverse effects, one could make a case for terminating the study early. Several suggestions, most of which are supported only by empirical evidence, have been made regarding rules for stopping a trial early for benefit.4 First, investigators should set a rigorous P value threshold (<0.001). Second, a large number of events should accrue before investigators examine interim data. In cardiovascular and cancer trials, this typically requires 200-400 events; in patients with cirrhosis, similar numbers are probably necessary. Third, the trend in benefit should be persistent and demonstrable at a P value of <0.001 over two additional interim analyses, 3-6 months apart, which means continuing the trial for about 1 year after the first positive interim analysis. It is particularly important that the treatment effect not be overestimated when the treatment may cause considerable harm or is expensive.
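The suggested stopping rules can be written down as a simple check. This is a sketch under stated assumptions, not a validated monitoring procedure: the `InterimAnalysis` record and the function `may_stop_for_benefit` are hypothetical names, the 200-event minimum is the lower end of the range quoted above, and the interim data are invented.

```python
from dataclasses import dataclass


@dataclass
class InterimAnalysis:
    month: float    # months since the trial started
    events: int     # cumulative outcome events at this analysis
    p_value: float  # P value for benefit at this analysis


def may_stop_for_benefit(analyses, p_threshold=0.001, min_events=200,
                         min_gap=3.0, max_gap=6.0):
    """Return True if a positive interim analysis is confirmed by two more.

    Requires enough accrued events at the first positive analysis, P below
    the threshold at all three analyses, and consecutive analyses spaced
    min_gap to max_gap months apart.
    """
    ordered = sorted(analyses, key=lambda a: a.month)
    for i in range(len(ordered) - 2):
        window = ordered[i:i + 3]
        if window[0].events < min_events:
            continue
        if any(a.p_value >= p_threshold for a in window):
            continue
        gaps = [later.month - earlier.month
                for earlier, later in zip(window, window[1:])]
        if all(min_gap <= gap <= max_gap for gap in gaps):
            return True
    return False


# Invented interim data for three hypothetical trials.
too_early = [InterimAnalysis(12, 60, 0.0004)]
persistent = [InterimAnalysis(24, 210, 0.0008),
              InterimAnalysis(28, 260, 0.0005),
              InterimAnalysis(33, 310, 0.0002)]
faded = [InterimAnalysis(24, 210, 0.0008),
         InterimAnalysis(28, 260, 0.01),
         InterimAnalysis(33, 310, 0.0005)]
print(may_stop_for_benefit(too_early),
      may_stop_for_benefit(persistent),
      may_stop_for_benefit(faded))
```

Only the trial whose benefit persists across three qualifying analyses passes the check; a single striking result on few events, or a signal that fades at the next look, does not.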
We therefore suggest that future studies in patients with cirrhosis be powered to demonstrate an absolute benefit of at least 25% or a relative benefit of at least 50% over standard therapy.7 Composite endpoints should be discouraged or, when assembled, should have a narrow gradient of importance and treatment effects across the components. Key components of the endpoints would include mortality, complications that are being treated, quality of life, and costs. Surrogate endpoints should not be included as primary endpoints unless they have been conclusively demonstrated to link the intervention with the outcome that matters and to capture that relationship completely. The study should be of sufficiently long duration to demonstrate benefits in mortality and absence of significant side effects. If the difference in survival with or without treatment can be measured only in days or a few weeks, and liver transplantation is not an option, the need for intervention should be questioned.
A well-conducted RCT would be possible only if a large number of centers collaborate, as is well-established practice in cardiology and inflammatory bowel disease research. The advantages of such a study consortium include the participation of a large number of senior investigators, both enthusiasts and skeptics of the treatment, so that more realistic endpoints and sample size calculations could be expected. Studies conducted by such research bodies are likely to address meaningful endpoints (for patients, this means quality of life and costs; for investigators, this means mechanism of action and markers of benefit), which will ultimately translate into better patient outcomes. Participation of patients in such collaborations will ensure that investigators ask questions that are relevant to patients and not just those that advance the science.
I am indebted to Victor Montori, M.D., for helpful suggestions.