Negative trials in nonalcoholic steatohepatitis: Why they happen and what they teach us


  • Jeanne M. Clark,

    1. Departments of Medicine and Epidemiology, The Johns Hopkins University, Baltimore, Maryland
  • Frederick L. Brancati

    Corresponding author
    1. Departments of Medicine and Epidemiology, The Johns Hopkins University, Baltimore, Maryland
    • Welch Center for Prevention, Epidemiology, and Clinical Research, 2024 East Monument St., Suite 2-600, Baltimore, MD 21205
    • fax: 410-955-0476


Abbreviations: NASH, nonalcoholic steatohepatitis; UDCA, ursodeoxycholic acid; ALT, alanine aminotransferase.

For a variety of reasons, few of us set out to perform negative studies. Compared with positive studies, they are less likely to attract media attention, to point directly toward interesting follow-up research and the funding that comes with it, or to have immediate impact on clinical practice. They are also more difficult to get published. Yet carefully done negative studies, especially negative trials in emerging fields, can yield precious methodologic insights for the researcher committed to long-term progress in clinical research.

Such are the circumstances surrounding the work by K.D. Lindor and colleagues that appears in this issue of HEPATOLOGY.1 In a randomized, double-blind, placebo-controlled trial involving over 100 patients with nonalcoholic steatohepatitis (NASH), they found that treatment with ursodeoxycholic acid (UDCA) for 2 years had no discernible effect on the course of the disease. Faced with such negative findings, practitioners may be excused if they choose to note the results and turn the page. However, clinical researchers interested in NASH should study the paper closely, because it has much to teach the reader who poses the following question: “Why was this trial negative?” The most obvious reason is that there is truly no beneficial effect of UDCA on NASH. This may be true; however, several elements of the design and conduct of the trial may have conspired to mask a truly beneficial effect of UDCA.

First, like most pioneering clinical trials, the study seemed statistically underpowered in retrospect. The a priori power calculation foresaw a fivefold effect on histologic improvement (from 5% to 25%)—quite a tall order. The sample size needed to observe this large effect (n = 130 with valid pre- and posttreatment data) turned out to be unattainable even after 6 years of recruitment at 13 clinics, mainly because of the high dropout rate (25%–35%, depending on the outcome).
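The arithmetic behind a power calculation of this kind can be sketched with a textbook normal-approximation formula for comparing two proportions. This is an illustration only: the two-sided alpha of 0.05 and the power of 90% are assumptions, not values reported above, and the function names `n_per_arm` and `enrollment_target` are hypothetical. The trialists' actual method may well have differed.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_arm(p_control, p_treated, alpha=0.05, power=0.90):
    """Subjects with valid outcome data needed in each arm.

    Textbook normal-approximation formula for two independent proportions.
    alpha (two-sided) and power are assumed values, not taken from the trial.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_treated) / 2
    # Variability under the null (pooled) and under the alternative (unpooled).
    pooled = z_alpha * sqrt(2 * p_bar * (1 - p_bar))
    unpooled = z_beta * sqrt(
        p_control * (1 - p_control) + p_treated * (1 - p_treated)
    )
    return ceil((pooled + unpooled) ** 2 / (p_control - p_treated) ** 2)


def enrollment_target(n_valid, dropout_pct):
    """Subjects to enroll so that n_valid remain after dropout.

    dropout_pct is a whole-number percentage; integer ceiling division
    avoids floating-point rounding surprises.
    """
    remaining_pct = 100 - dropout_pct
    return -(-n_valid * 100 // remaining_pct)


if __name__ == "__main__":
    per_arm = n_per_arm(0.05, 0.25)
    total = 2 * per_arm
    print("valid subjects per arm:", per_arm)
    print("valid subjects in total:", total)
    for pct in (25, 35):
        print(f"enroll at {pct}% dropout:", enrollment_target(total, pct))
```

Under these assumed inputs the formula gives 65 subjects per arm, or 130 in total, and dropout of 25% to 35% raises the enrollment target to roughly 174 to 200. The agreement with the stated sample size is suggestive but does not confirm the parameters the investigators used.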

Second, there is lingering concern that the primary outcomes—liver enzymes and histology—were not sufficiently precise to detect subtle effects in a relatively small sample of participants. There are several reasons for this. Liver enzymes are notoriously variable within individuals over time. They can also be affected by variables such as length of time from blood draw to testing and storage temperature,2–4 which may have been difficult to control in a trial relying on mailed samples. Moreover, the authors correctly point to the heterogeneity of liver pathology and thus the potential for sampling error as contributing factors to histologic variability in serial biopsies. A final potential source of variability arises from interpretation of biopsy specimens. Reliance on a single pathologist certainly eliminates interreader variability; however, it does not eliminate intrareader variability, which can also be significant.5, 6

Third, the variability in biochemical and histologic end points might have joined with the key eligibility criteria to bias the trial toward a negative result. This phenomenon, which is well described in blood pressure research, is known as “regression to the mean”7, 8 and can be explained as follows. Assume that, in adults with NASH referred to hepatology practices, alanine aminotransferase (ALT) generally runs approximately 1.5 times normal but varies sinusoidally, week-to-week, between “normal” and 3.0 times normal. Eligibility criteria are chosen that set a threshold above which ALT must be to enter the study—in this case 1.5 times normal. This criterion will, by necessity, enrich the study population with subjects who happen to be at the high points of their ALT curves and eliminate subjects who happen to be at the low points. Now, even if you do nothing (i.e., give placebo) and simply recheck 2 years (or maybe even 2 months) later, ALTs in the study population will drop as the enzyme levels of many who were enrolled at 1.7 or 2.7 times normal trend naturally downward toward their expected value of 1.5 times normal. The patterns of aminotransferases in the trial by Lindor and colleagues are certainly consistent with regression to the mean. So is the pattern of γ-glutamyltransferase, which—although not formally used as an eligibility criterion—is itself quite variable within individuals over time and happens to be highly correlated with aspartate aminotransferase and ALT. Significant regression to the mean makes it more difficult to detect drug-related effects by increasing the background rate of “improvement” and so reducing the signal-to-noise ratio.
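Regression to the mean is easy to reproduce in a small simulation. The sketch below makes simplifying assumptions that are not drawn from the trial: every screened subject has the same long-run ALT of 1.5 times normal, and within-person fluctuation is modeled as independent Gaussian noise (standard deviation 0.4) rather than a sinusoid. The function name `simulate_placebo_arm` and all parameter values are hypothetical.

```python
import random
from statistics import mean


def simulate_placebo_arm(n_screened=10_000, true_mean=1.5, within_sd=0.4,
                         threshold=1.5, seed=0):
    """Mean ALT (multiples of normal) at enrollment and at follow-up.

    No treatment effect is simulated: each subject's ALT simply fluctuates
    around the same long-run value at both visits.
    """
    rng = random.Random(seed)
    baseline, followup = [], []
    for _ in range(n_screened):
        screening_alt = rng.gauss(true_mean, within_sd)
        # Eligibility criterion: ALT must exceed the threshold at screening.
        if screening_alt > threshold:
            baseline.append(screening_alt)
            # Follow-up is an independent draw around the same long-run value.
            followup.append(rng.gauss(true_mean, within_sd))
    return mean(baseline), mean(followup)


if __name__ == "__main__":
    enrolled_mean, followup_mean = simulate_placebo_arm()
    print(f"mean ALT at enrollment: {enrolled_mean:.2f} x normal")
    print(f"mean ALT at follow-up:  {followup_mean:.2f} x normal")
```

Because only subjects measured above the threshold are enrolled, the enrollment mean sits well above 1.5 while the follow-up mean falls back toward it, even though nothing was done to these simulated subjects. A trial without a control group would read that drop as a treatment effect.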

Finally, every clinical trialist must endure post hoc critiques about drug treatment: Was the dose too low (or too high)? Was the right preparation used? Was the duration of treatment sufficient? Lindor and colleagues suggest that subsequent trials might use a higher dose of UDCA, equivalent to the high doses used for sclerosing cholangitis; however, they do not directly address whether low medication adherence might have contributed to the negative findings. Inadequate adherence is always a concern in clinical trials, but especially so here, where the disease is thought to be largely asymptomatic and where a large majority of subjects reported gastrointestinal side effects.

In light of these methodologic considerations, what lessons can be drawn by clinical researchers committed to scientific discovery for the improvement of care for patients with NASH? First, as the authors point out, this study is a vivid reminder of why control groups are needed in treatment studies. Without this essential design feature, the article would have concluded that UDCA was effective in lowering liver enzymes and improving histology. Second, we urgently require a histologic grading scheme for NASH that is valid, precise, and standardizable across research sites. Ideally, the scheme would yield quantitative or semiquantitative scores that could widen the recruitment pool and enhance the statistical power of relatively small trials. An attempt at this is currently being made by the multicenter, National Institutes of Health–funded NASH Clinical Research Consortium.5 Third, the variability of commonly used NASH markers like liver enzymes and hepatic fat content should be fully determined and carefully accounted for in terms of study design, specimen handling, laboratory and imaging techniques, and statistical analysis.

Finally, it will be important for researchers to keep in mind that NASH is not an easy disease to study in terms of recruitment and retention of trial participants, owing in part to its low profile in primary care settings and the requirement of an invasive procedure for diagnosis and follow-up. In contrast, National Institutes of Health–funded multicenter studies of related conditions such as obesity, type 2 diabetes, and high blood pressure are typically able to enroll 150 to 250 subjects per clinic over a 2-year recruitment interval and obtain valid outcome data on over 90% of these subjects 4 to 6 years later. Future trials would certainly benefit from heightened public awareness of NASH and identification of scientifically compelling noninvasive markers. Pending such developments, research networks linking committed hepatology practices will be crucial to make headway against this emerging threat to increasingly overweight populations in the United States, Europe, and around the world.