The emergence of evidence-based medicine over the past several decades has resulted in the identification of many therapies that improve or extend life, as well as the abandonment of others that have been demonstrated to cause unexpected harm. The gold standard of evidence has been the randomized, multigroup comparison trial, designed in its usual form to evaluate the hypothesis that an investigational therapy results in outcomes superior to placebo or an active comparator. This study design is implemented by constructing a trial to reject a null hypothesis stating the opposite: that there is no difference between the treatments under study. The probability of the observed or a more extreme result, computed under the assumption that this null hypothesis is true, is the P value.
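The superiority-trial logic described above can be sketched as a two-proportion z-test. The event counts and sample sizes below are hypothetical, chosen only to illustrate how a P value is computed under the null hypothesis of no difference.

```python
import math

def two_proportion_p_value(e1, n1, e2, n2):
    """Two-sided P value for the null hypothesis of no difference
    between two event rates, via the normal approximation."""
    p1, p2 = e1 / n1, e2 / n2
    pooled = (e1 + e2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # probability of the observed or a more extreme result, given the null
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Hypothetical trial: 80/1000 events on treatment vs 100/1000 on comparator
print(round(two_proportion_p_value(80, 1000, 100, 1000), 3))
```

A small P value would lead to rejection of the null hypothesis and a conclusion of superiority; here the difference is not statistically significant at conventional thresholds.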
This approach has led to the identification of therapies in many areas of medicine that afford excellent efficacy. For these indications, the likelihood of a new agent demonstrating meaningfully improved outcomes is relatively low. However, existing therapies may nonetheless be suboptimal in terms of side effects, ease of administration, cost, or convenience. In these cases, the vital question is whether the novel therapy is as good as or noninferior to current therapies. Traditional study designs are not intended to answer this question. This has resulted in the emergence of the noninferiority study design.
One of the earliest examples arose in the design of trials evaluating tenecteplase for treatment of acute myocardial infarction.1 Prior studies had demonstrated considerable efficacy of alteplase, the standard of care at the time, in reducing 30-day mortality. However, tenecteplase can be administered as a single bolus, whereas alteplase requires continuous infusion. The ASSENT-2 (Assessment of the Safety and Efficacy of a New Thrombolytic-2) study was designed to demonstrate that 30-day mortality among patients treated with tenecteplase would be equivalent to that among patients treated with alteplase.
In this design a noninferiority margin is established a priori. This margin, which if defined on a relative basis usually ranges from 5% to 20%, represents the maximum difference between therapies that will be considered equivalent. Thus, the null hypothesis is that outcomes for the treatments under trial differ by more than the noninferiority margin. As a result, in this study design the P value represents the probability that the therapies demonstrate similar results by chance despite true differences in efficacy greater than the noninferiority margin. For ASSENT-2, the noninferiority margin was defined as the lesser of a 1% absolute difference or a 14% relative difference in 30-day mortality. The observed 30-day mortality was 6.18% with tenecteplase vs 6.15% with alteplase, with the upper bound of the 95% confidence interval for the difference falling below this criterion, affirming the equivalence of the agents.2
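The decision rule described above can be sketched numerically. The 30-day mortality rates are those reported for ASSENT-2, but the per-group sample sizes and the simple Wald-style confidence interval are assumptions made only for illustration, not the trial's actual analysis.

```python
import math

def noninferiority_check(p_new, p_ref, n_new, n_ref, margin, z=1.96):
    """Noninferiority is declared when the upper bound of the 95% CI
    for the risk difference (new minus reference) falls below the margin."""
    diff = p_new - p_ref
    se = math.sqrt(p_new * (1 - p_new) / n_new + p_ref * (1 - p_ref) / n_ref)
    upper = diff + z * se
    return upper, upper < margin

# ASSENT-2 mortality rates; ~8000 patients/group is an assumed figure
upper, noninferior = noninferiority_check(0.0618, 0.0615, 8000, 8000, margin=0.01)
print(f"upper 95% CI bound: {upper:.4f}, noninferior: {noninferior}")
```

Because the upper bound of the confidence interval lies below the 1% absolute margin, the new therapy would be declared noninferior under these assumptions.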
This fundamentally different study design carries with it several differences compared with traditional superiority trials. First, the population studied should be similar to those in which the comparator drug has established efficacy in comparison to placebo. This condition has been termed the constancy assumption. Second, larger sample sizes are generally required than for similar superiority trials. In this sense, noninferiority is a statistically more challenging bar to cross. Third, the intention-to-treat principle should not be strictly adhered to. Simply put, if the therapies under study are so onerous that adherence is essentially zero, the difference in outcomes between groups may also be near zero. Thus, adequate assessment of adherence to study treatment, and analysis and interpretation with this in mind, are key, and an on-treatment analysis may be more appropriate. Fourth, establishment of a reasonable noninferiority threshold is critical. Generally, a threshold that is smaller than the effect seen in placebo-controlled trials of the comparator, but as large as clinically tolerable, is favored.3 Of note, although a larger threshold will require smaller sample sizes to demonstrate statistical significance, unreasonably wide noninferiority margins will raise questions of validity. This issue can be particularly problematic when the noninferiority margin is expressed as an absolute difference in outcome rates and the underlying event rates for the comparator treatment are small. For example, even a modest-appearing noninferiority margin of a 0.5% absolute difference in event rates may represent a 50% relative increase if the observed event rate with the comparator treatment is only 1%. Practically, noninferiority margins may be developed from meta-analysis of studies of the comparator treatment vs placebo.
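The arithmetic behind this caveat is worth making explicit, using the same figures as the example above (a 1% comparator event rate and a 0.5% absolute margin):

```python
# An absolute margin can conceal a large relative margin when event rates are low
baseline_rate = 0.01     # 1% event rate with the comparator treatment
absolute_margin = 0.005  # 0.5% absolute noninferiority margin
relative_margin = absolute_margin / baseline_rate
print(f"{relative_margin:.0%}")  # → 50%
```

The same 0.5% absolute margin applied to a 10% baseline event rate would permit only a 5% relative increase, which is why the clinical acceptability of an absolute margin depends on the expected event rate.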
In response to variable consistency in the reporting of these issues in the scientific literature,4 guidelines have been promulgated that identify critical factors to be disclosed in trial reports.5
Despite these challenges, there has been rapid growth in trials employing the noninferiority design (Figure 1, P < 0.0001). In cardiology, a field with many treatments with established efficacy, fully 35% (6 out of 17) of initial presentations of late-breaking randomized clinical trials at the American College of Cardiology Scientific Sessions in 2011 were noninferiority trials,6 a 3-fold increase over the preceding 5 years (11 out of 89, 12%, P = 0.029).
This rapid growth is very likely to continue and raises important concerns. Chief among them is what has been called biocreep in noninferiority trials.7 In cases where novel therapy A has been demonstrated to be noninferior to existing therapy B, which itself was brought to market based on noninferiority data compared with prior therapy C, the net noninferiority margin for the novel therapy A compared with the original therapy C may be as high as the product of the 2 noninferiority margins and may be unacceptably large. Although biocreep can be averted by directly comparing therapy A vs C, this approach may not be feasible if therapy B has supplanted C in clinical practice. Alternatively, the noninferiority threshold for the trial comparing A vs B can be set to be extremely narrow. However, this will substantially increase the required sample sizes (to more than 10 000 patients in some cases) and therefore increase the logistical complexity, enrollment duration, and cost of the trial. Although a 2010 study by the Government Accountability Office concluded that biocreep was not a major problem at that point,8 the rapid growth in this study design merits ongoing vigilance.
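A minimal sketch of how relative noninferiority margins compound across successive trials follows; the 20% margins are hypothetical values chosen only for illustration.

```python
def compounded_margin(*relative_margins):
    """Worst-case relative margin vs the original comparator when each
    successive trial uses a relative (risk-ratio) noninferiority margin."""
    net = 1.0
    for m in relative_margins:
        net *= (1.0 + m)
    return net - 1.0

# Two successive 20% relative margins (B vs C, then A vs B)
print(f"{compounded_margin(0.20, 0.20):.0%}")  # → 44%
```

Under these assumptions, therapy A could be up to 44% worse than the original therapy C while each individual trial met its 20% margin, illustrating why biocreep merits vigilance.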
Overall, the emergence of the noninferiority trial design has allowed the development of many novel therapies with distinct advantages over preexisting treatments. Nonetheless, the rapid growth in this trial design raises important concerns meriting caution balanced with optimism.