Defining product bioequivalence
Prior to Hatch-Waxman (1960s–1970s), the focus was on one fundamental assumption: if the drug concentration–time profiles resulting from two formulations are superimposable, then the safety and effectiveness of the two formulations will likewise be indistinguishable. Extending this assumption, it was further believed that the primary impact of the formulation was its control of drug absorption. Accordingly, so long as the excipient(s) did not have activity in the body, bioequivalence was considered to be an evaluation of the rate and extent of drug absorption.
These early bioequivalence determinations evaluated the extent of drug exposure by comparing the weights of individual subject concentration–time plots that were graphed on and cut from specially designed paper. This was followed in 1971 by a National Academy of Sciences bioequivalence symposium that resulted in recommendations for methods of determining the area under the concentration vs. time curve (AUC) via numerical integration and the evaluation of rate of absorption by assessing the observed peak drug concentrations (Cmax) and the time to Cmax, Tmax (Ronfeld & Benet, 1977). The bounds for defining bioequivalence were set as ±20% based upon consultation with physicians who concluded that this magnitude of difference would be without clinical significance. This determination was first published in the Federal Register in 1977 (Federal Register, 1977).
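To illustrate how these exposure metrics are obtained in practice, the sketch below computes AUC by the linear trapezoidal rule (one common form of the numerical integration recommended at the time) together with Cmax and Tmax. The sampling schedule and concentrations are invented for the example.

```python
# Illustrative sketch (not a regulatory implementation): computing the
# standard bioequivalence exposure metrics -- AUC by the linear trapezoidal
# rule, plus Cmax and Tmax -- from one subject's concentration-time profile.

def auc_trapezoidal(times, concs):
    """Area under the concentration-time curve by the linear trapezoidal rule."""
    return sum(
        (t2 - t1) * (c1 + c2) / 2.0
        for (t1, c1), (t2, c2) in zip(
            zip(times, concs), zip(times[1:], concs[1:])
        )
    )

# Hypothetical sampling times (h) and plasma concentrations (ng/mL)
times = [0, 0.5, 1, 2, 4, 8, 12, 24]
concs = [0.0, 12.0, 20.0, 18.0, 11.0, 5.0, 2.5, 0.6]

auc = auc_trapezoidal(times, concs)   # area over the observed interval
cmax = max(concs)                     # observed peak concentration
tmax = times[concs.index(cmax)]       # time of the observed peak
print(auc, cmax, tmax)
```

In a bioequivalence trial, these metrics would be computed for each subject under each formulation and then compared across treatments.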
The next step was determining the statistics to be used for establishing with confidence that the two products differed by no more than ±20%. In the early years (∼1985–1987) of the Division of Bioequivalence, Center for Drug Evaluation and Research (CDER) of the US FDA, the statistical basis for concluding that two products are bioequivalent involved the use of an analysis of variance (anova) to test the null hypothesis (H0) of ‘no difference’ between the average bioavailability of the two products. The results of such an evaluation (the ability to correctly accept or reject H0) are a function of the Type I and Type II error of the test (Table 1).
Table 1. Defining Type I and Type II error. In a bioequivalence trial, we can think of α as the risk of declaring two products as being different when they in fact are bioequivalent; and we can think of β as the risk of declaring two products as being bioequivalent when they in fact are different
| | Investigator accepts H0 | Investigator rejects H0 |
| --- | --- | --- |
| When H0 is true | Valid conclusion | Type I error (α): sponsor's risk |
| When H0 is false | Type II error (β): patient's risk | Valid conclusion |
The Type I error relates to the α value [the level of risk (e.g., α = 0.05)] associated with rejecting the null hypothesis when it is in fact true; this is also referred to as the level of significance. From a pharmaceutical perspective, the Type I error can be viewed as the sponsor's risk (i.e., the risk of failing to recognize a product that is in fact bioequivalent). From a patient perspective, on the other hand, there is the need to minimize the risk of failing to identify products that are in fact NOT bioequivalent. The probability of failing to reject the null hypothesis when it is in fact false is termed β. For example, a test may lack adequate power (where power = 1 − β) to reject the null hypothesis when it is false. Power is a function of the variability in the parameter estimate, the number of observations included in the comparison, and the magnitude of the difference that the investigator wishes to detect (e.g., 80% power to detect a 20% difference). The risk associated with failing to reject H0 when it is false is known as the Type II error.
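The dependence of power on variability, sample size, and the difference to be detected can be illustrated with a simple normal-approximation sketch. The function and all of its numbers are illustrative assumptions (an exact calculation would use the noncentral t distribution), but the qualitative behavior, power rising with sample size for a fixed difference and variability, holds generally.

```python
# A hedged sketch of how power depends on variability, sample size, and the
# difference to be detected. Uses a normal approximation for simplicity;
# exact values would come from a t-based calculation.

from statistics import NormalDist

def approx_power(delta, sigma, n, alpha=0.05):
    """Approximate power of a two-group comparison to detect a true mean
    difference `delta`, with per-group SD `sigma` and `n` subjects per group."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma * (2 / n) ** 0.5          # standard error of the difference
    return 1 - NormalDist().cdf(z - delta / se)

# e.g., power to detect a 20% difference (relative scale) with an
# assumed SD of 0.25, for increasing sample sizes:
for n in (12, 24, 48):
    print(n, round(approx_power(delta=0.20, sigma=0.25, n=n), 2))
```

Doubling the number of subjects shrinks the standard error by a factor of √2, which is why power climbs steadily with n in the output.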
Initially, a two-tiered approach to this analysis was employed:
As can be deduced from these relationships, there were inherent inconsistencies associated with the use of a traditional anova in the assessment of product bioequivalence. In particular, when using the power approach, the likelihood of declaring two products bioequivalent increased as the standard error increased (Schuirmann, 1987). In other words, the more variable the data (i.e., the greater the level of uncertainty), the greater the likelihood of declaring two products bioequivalent. The upper limit of the bioequivalence boundaries was defined by the 75/75 rule (Fig. 1). Considering the weaknesses in this set of statistical metrics, it is not surprising that the first several years after ratification of the Hatch-Waxman Act were met with much controversy, as debates regarding the FDA's ability to ensure therapeutic equivalence abounded in the literature (e.g., Hamrell et al., 1987). During these early years, many issues remained to be resolved, from in vitro test methods to statistical procedures for evaluating product bioequivalence (i.e., the shortcomings associated with the use of anova methods described above).
Figure 1. Rejection region when using the power approach (based upon the work of Schuirmann, 1987). The x-axis represents the difference between the treatment means. Note that as the standard error of the estimate increases, the size of the allowable difference between the treatment means likewise increases, up to a maximum difference defined by the need to achieve no less than 80% power. [Correction added after online publication 7 July 2010: the x-axis marking was updated from 1 to 0, the x-axis label was changed from T/R to the difference between treatment means, and the legend was revised to reflect this change.]
As the statistical principles associated with the power approach were called into question, alternative statistical approaches were suggested (e.g., Hauck & Anderson, 1984). Ultimately, in 1987, Don Schuirmann of the FDA published a landmark manuscript in which he split the statistical assessment into two one-sided test problems and applied one-sided t-tests to evaluate the upper and lower bioequivalence limits (Schuirmann, 1987). In so doing, the H0 was changed to the assertion that the test (T) and reference (R) products are different (i.e., >20% difference between treatment means). The corresponding alternative hypothesis (Ha) is that the two products are bioequivalent (i.e., ≤20% difference between treatment means). With the test framed in this manner, an investigator needed to reject H0 in order to conclude that the two products are bioequivalent. Thus, Schuirmann's statistical method provided a logical approach to the assessment of product bioequivalence: the likelihood of declaring two products bioequivalent improves as statistical power increases (Fig. 2).
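The logic of the two one-sided tests procedure can be sketched in a few lines. The sketch below is illustrative only: it works on the difference between treatment means expressed relative to the reference with ±20% limits, and it substitutes a standard normal quantile for the t quantile that an actual analysis would take from the crossover anova degrees of freedom; the example numbers are hypothetical.

```python
# A minimal sketch of Schuirmann's two one-sided tests (TOST) procedure.
# H0: the products differ by more than the limits; it is rejected (i.e.,
# bioequivalence is concluded) only if BOTH one-sided tests are significant.
# A z quantile stands in for the t quantile for simplicity.

from statistics import NormalDist

def tost_bioequivalent(diff, se, lower=-0.20, upper=0.20, alpha=0.05):
    """diff = estimated T-R difference relative to R; se = its standard error."""
    z = NormalDist().inv_cdf(1 - alpha)
    t_lower = (diff - lower) / se   # test that diff exceeds the lower limit
    t_upper = (upper - diff) / se   # test that diff falls below the upper limit
    return t_lower > z and t_upper > z

def ci90(diff, se, alpha=0.05):
    """Equivalent 90% confidence-interval view: bioequivalence is concluded
    when the interval lies entirely within the limits."""
    z = NormalDist().inv_cdf(1 - alpha)
    return diff - z * se, diff + z * se

print(tost_bioequivalent(diff=0.05, se=0.06))   # small difference, modest SE
print(tost_bioequivalent(diff=0.05, se=0.15))   # same difference, larger SE
```

Note how the second call fails even though the point estimate is identical: as the standard error grows, the chance of concluding bioequivalence shrinks, which is exactly the behavior Fig. 2 describes and the reverse of the old power approach.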
Figure 2. Relationship between the allowable difference between treatment means and the corresponding standard error of the estimate for establishing product bioequivalence when using the two one-sided tests procedure (confidence interval approach). Note that as the standard error increases, the allowable difference between product means likewise decreases (based upon the work of Schuirmann, 1987). [Correction added after online publication 7 July 2010: the x-axis marking was updated from 1 to 0, the x-axis label was changed from T/R to the difference between treatment means, and the legend was revised to reflect this change.]
When comparing the bioavailability of two products using a confidence interval approach, the number of subjects needed for inclusion in a bioequivalence trial will be based upon the following statistical factors:

1. The targeted bioequivalence bounds (e.g., T/R = 0.80–1.25). The narrower the bounds, the greater the number of subjects needed to meet the bioequivalence criterion.
2. The targeted Type I error (e.g., use of the 90% confidence interval, where α = 0.05 for each of the upper and lower bounds). As the targeted Type I error is decreased (e.g., use of a 95% confidence interval, where α = 0.025 for each bound), the estimated width of the confidence interval increases, making it more difficult to remain within the targeted bioequivalence bounds (e.g., 0.80–1.25). Accordingly, as the targeted Type I error is reduced, the number of subjects needed to demonstrate product bioequivalence increases.
3. The power of the test, which is a function of the risk we are willing to accept of failing to reject the null hypothesis when it is in fact false. By reducing the risk of a Type II error (β), we increase the power of the test (power = 1 − β). Holding the Type I error at α = 0.05 and the bioequivalence bounds at 0.80–1.25, an increase in the targeted power (e.g., aiming for 90% rather than 80%) will, for any given magnitude of variability, increase the number of subjects needed to demonstrate product bioequivalence.
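These three factors can be tied together in a textbook-style normal-approximation sample-size sketch for a two-period crossover. Everything here is a hedged illustration: the formula is an approximation, and the assumed true difference (δ), within-subject SD, and margin are invented for the example; exact trial planning uses iterative t-based calculations.

```python
# Hedged normal-approximation for the total number of subjects in a
# two-period crossover bioequivalence trial (illustrative only).
# sigma_w: within-subject SD on the scale of the comparison
# delta:   assumed true test-reference difference
# theta:   bioequivalence margin (20% here)

import math
from statistics import NormalDist

def approx_n(sigma_w, delta=0.05, theta=0.20, alpha=0.05, power=0.80):
    z_a = NormalDist().inv_cdf(1 - alpha)   # one-sided Type I error quantile
    z_b = NormalDist().inv_cdf(power)       # Type II error (power) quantile
    n = 2 * sigma_w**2 * (z_a + z_b) ** 2 / (theta - abs(delta)) ** 2
    return math.ceil(n)

# Each change below pushes the required sample size upward, mirroring
# the three factors in the list above:
print(approx_n(0.20))                 # baseline: 80% power, alpha = 0.05
print(approx_n(0.20, power=0.90))     # more power
print(approx_n(0.20, alpha=0.025))    # tighter Type I error (95% CI)
print(approx_n(0.20, theta=0.15))     # narrower bioequivalence bounds
```

The denominator (θ − |δ|)² is why narrower bounds are so costly: halving the usable margin roughly quadruples the required sample size.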
During the time that bioequivalence concepts were still evolving within the human pharmaceutical community, the issue of bioequivalence within veterinary medicine was in its infancy. The importance of these concepts to animal health scientists within academia, industry, and government grew rapidly with the advent of the Generic Animal Drug and Patent Term Restoration Act of 1988 (GADPTRA). GADPTRA legally allowed the US FDA Center for Veterinary Medicine (CVM) to approve generic animal drug applications (Public Law 100-670, Nov. 16, 1988, 102 Stat. 3971). In response to this new legislation, CVM released nine policy letters that fleshed out the issues and considerations associated with the regulation of veterinary generic drug products. In 1996, following several workshops on this topic (Martinez & Riviere, 1994), the in vivo bioequivalence test methods and considerations were solidified into FDA/CVM guidance document #35. This guidance has undergone several minor revisions, with the current version having been released in November 2006 (http://www.fda.gov/downloads/AnimalVeterinary/GuidanceComplianceEnforcement/GuidanceforIndustry/ucm052363.pdf).
Furthermore, educational efforts were underway to provide the veterinary community with an understanding of the basic statistical and pharmacological considerations associated with product bioequivalence (Toutain & Koritz, 1997). These efforts reflected the need to translate the issues and methodologies associated with the determination of product bioequivalence from the human to the veterinary patient. Currently, these basic bioequivalence concepts and methods of data analysis are accepted within veterinary medicine and are being applied both to generic product approvals and to bridging studies associated with new and investigational new drug evaluations. However, marked changes have occurred within our therapeutic landscape. These include the development of novel release technologies (e.g., Martinez et al., 2008, 2010) and a growing awareness of the relationship between the physicochemical characteristics of the active pharmaceutical ingredient (API) and formulation effects, not only as they apply to human pharmaceuticals (e.g., Yu et al., 2002; Dahan et al., 2009) but also to veterinary medications (e.g., Martinez et al., 2002; Fahmy et al., 2008).
Because of recent pharmaceutical advances both within human and veterinary medicine, we now face new and unresolved issues associated with the evaluation of product bioequivalence. While many of these issues are common to both human and veterinary medicine, there are also challenges specific to veterinary drug products. This manuscript highlights the currently unresolved challenges impacting the bioequivalence assessment of veterinary drug products and provides a summary of the associated scientific complexities.