See footnote after references for the conflict of interest statement for Dr Kanis. Dr Bone has received recent research support and/or has acted as a scientific consultant to the following companies: Amgen, GlaxoSmithKline, Merck, NPS Pharmaceuticals, Nordic Bioscience, Novartis, Pharmacia, Pfizer, Roche, and Schering Plough. All other authors have no conflict of interest.
The advent of effective agents for the treatment of osteoporosis has led to the view that placebo-controlled trials to test new agents for efficacy are no longer appropriate. Rather, studies of superiority, equivalence, or non-inferiority have been recommended. Such studies require very large sample sizes, and the burden of osteoporotic fracture in a trial setting is substantially increased. Studies of equivalence cannot be unambiguously interpreted because the variance in effect of active comparator agents is too large in osteoporosis. If fracture studies are required by regulatory agencies, there is still a requirement for placebo-controlled studies, although perhaps of shorter duration than demanded at present.
THERE IS A GROWING debate concerning the desirability of long-term placebo-controlled studies in osteoporosis. The debate arises because of the advent of proven treatments for the prevention of fracture in osteoporosis and the revised Declaration of Helsinki of 1997 articulated by the World Medical Association in 2000.(1, 2) There are two relevant articles in this declaration.
•Article 5: In medical research on human subjects, considerations related to the wellbeing of the human subject should take precedence over the interests of science and society.
•Article 29: The benefits, risks, burdens, and effectiveness of a new method should be tested against those of the best current prophylactic, diagnostic, and therapeutic methods. This does not exclude the use of placebo or no treatment in studies where no proven prophylactic, diagnostic, or therapeutic method exists.
As a consequence, some Ethics Committees in Europe decline authorization of placebo-controlled studies with new agents in osteoporosis. It has been traditional to undertake trials of new agents with calcium with or without vitamin D in both placebo and test arms, that is, an active comparator design, but such studies have also been refused because other agents are now available that have significant benefits over the use of calcium alone. The widespread concerns raised(3, 4) have led the World Medical Assembly to publish a note of clarification on the interpretation of Article 29(5):
The World Medical Assembly is concerned that paragraph 29 of the revised Declaration of Helsinki (October 2000) has led to diverse interpretations and possible confusion. It hereby affirms its position that extreme care must be taken in making use of a placebo-controlled trial and that in general this methodology should only be used in the absence of existing proven therapies. However, a placebo-controlled trial may be ethically acceptable, even if proven therapy is available, under the following circumstances: where for compelling and scientifically sound methodological reason, its use is necessary to determine the efficacy or safety of a prophylactic, diagnostic, or therapeutic method, or where a prophylactic, diagnostic, or therapeutic method is being investigated for a minor condition, and the patients who receive placebo will not be subject to any additional risk of serious or irreversible harm.
Against this background, a meeting was recently held in Europe on “the future of drug development in osteoporosis: placebo controlled or equivalence clinical trials?” The aim of the meeting was to determine, within the framework of ethical guidelines, the type of studies that could be recommended in osteoporosis, and in particular, to make recommendations concerning modifications of existing European guidelines.(6) The meeting included clinical experts, members of the Committee for Proprietary Medicinal Products (CPMP), and representatives from the pharmaceutical industry. This paper is a synopsis of the major outcomes of the meeting.
CHARACTERISTICS OF OSTEOPOROSIS
Osteoporosis is described as a systemic skeletal disease characterized by low bone mass and microarchitectural deterioration of bone tissue, with a consequent increase in bone fragility and susceptibility to fracture.(7) The clinical manifestations of osteoporosis relate to the fractures that occur. The classical fractures associated with osteoporosis are distal forearm fractures, which typically occur in the 50s, vertebral crush fractures in the 60s, and hip fractures after the age of 70 years. The osteoporotic skeleton is, however, more liable to fracture at many other sites.(8) The incidence of many osteoporotic fractures increases exponentially with age in both men and women.
The significance of osteoporotic fractures lies not only in the morbidity that arises, but also because they are a strong risk factor for other osteoporotic fractures.(9) For example, hospitalization for vertebral fracture is associated with a greater than 10-fold increase in risk of hip fracture within the first 12 months. The risk thereafter decreases but remains higher than average.(10)
The cornerstone for the assessment of osteoporosis arises from its definition, namely the measurement of bone mass. Ideally, assessment should also include a component of bone strength, but at present, techniques such as fractal analysis, high-resolution computed tomography, and nuclear magnetic resonance are not available for widespread clinical use. Assessment of osteoporosis rests heavily, therefore, on the measurement of bone mass.
Osteoporosis is defined in women as a value for bone mineral density (BMD) 2.5 SDs or more below the young adult average value (i.e., a T-score of −2.5 SD or less), and established osteoporosis as a BMD at or below the same cut-off value together with the presence of one or more fragility fractures.(11, 12) The same absolute value for BMD can be used as a diagnostic threshold in men.(13) The preferred site for diagnostic assessment is the proximal femur using DXA. Whereas measurements at the hip provide diagnostic criteria, fracture risk can be assessed at many sites.(14) For the monitoring of treatment, the lumbar spine is commonly used because its high proportion of cancellous bone is more responsive to many interventions than the predominantly cortical site of the proximal femur.
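The diagnostic categories above can be sketched as a short calculation. This is a minimal illustration only: the function names are hypothetical, and any reference mean and SD passed to `t_score` would have to come from an appropriate young adult reference population, which this paper does not specify.

```python
def t_score(bmd, young_adult_mean, young_adult_sd):
    """Number of SDs by which a BMD value differs from the young adult mean."""
    return (bmd - young_adult_mean) / young_adult_sd


def classify(t, fragility_fractures=0):
    """Apply the T-score threshold of -2.5 SD described in the text.

    Returns 'established osteoporosis' when the threshold is met and at
    least one fragility fracture is present, 'osteoporosis' when only the
    threshold is met, and a neutral label otherwise.
    """
    if t <= -2.5:
        if fragility_fractures >= 1:
            return "established osteoporosis"
        return "osteoporosis"
    return "above the osteoporosis threshold"


# Illustrative T-scores, not patient data.
print(classify(-2.7))                         # osteoporosis
print(classify(-2.7, fragility_fractures=1))  # established osteoporosis
print(classify(-1.2))                         # above the osteoporosis threshold
```

The sketch classifies on BMD and fracture history alone; it deliberately ignores age and the other risk factors that contribute to fracture risk independently of BMD.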
Whereas these diagnostic criteria were not developed for intervention thresholds but more for descriptive epidemiological studies, the T-score for BMD has been widely used as an intervention threshold(15–18) and an inclusion criterion for drug development. Thus, in trials of treatment for osteoporosis, patients are selected for study based on a reduced BMD, usually a T-score of −2 or −2.5 SD or less and/or the presence of a fragility fracture.(6, 19)
The use of BMD alone to select patients for treatment is problematic. Age is quantitatively a more important risk factor than BMD alone. In addition, other risk factors such as prior fractures, use of corticosteroids, low body mass index (BMI), biochemical markers of bone resorption, and certain diseases associated with osteoporosis contribute to fracture risk independently from BMD. Other putative risk factors such as family history of fracture and smoking require more validation.
CLINICAL TRIALS IN OSTEOPOROSIS
Current guidelines for new drug development in osteoporosis require the demonstration of an antifracture effect on vertebral and/or hip fracture. Studies should be randomized placebo-controlled trials. For vertebral fracture as an end-point, the patients included may be patients with osteoporosis (i.e., a BMD less than or equal to −2.5 SD), patients with established osteoporosis (low BMD and a history of fracture), or patients with nontraumatic fractures. The recommended duration of study is 3 years, and the principal criterion for efficacy is a demonstration of a clinically significant reduction in the number of patients with new vertebral fractures. For studies that examine the effect on hip fracture, the CPMP recommends the inclusion of high-risk patients with a BMD below the threshold for osteoporosis.(6)
Registration of an indication for prevention of osteoporosis follows the demonstration of antifracture efficacy. Where treatment has shown effects on BMD, BMD is considered to be a valid surrogate for studies in prevention. Such studies, undertaken in women immediately after menopause or patients with osteopenia should be undertaken over a 2-year period, again under placebo-controlled double-blind conditions. Where efficacy has been shown for treatment of osteoporosis, an indication can be given for the prevention of osteoporosis on this basis.(6)
Experience from trials in osteoporosis suggests that differences in vertebral fracture risk in the order of 30–50% are observed compared with placebo (Fig. 1). Antifracture efficacy has been shown with the bisphosphonates, raloxifene, and salmon calcitonin in osteoporosis and in established osteoporosis.(20–26) The relative risk reduction seems to be similar in patients with osteoporosis and in those with established osteoporosis, although the impact differs in terms of fractures saved. For example, in established osteoporosis, fracture rates at 3 years are in the order of 15% in untreated patients. A treatment that decreased fracture frequency by 50% would yield a number needed to treat to prevent one fracture (NNT) of 13. In osteoporosis without fractures and a fracture risk of 3% in 3 years, the NNT would be 67 for the same degree of efficacy. The distinction is obvious but important because many reimbursement authorities in Europe do not reimburse medications for use in osteoporosis, only in established osteoporosis.
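The NNT figures quoted above follow from the reciprocal of the absolute risk reduction. The short sketch below reproduces the arithmetic; the function name is illustrative and the inputs are the 3-year risks and the 50% efficacy given in the text.

```python
def number_needed_to_treat(baseline_risk, relative_risk_reduction):
    """NNT = 1 / absolute risk reduction.

    baseline_risk: fracture risk in untreated patients over the study period.
    relative_risk_reduction: fractional reduction in risk on treatment.
    """
    absolute_risk_reduction = baseline_risk * relative_risk_reduction
    return 1.0 / absolute_risk_reduction


# 3-year untreated fracture risks and 50% efficacy, as quoted in the text.
for label, risk in [("established osteoporosis", 0.15),
                    ("osteoporosis without fracture", 0.03)]:
    nnt = number_needed_to_treat(risk, 0.50)
    print(f"{label}: NNT about {round(nnt)}")
# established osteoporosis: NNT about 13
# osteoporosis without fracture: NNT about 67
```

Because the relative risk reduction is held constant, the NNT scales inversely with the baseline risk, which is why the same agent appears five times more efficient in established osteoporosis.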
Much less information is available for hip fracture than for vertebral fracture. The bisphosphonates risedronate and alendronate have been shown to reduce hip fractures in high-risk patients with low bone mass.(27, 28) Epidemiological studies indicate that the bisphosphonate etidronate and calcitonin might have similar activity.(28) The doses required to reduce hip fracture seem to be similar to those that prevent vertebral fractures. Within the context of double-blind controlled studies, neither raloxifene nor intranasal calcitonin has shown significant effects on appendicular fractures, despite effects on vertebral fracture risk.(22, 23) The other agent that has shown efficacy in the prevention of hip fractures is calcium in combination with vitamin D.(29) It is unclear, however, whether such results reflect treatment of osteoporosis or treatment of subclinical osteomalacia caused by poor nutritional status.
CAN PLACEBO-CONTROLLED TRIALS BE AVOIDED?
The most powerful argument in favor of placebo-controlled trials (most usually with the administration of calcium and vitamin D in both the test and control arms) is that each trial has its own internal control. This permits the unambiguous demonstration of efficacy where statistically significant differences are observed between the new treatment and placebo over and above any therapeutic effect of calcium and vitamin D. Sample sizes that range from 200 to 1600 cover the spectrum of established osteoporosis and osteoporosis.(30, 31) The alternative to placebo studies is to compare the effects of a new treatment with a treatment of proven efficacy with the objective of proving superiority, equivalence, or non-inferiority.
The primary objective of active control trials of superiority is to show that the response to the investigational drug is superior to that of the active control, or for a given product, that the dose is superior to another one. Sufficient patients must be included and followed to show with adequate power a statistically significant difference. The number of patients per group is very dependent on the anticipated size of the difference. If the difference between two treatments is expected to be relatively large (e.g., twice as effective as the reference agent), the number of patients per group will not be markedly increased. By contrast, if a new treatment is anticipated to have a 60% effectiveness and to be shown to be superior to one having 50% effectiveness (as might be seen in osteoporosis), the detection of this absolute difference would require sample sizes of 25,000–150,000 patients per group.(30, 32, 33) Such studies are unrealistic.
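The dependence of sample size on the anticipated difference can be illustrated with the standard normal-approximation formula for comparing two proportions. The sketch below is not the calculation used in the cited sources: the function name, the two-sided alpha of 0.05, the 80% power, and the untreated 3-year fracture risks are all assumptions chosen for illustration, and the results need not reproduce the range quoted above.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_superiority(p_control, p_test, alpha=0.05, power=0.80):
    """Patients per group needed to detect p_control vs p_test.

    Uses the normal approximation for two independent proportions with a
    two-sided significance level alpha and the stated power.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_test * (1 - p_test)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_test) ** 2)


# Hypothetical untreated 3-year fracture risks (assumptions, not trial data).
for untreated_risk in (0.03, 0.10, 0.15):
    reference = untreated_risk * (1 - 0.50)  # comparator with 50% efficacy
    new_agent = untreated_risk * (1 - 0.60)  # new agent with 60% efficacy
    n = n_per_group_superiority(reference, new_agent)
    print(f"untreated risk {untreated_risk:.0%}: {n} patients per group")
```

The required number rises steeply as the untreated risk falls, because the absolute difference between a 50% and a 60% effective agent shrinks with the event rate while its square sits in the denominator.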
Equivalence studies have the primary objective of showing that the response of two treatments differs by no more than an amount that is clinically irrelevant. This is usually formalized by specifying an equivalence margin that is the largest difference that can be judged to be insignificant from a medical standpoint. Failure to detect a difference does not imply equivalence, only that the probability of error is small. The non-inferiority trial has the primary objective of showing that the response to the investigational agent is not clinically inferior to that of a comparator agent; a one-sided CI of the treatment difference should lie above the prespecified lower equivalence margin.(4, 34) For reasons that are reviewed elsewhere,(30) equivalence margins of greater than 10% are inappropriate in the context of osteoporosis. Even with a 10% tolerance zone, sample sizes are large.(20) For example, if the comparator agent reduced fracture rates to 5% over 3 years, a sample size of 32,500 per group would be required for 80% probability of showing that the two drugs were equivalent.
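The order of magnitude of the equivalence example can be checked with a common normal-approximation formula for an equivalence trial in which both agents truly share the same event rate. The sketch rests on several assumptions that the text does not state: the 10% tolerance is read as 10% of the comparator fracture rate (an absolute margin of 0.5 percentage points), each one-sided test uses alpha of 0.05, and power is 80%. The function name is illustrative.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_equivalence(event_rate, margin, alpha=0.05, power=0.80):
    """Patients per group for an equivalence trial of two proportions.

    Assumes both treatments have the same true event_rate and that
    equivalence is declared when the difference lies within +/- margin
    (two one-sided tests, each at level alpha).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(1 - (1 - power) / 2)
    variance = 2 * event_rate * (1 - event_rate)
    return ceil((z_alpha + z_beta) ** 2 * variance / margin ** 2)


comparator_rate = 0.05            # 3-year fracture rate on the comparator
margin = 0.10 * comparator_rate   # assumed reading of the 10% tolerance
n = n_per_group_equivalence(comparator_rate, margin)
print(f"{n} patients per group")
```

Under these assumptions the result is roughly 32,500 per group, close to the figure quoted, although the settings actually used in the source calculation are not given here.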
The major problem with trials of equivalence or non-inferiority is that they are not conservative in nature.(35, 36) Flaws in the design or conduct of the trial will tend to bias the result toward a conclusion of equivalence.(36) The best methodological approach is to build in internal validation by incorporating a placebo arm or multiple doses of the investigational agent.(32, 33) The former does not allay the concerns of the WMA, and the latter can only be achieved with very adverse consequences for sample size. In the absence of a placebo group, external validation is the only recourse. This relies on prior experience with the reference product as shown in previous studies conducted under similar conditions. The results of phase III placebo-controlled studies in osteoporosis are too heterogeneous in terms of fracture frequency(30) for the use of external controls to be recommended.
A further consideration for equivalence or non-inferiority trials relates to the burden of fracture. For example, in an equivalence study (10% tolerance margin), agents with 50% efficacy that reduced fracture frequency to 5% over 3 years would be associated with more than 3000 fractures in a study of equivalence but only 66 in a placebo-controlled study (Table 1). This difference is so large that it raises ethical issues. From a societal point of view, equivalence or non-inferiority studies offer no ethical advantage over placebo-controlled studies. The consensus view is therefore that placebo-controlled studies will still be necessary from a scientific point of view and are more ethical in a societal perspective.
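The fracture burden in this example is simple arithmetic on arm sizes and event rates. The sketch below uses the 32,500 patients per group quoted earlier for the equivalence design and the 5% and 10% three-year fracture rates implied by the text (50% efficacy reducing fracture frequency to 5%). The placebo-controlled arm size of 440 is a hypothetical value chosen to be consistent with the 66 fractures quoted; it is not taken from Table 1.

```python
def expected_fractures(n_per_group, arm_rates):
    """Expected number of patients with a fracture, summed over all arms."""
    return sum(n_per_group * rate for rate in arm_rates)


# Equivalence design: two active arms, each with a 5% fracture rate.
equivalence_total = expected_fractures(32_500, [0.05, 0.05])

# Placebo-controlled design: 10% on placebo, 5% on active treatment.
# The arm size of 440 is an illustrative assumption.
placebo_total = expected_fractures(440, [0.10, 0.05])

print(round(equivalence_total))  # 3250
print(round(placebo_total))      # 66
```

Although every patient in the equivalence design receives an active agent, the much larger enrolment means that far more fractures occur within the trial.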
Table 1. Comparison of Sample Sizes and Fractures Expected in Trials of Equivalence and Placebo-Controlled Studies in Established Vertebral Osteoporosis
MINIMIZATION OF INDIVIDUAL RISK
If placebo-controlled studies are considered to be necessary, there are a number of ways by which, in the light of the WMA declaration, the burden of studies in osteoporosis might be alleviated.
A requisite for drug registration, both in Europe and the United States, is that there should be a bone-specific safety package that examines the effect of the test agent on bone strength. Agents that impair the quality of bone fundamentally alter the relationship between bone mass and strength, and this can be assessed by a variety of compressive and torsional tests. Such information clearly shows adverse effects of high doses of etidronate as well as fluoride and is now an essential prerequisite for drug registration. The World Health Organization (WHO) and the Food and Drug Administration (FDA) indicate that where the mechanism of action of an agent is well established, a robust preclinical package showing no adverse effect on bone quality will decrease the burden of proof required in fracture studies.(19, 37) The impact of these positions on clinical development in Europe is not yet established.
Economies in phase II
Interest in the use of biochemical indices of bone turnover lies in the fact that they are suppressed by inhibitors of bone resorption within 3–6 months, particularly the indices of bone resorption, whereas the effects on BMD are more gradual. Even with good quality control it has, however, been difficult to show significant dose response effects, in part related to the precision errors and random biological variations.(38) Thus, their use as a surrogate end-point for dose-selection in phase II would require much larger sample sizes, but this might be traded off by the ability to expose patients for shorter intervals. They have not, however, yet replaced BMD tests for definitive dose-finding.
Extrapolation of clinical settings
Inhibitors of bone resorption such as the bisphosphonates, SERMs, and calcitonin seem to decrease vertebral fracture rates by 30–50%, irrespective of the severity of the disease. Thus, the relative risk reduction is comparable in patients with osteoporosis alone or in established osteoporosis. This suggests that individuals at lower risk might be studied and efficacy inferred in established osteoporosis.
Fracture as a surrogate end-point for fractures at other sites
As mentioned, the only direct evidence for agents preventing hip fractures is with the bisphosphonates, and in the frail elderly, the combination of calcium plus vitamin D. In the cases of alendronate and risedronate, the dose used was the same as that used to show efficacy in reducing vertebral fracture risk. In the case of calcitonin, meta-analysis suggests efficacy at vertebral and nonvertebral sites. By contrast, raloxifene has not been shown to decrease nonvertebral fractures. Further information on the effect of inhibitors of bone resorption on hip fracture rates is required before the surrogacy of one fracture site for another can be securely adopted.
Integration of fracture sites
At present, the CPMP recognizes studies of vertebral osteoporosis or hip fracture, but osteoporotic fractures occur at many other sites. One possible economy in future studies would be to document all fractures, either with equal weighting or weighted according to the morbidity occasioned by the specific fracture type.(8)
Surrogacy of clinical setting
There have been no randomized controlled studies used for registration purposes that have demonstrated significant effects of intervention to reduce clinical fractures in men. There is, however, no reason to suppose that the biology of bone in men differs fundamentally from that in women with respect to the pathophysiology of osteoporosis and therefore its response to therapeutic intervention.(13) A similar situation may pertain to corticosteroid-induced osteoporosis, where the available evidence would suggest that the efficacy, at least of bisphosphonates, is comparable with that observed in postmenopausal osteoporosis. There are now several instances where approvals have been given based on changes in BMD that are comparable with those seen in postmenopausal osteoporosis for indications in which fracture efficacy has not been concurrently demonstrated.
Economies in statistical design
Unbalanced randomization can expose fewer patients to placebo than to the test drug or the reference product. A further approach is to shorten the duration of clinical trials from the 3 years currently recommended to a shorter exposure period. Indeed, for many inhibitors of bone resorption, antifracture efficacy is shown in the first year of treatment, and the dividends subsequently are significantly less (Fig. 2). Moreover, the risk of further fracture is much higher after the first event and progressively wanes with time.(10) For this reason, protocols might stipulate a 1-year analysis and thereafter stop the clinical trial if the data indicate the superiority of the test agent over placebo. This does not necessarily mean that patients given active drugs should not continue to receive treatment to evaluate its long-term effects.
Selection of low-risk individuals
Personal risk of fracture can be minimized by selecting patients at low risk of fracture. As well as substantially increasing sample sizes, it may also increase the risk to individuals. New agents have the potential to induce adverse effects in the exposed population, and targeting patients at low risk could adversely affect the risk-benefit ratio.
There are a number of clinical situations where therapeutic efficacy has not been proven. Examples include stroke, where the risk of hip fracture is high. The risk is particularly high in the first year and thereafter wanes, although it remains consistently higher than the average population risk after about 1 year.(39) The same holds true for recent hip fractures. In these contexts, as in the case of postmenopausal osteoporosis, study designs of 1 year or less may be appropriate. It was considered that adequate evidence for antifracture efficacy in such situations might be extrapolated to the more general situation of postmenopausal osteoporosis.
Can we avoid prevention studies?
Whereas clinical studies have demonstrated that the doses effective in preventing fractures in osteoporotic patients are also effective in osteopenia, there are some grounds for believing that the doses required for prevention may differ, particularly for hormone replacement treatment (HRT). The lower the risk of patients, the more important it becomes to determine the minimum effective dose. These considerations suggest that studies of prevention, perhaps in comparison with HRT, are still required.
The recommendations below are not the view of the CPMP, but rather are intended as a platform for the working party on efficacy in osteoporosis to consider.
The group considered that placebo-controlled studies are still necessary in osteoporosis. Active comparator studies demand prohibitive sample sizes and carry a high risk of being invalid. Attention to trial designs such as unbalanced randomization and withdrawal of patients after a fracture can help to minimize personal risk.
For essentially similar products, short-term studies over 1 year using BMD as an end-point may be appropriate where previous studies in other contexts have demonstrated efficacy and bone safety. Precedents for such an approach include the registration of the once weekly regimen for alendronate compared with its daily administration.
For some agents shorter exposure times are appropriate, particularly for disorders where the risk of fracture is high immediately after the event (e.g., immobilization, stroke, etc.). In such instances, interventions of 6 months to 1 year may be appropriate. In the case of postmenopausal osteoporosis, the demonstration of antifracture efficacy at 1 year is appropriate provided that patients in the active wing continue medication to assess long-term safety including fracture outcomes.
With regard to surrogacy of fracture outcome, it was concluded that it was possible to extrapolate efficacy demonstrated on vertebral fracture outcomes in established osteoporosis to osteoporosis and vice versa. The extrapolation of efficacy in vertebral fracture risk to efficacy on hip fracture risk requires further experience from clinical trials before this can be recommended.
For some treatments, there is evidence that the effect may persist after stopping treatment. Studies of the offset of activity are important, particularly in the case of short-term studies and should be considered for phase IV.