Considerations for Development of Surrogate Endpoints for Antifracture Efficacy of New Treatments in Osteoporosis: A Perspective


  • Mary L Bouxsein,

    Corresponding author
    1. Orthopedic Biomechanics Laboratory, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, Massachusetts, USA
    • Address reprint requests to: Mary L Bouxsein, PhD Orthopedic Biomechanics Laboratory RN 115 Beth Israel Deaconess Medical Center 330 Brookline Avenue Boston, MA 02215, USA
  • Pierre D Delmas

    1. INSERM Research Unit 831 and Université of Lyon, Lyon, France

  • The authors state that they have no conflicts of interest.

  • Published online on March 3, 2008;


Because of the broad availability of efficacious osteoporosis therapies, conduct of placebo-controlled trials in subjects at high risk for fracture is becoming increasingly difficult. Alternative trial designs include placebo-controlled trials in patients at low risk for fracture or active comparator studies, both of which would require enormous sample sizes and associated financial resources. Another more attractive alternative is to develop and validate surrogate endpoints for fracture. In this perspective, we review the concept of surrogate endpoints as it has been developed in other fields of medicine and discuss how it could be applied in clinical trials of osteoporosis. We outline a stepwise approach and possible study designs to qualify a biomarker as a surrogate endpoint in osteoporosis and review the existing data for several potential surrogate endpoints to assess their success in meeting the proposed criteria. Finally, we suggest a research agenda needed to advance the development of biomarkers as surrogate endpoints for fracture in osteoporosis trials. To ensure optimal development and best use of biomarkers to accelerate drug development, continuous dialog among health professionals, industry, and regulators is of paramount importance.


Approved osteoporosis therapies reduce the risk of vertebral fractures by 40–70%, and some also reduce the risk of nonvertebral fractures by 20–35% and/or hip fracture by 40–50%. Although highly efficacious, currently approved therapies do not eliminate fractures entirely. Moreover, compliance with current therapies is low, and thus optimal antifracture efficacy may not be achieved in clinical practice. Taken together, these observations indicate that there is a need for new therapies that provide better prevention of fragility fractures, particularly with regard to nonvertebral fractures.


Regulatory agencies currently require 2- or 3-yr trials with fracture as the primary endpoint to show the efficacy of new therapies for osteoporosis. Accordingly, the antifracture efficacy of drugs that are currently approved for the treatment of postmenopausal osteoporosis has been established in placebo-controlled trials performed in patients with high to moderate fracture risk, based on prevalent fractures and BMD. The primary endpoint of these trials was incident vertebral, nonvertebral, or hip fracture over 3 yr.

Given the broad availability of effective drugs to treat osteoporosis, initiation of new placebo-controlled trials for new osteoporosis therapies that enroll moderate to high-risk patients is viewed as unethical in many countries. As a result, obtaining ethical committee and patient approval for these types of studies has grown increasingly challenging. Alternatives to the current paradigm for establishing antifracture efficacy of a new therapeutic agent include (1) conducting a placebo-controlled trial in subjects with low risk for fracture, a study design that is subject to criticism regarding whether the results could be extrapolated to patients at high risk for fracture, or (2) conducting a randomized trial comparing the new therapy with a currently approved drug that has shown consistent and robust antifracture efficacy, a so-called “active comparator trial,” with either a noninferiority or superiority design, in patients with moderate to high risk for fracture. In both cases, the primary endpoint would be fracture, and the required sample sizes would be very large, on the order of 6000–30,000 subjects for 3 yr, compared with 2000–8000 patients for 3 yr for previous phase III trials. These numbers imply significant costs that may jeopardize the development of new therapeutic agents in osteoporosis.
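To give a sense of why lower event rates or smaller expected effects inflate trial size, the following minimal sketch applies the standard normal-approximation formula for a superiority comparison of two proportions. The fracture incidences, relative risks, significance level, and power are hypothetical values chosen only for illustration, not figures from the trials discussed here, and the function name `n_per_arm` is our own. Noninferiority designs, dropout, and time-to-event analysis are ignored.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p_control, relative_risk, alpha=0.05, power=0.90):
    """Approximate subjects per arm to detect a difference in fracture incidence.

    p_control:     cumulative fracture incidence in the comparison arm (hypothetical)
    relative_risk: assumed incidence ratio, new therapy vs. comparison arm (0 < RR < 1)
    """
    if not 0 < relative_risk < 1:
        raise ValueError("relative_risk must lie strictly between 0 and 1")
    p_new = p_control * relative_risk
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_new * (1 - p_new)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_new) ** 2)


# Hypothetical scenarios (all numbers are assumptions for illustration only)
high_risk_placebo = n_per_arm(p_control=0.10, relative_risk=0.5)
low_risk_placebo = n_per_arm(p_control=0.02, relative_risk=0.5)
active_comparator = n_per_arm(p_control=0.05, relative_risk=0.8)
print(high_risk_placebo, low_risk_placebo, active_comparator)
```

Under these assumed inputs, both the low-risk placebo-controlled scenario (fewer events) and the active comparator scenario (smaller incremental risk reduction) call for several times more subjects per arm than the high-risk placebo-controlled scenario.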

Another more attractive alternative would be the development and subsequent use of biomarkers that could serve as surrogate endpoints for fracture in clinical trials. Generally, clinical trials that use surrogate endpoints can be conducted faster, cheaper, and more efficiently than those with clinical endpoints, although there are drawbacks to this approach. In this perspective, we briefly review the concept of surrogate endpoints as it has been developed in other fields of medicine and discuss how this concept could be applied in clinical trials of osteoporosis.


The ethical and statistical framework for conducting studies to determine the clinical benefits and risks of a treatment is well established. However, in the past two decades, interest has grown markedly in developing methodologies for studying whether a biological parameter might serve as a substitute for a clinical event or outcome in studies testing the efficacy and safety of new therapies.

Until recently, terminology for describing the potential substitution of biological parameters as clinical endpoints was imprecise and inconsistent. In 2001, an NIH working group provided general definitions and recommended standardized terminology that are applicable across diseases and disciplines, as follows:

  • Biological marker (biomarker): a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention

  • Clinical endpoint: a characteristic or variable that reflects how a patient feels or functions or how long a patient survives

  • Surrogate endpoint: a biomarker intended to substitute for a clinical endpoint

In some fields, biomarkers are already being used to identify subgroups of patients that respond differently to a particular therapy and to enhance early diagnosis of disease. In addition, there is a strong and growing interest in assessing whether, and under what conditions, biomarkers may be used to guide dose selection in phase II trials and/or substitute for a primary endpoint in phase III trials. Indeed, biomarkers have been used as surrogate endpoints in many areas of medicine, including assessment of viral load in AIDS, glycosylated hemoglobin A1c (HbA1c) for non–insulin-dependent diabetes, and various surrogate endpoints in cardiovascular disease. In the cancer field, over one half of recent approvals by the European Medicines Agency (EMEA) have been based on the surrogate endpoint of “response rate” rather than clinical endpoints such as overall, disease-free, or progression-free survival.

Although there are no rigid guidelines for validating biomarkers as surrogate endpoints, several principles for evaluating the usefulness of biomarkers have emerged. Generally, if biomarkers are to be used as regulatory tools, they must meet technical requirements for accuracy and precision, be consistent with the pathophysiology of the disease (i.e., have biological plausibility), and be associated with the clinical outcome. Supportive studies include epidemiologic evidence that a biomarker is a strong risk factor for the disease under study, as well as confirmation that it is directly modified by the intervention. Moreover, the effects of treatment on the biomarker should explain a substantial proportion of, or be strongly associated with, the effects of treatment on the clinical endpoint. The extent to which a biomarker is appropriate for use as a surrogate endpoint in evaluating a new treatment depends on the degree to which the biomarker can reliably predict the clinical benefit of that therapy compared with the standard of care or another approved therapy.

The approach to analyzing the relationship between a biomarker and clinical endpoints generally begins with a simple model characterizing the relationship between treatment, biomarker, and clinical endpoint (Fig. 1). Such models can be used to quantify the extent to which treatment effects are mediated by the surrogate endpoint. Generally, to be a “valid” surrogate endpoint, the biomarker must be associated with, or predict a change in, the clinical outcome. The two main approaches to statistical evaluation of surrogate endpoints are analysis of single trials and analysis of multiple trials (meta-analytic approaches).

Figure 1.

Models characterizing the relationship between treatment, biomarker (or surrogate endpoint), and clinical outcome. (A) A “perfect surrogate,” where the biomarker mediates all of the effect of the treatment on the clinical outcome. (B) The more likely situation where the biomarker mediates some, but not all, of the effect of the treatment on the clinical endpoint.

Approaches using single trials

In 1989, Prentice proposed the first formal method for testing the statistical validity of a surrogate endpoint using a single trial, which relied on an “all or nothing” criterion for validation. This was followed in the 1990s by the introduction of a graded criterion for validation of surrogate endpoints using single trials, termed the “proportion of treatment effect” (PTE). The PTE reflects the proportion of the treatment effect that is mediated by the surrogate endpoint and is defined as the ratio of the amount by which a treatment effect on the clinical endpoint is changed after including a surrogate endpoint in the model to the unadjusted treatment effect on the clinical endpoint. Although this approach is appealing, it has been criticized because of conceptual and mathematical concerns.
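Written as a formula, the verbal definition of the PTE above can be sketched as follows. The symbols are our own notation and are not taken from the cited work: $\beta$ denotes the unadjusted treatment effect on the clinical endpoint (for example, a log relative risk of fracture), and $\beta_S$ denotes the treatment effect on the clinical endpoint after the surrogate endpoint is added to the model.

```latex
\mathrm{PTE} \;=\; \frac{\beta - \beta_S}{\beta} \;=\; 1 - \frac{\beta_S}{\beta}
```

Under this definition, a surrogate that captured the entire treatment effect would give $\beta_S = 0$ and $\mathrm{PTE} = 1$ (model A in Fig. 1), whereas $\beta_S = \beta$ gives $\mathrm{PTE} = 0$. Because the PTE is a ratio of two estimates, it is not constrained to lie between 0 and 1 and becomes unstable when $\beta$ is close to zero.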

Approaches using multiple trials

In the mid- to late 1990s, use of simultaneous analysis of multiple trials to assess the validity of surrogate endpoints was proposed. These approaches can accommodate analyses that use both trial-level and individual-level data to determine the association between the surrogate and true endpoint. Combining results from multiple studies through meta-analyses may provide a more robust evaluation of a potential surrogate endpoint than using results of a single study. Furthermore, meta-analyses may be particularly helpful when the available studies evaluate effects of different classes of interventions on biomarker and clinical endpoints.
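One simple form of trial-level analysis regresses the treatment effect on the clinical endpoint against the treatment effect on the biomarker, with one point per trial. The sketch below is a minimal illustration of that idea using weighted least squares. The function name, the choice of weights, and every number in the example are hypothetical and are not taken from any published meta-analysis.

```python
def trial_level_regression(surrogate_effects, clinical_effects, weights):
    """Weighted least-squares line through per-trial treatment effects.

    surrogate_effects: treatment effect on the biomarker in each trial
    clinical_effects:  treatment effect on the clinical endpoint in each trial
                       (e.g., log relative risk of fracture)
    weights:           relative weight of each trial (e.g., inverse variance)
    Returns (intercept, slope, weighted R^2).
    """
    if not (len(surrogate_effects) == len(clinical_effects) == len(weights)):
        raise ValueError("all inputs must have one entry per trial")
    total_w = sum(weights)
    x_mean = sum(w * x for w, x in zip(weights, surrogate_effects)) / total_w
    y_mean = sum(w * y for w, y in zip(weights, clinical_effects)) / total_w
    triples = list(zip(weights, surrogate_effects, clinical_effects))
    sxx = sum(w * (x - x_mean) ** 2 for w, x, _ in triples)
    sxy = sum(w * (x - x_mean) * (y - y_mean) for w, x, y in triples)
    syy = sum(w * (y - y_mean) ** 2 for w, _, y in triples)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    ss_res = sum(w * (y - (intercept + slope * x)) ** 2 for w, x, y in triples)
    r_squared = 1 - ss_res / syy
    return intercept, slope, r_squared


# Hypothetical example: five imaginary trials (values invented for illustration)
bmd_gain_pct = [1.0, 2.5, 4.0, 5.5, 7.0]        # treatment effect on the biomarker
log_rr = [-0.20, -0.35, -0.45, -0.70, -0.80]    # treatment effect on fracture
trial_weights = [1.0, 2.0, 1.5, 2.0, 1.0]
intercept, slope, r2 = trial_level_regression(bmd_gain_pct, log_rr, trial_weights)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, trial-level R^2={r2:.2f}")
```

A trial-level R² close to 1 would indicate that the effect on the biomarker predicts the effect on the clinical endpoint well across trials. With only a handful of trials, such estimates are imprecise, which is one reason individual-level data are also used.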

Other approaches to characterizing the relationship between treatment, biomarker, and clinical endpoint include models that combine multiple biomarkers or repeated measurements of a single biomarker (i.e., “joint models”), and most recently, information theory to create a unified framework for surrogate endpoint evaluations. Although many methods for validation of surrogate endpoints have been proposed and are under study, there is presently a lack of consensus regarding the optimal statistical approach to evaluate potential surrogate endpoints. It should also be noted that, although statistical approaches are necessary to evaluate surrogate markers, they are not the only consideration, because key clinical and biological observations must also be taken into account in a comprehensive validation of a potential surrogate endpoint.


For osteoporosis, the clinical endpoint is fracture, because fractures are responsible for the morbidity and excess mortality caused by the disease. Potential biomarkers include image-based assessment of skeletal fragility, as well as circulating levels and urinary excretion of biochemical markers of bone turnover (BTMs). An ideal surrogate endpoint would be a biomarker that explains most of the antifracture efficacy of a given therapeutic agent. The optimal study to validate a surrogate endpoint for fracture would be a randomized, placebo-controlled trial with fracture outcomes in which the biomarker is measured in all subjects at baseline and during treatment (i.e., concurrent with an ongoing phase III trial).

Because most phase III trials in osteoporosis enroll thousands of subjects, depending on the particular biomarker, this approach will require extensive financial resources and relatively long duration. Moreover, this approach would “validate” the surrogate endpoint in a drug trial for which there was already a fracture endpoint, thereby negating the need for the fracture surrogate except for future drug development. Because this study is unlikely to be undertaken, below we propose an alternative, step-wise approach to investigate the potential of developing biomarkers that may substitute as surrogate endpoints for fracture in clinical trials.

At the outset it is presumed that all biomarkers under consideration, both biological and imaging-based methods, would meet established standards for accuracy, precision, and reliability. In addition, the biomarker methodology should have established quality control procedures, standardized data acquisition, and analysis, as well as methods for cross-calibration of devices at different clinical centers, as appropriate.

Step 1: show biological plausibility (i.e., relationship between biomarker and pathogenetic mechanisms leading to increased skeletal fragility)

The mechanisms underlying skeletal fragility associated with osteoporosis are multifactorial. Thus, a potential biomarker may be associated with various factors associated with skeletal fragility, including abnormalities of bone turnover, decreased bone mass, or alterations in bone micro- and macroarchitecture. Studies establishing the biological plausibility of a biomarker in osteoporosis could include clinical observational studies, preclinical studies, and experiments using human cadaveric specimens. The biomarker could reflect aspects of the disease process and/or the severity of the disease state at a given point in time.

Step 2: show significant association between biomarker and fracture in the target population

Two different study designs could be used to show a significant association between the biomarker and fracture risk in the target population: (1) cross-sectional, case-control study or (2) a prospective longitudinal study. A typical cross-sectional study would compare values of the biomarker in postmenopausal women with one or more vertebral fractures to values in an age-matched control group of women with no prior history of fracture. However, the biomarker may be influenced by the fracture event, and moreover, case-control studies cannot control for all potential confounders. For example, BMD may decline in individuals with fracture because of reduced activity after the fracture and therefore the assessment of fracture cases versus controls will be biased in favor of low BMD being associated with fracture. Thus, longitudinal study designs are preferred.

In a longitudinal study, the biomarker is measured before the fracture occurs. The most common study designs include (1) a prospective cohort, in which the biomarker is measured in all individuals at baseline who are then followed prospectively for fracture, and (2) a nested case-control study within a prospective cohort, in which the biomarker (which was collected at baseline) is subsequently measured in all or a subset of individuals in the cohort who suffered a fracture and compared with subjects who did not suffer a fracture in the follow-up period. The prospective cohort design has been used in several large studies (e.g., Study of Osteoporotic Fractures, DUBBO, Rotterdam, OFELY, MrOS) to show that low BMD is associated with fracture. In comparison, the nested case-control design has been used to show the association between fracture and bone turnover or radiographically derived indices of bone fragility. Although the prospective cohort design provides the most robust assessment of associations with fracture, it requires a large sample size (although this depends on the type of fracture to be studied), several years of follow-up, and that the biomarker be measured in all individuals. In comparison, if the biomarker was collected at baseline (although perhaps not analyzed, to save costs), a nested case-cohort design is more efficient because the biomarker is measured retrospectively only in a subset of individuals, and because fractures have already occurred, no additional follow-up time is needed.
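The efficiency of analyzing stored baseline samples only in a subset can be illustrated with a minimal sketch of how such a subset might be chosen. This is a simplified example under stated assumptions: the cohort is simulated, the field names, the age-matching window, and the function `select_nested_case_control` are hypothetical, and controls are drawn from subjects who remained fracture-free through follow-up rather than by risk-set sampling at each fracture time.

```python
import random


def select_nested_case_control(cohort, controls_per_case=2, age_window=2, seed=0):
    """Pair each fracture case with age-matched, fracture-free controls.

    cohort: list of dicts with keys "id", "age", and "fractured" (hypothetical layout)
    Returns a list of (case, [controls]) tuples; no control is used twice.
    """
    rng = random.Random(seed)
    cases = [p for p in cohort if p["fractured"]]
    available = [p for p in cohort if not p["fractured"]]
    matched = []
    for case in cases:
        eligible = [p for p in available
                    if abs(p["age"] - case["age"]) <= age_window]
        chosen = rng.sample(eligible, min(controls_per_case, len(eligible)))
        for control in chosen:
            available.remove(control)  # sample without replacement across cases
        matched.append((case, chosen))
    return matched


# Simulated cohort: 200 subjects, roughly 10% with incident fracture (invented data)
sim = random.Random(1)
cohort = [{"id": i, "age": sim.randint(60, 80), "fractured": sim.random() < 0.10}
          for i in range(200)]

matched_sets = select_nested_case_control(cohort)
ids_to_assay = {case["id"] for case, _ in matched_sets}
ids_to_assay |= {c["id"] for _, controls in matched_sets for c in controls}
print(f"stored samples to assay: {len(ids_to_assay)} of {len(cohort)}")
```

Only the selected cases and controls need their stored baseline samples or images analyzed, which is the source of the cost and time savings described above.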

In conclusion, the time frame for showing a significant association between a new biomarker and fracture risk in untreated patients can vary dramatically depending on the availability of biomarker data in existing cohorts with longitudinal follow-up. If biomarker data were acquired at baseline and stored (e.g., serum or urine samples or image data), the time to establish a relationship with fracture risk through a nested case-cohort study is relatively short. In contrast, if new data acquisition in a new cohort is needed, establishing an association with fracture risk can take several years of follow-up to accumulate a sufficient number of fracture cases.

Step 3: show that the biomarker changes consistently in response to therapy, preferably in a predictable and dose-dependent fashion that agrees with the known mechanism of action of the therapeutic intervention

To show that the biomarker changes in response to therapy, one could conduct a trial with a study design similar to most phase II trials—a randomized, placebo-controlled trial with multiple doses. The trial could be conducted in subjects with low or moderate risk of fracture, as long as the mechanism of action of the drug does not differ according to the severity of the disease. The study duration could vary with the responsiveness of the biomarker. For example, bone turnover markers generally respond quickly (i.e., within days or weeks) to an intervention, whereas imaging-based techniques may require a study duration of several months to more than a year. Inclusion of multiple doses and other drugs with either similar or different mechanisms of action is desirable, because the ideal biomarker will exhibit a dose–response relationship and will change in a predictable fashion with the known mechanism of action of the intervention.

Step 4: show that changes in the biomarker with treatment explain a substantial proportion of the antifracture efficacy

A few different study designs could be used to show that treatment-induced changes in the biomarker explain a substantial proportion of the antifracture efficacy of a drug. The key elements of these types of studies are that the biomarker is measured in all individuals and that fractures are included as an endpoint. Traditionally, this would be a placebo-controlled study in high-risk patients; however, as mentioned previously, these types of studies are becoming increasingly difficult because of ethical concerns associated with the wide availability of effective therapies. Thus, the alternatives are to conduct an active comparator study in subjects with high risk of fracture or a placebo-controlled trial in subjects with low- to moderate-risk of fracture.

An advantage of the active comparator trial is that there is likely to be an adequate number of fractures and therefore good statistical power to show an association between changes in the biomarker and reductions in fracture risk. However, the lack of a placebo group limits these estimates because the proportion of treatment effect cannot be computed without a placebo group. Conversely, a placebo-controlled trial in a population with a low to moderate risk of fracture would enroll several thousand patients, and the treatment effect on the biomarker could be adequately analyzed retrospectively in patients with incident fractures compared with a matched subgroup of patients without incident fractures in both treatment groups.


In this section, we attempt to evaluate how well the criteria outlined above have been met for several established techniques, such as BMD by DXA and BTMs, as well as for new potential biomarkers, including vBMD and geometry by QCT, trabecular microarchitecture measurements by MRI and high-resolution pQCT (HR-pQCT), and bone strength estimates by finite element analysis (FEA). For each biomarker, a summary is provided for each step in Tables 1–3, and an explanation is below. This is not meant to be an exhaustive review of the literature but rather a critical assessment of selected studies showing how these biomarkers meet criteria for qualification as surrogate endpoints for fracture. Furthermore, although there are several other techniques that seem promising for fracture risk prediction and treatment monitoring, such as quantitative ultrasound and specialized image analysis of radiographs, a comprehensive review of all possible techniques is beyond the scope of this manuscript.

Table 1. Association Between Biomarker and Fracture Risk in Untreated Patients (Step 2)
Table 2. Change in Biomarker With Treatment (Step 3)
Table 3. Association Between Change in Biomarker and Fracture Reduction With Treatment (Step 4)


BMD by DXA

Biological plausibility:

Bone loss caused by aging and menopause is believed to contribute to increased skeletal fragility. As such, low bone mass and BMD are key elements in the pathophysiology of osteoporosis and contribute to increased fracture risk. Moreover, numerous studies using human cadaveric specimens have confirmed a strong association between BMD and strength of the proximal femur and vertebral bodies.

Association with fracture in untreated patients:

Several large prospective studies have shown a consistent, strong relationship between low BMD and increased fracture risk in both men and women. Although BMD at any skeletal site predicts fracture, hip BMD predicts hip fracture more strongly than BMD measurements at other sites.

Change with therapy in a predictable, dose-dependent fashion:

The increase in BMD induced by most anti-osteoporosis agents is greater in skeletal sites enriched in trabecular bone than in those with predominant cortical bone. The increase is usually greatest at the lumbar spine, followed by the trochanter, then the total hip, and is limited at the femoral neck. Except for hormone replacement therapy (HRT) and denosumab, there is either no change or a decrease in BMD measured by DXA at the forearm. With antiresorptive agents, one half of the gain of BMD is achieved within 6–12 mo, followed by a slower increase over years. A dose-dependent increase in BMD has been shown at all skeletal sites in numerous studies with oral, transdermal, and intranasal HRTs. Data are limited with raloxifene, which produces a small (2%) increase in BMD at the spine, hip, and total body. A clear dose-dependent increase in BMD at the spine and hip has been shown with the bisphosphonates alendronate, risedronate, and ibandronate, as well as with strontium ranelate. A few studies also showed dose-dependent changes in BMD after PTH treatment.

Explain a substantial proportion of antifracture efficacy:

Several studies have examined the association between treatment-related changes in BMD and reduction in fracture risk. All these studies showed that the change in BMD during treatment is significantly associated with fracture risk reduction. However, the strength of that association varies with the analytical approach (i.e., meta-analysis versus single trial), fracture type (i.e., vertebral versus nonvertebral versus hip), and therapeutic agent.

For example, although one meta-analysis suggested that much of the reduction in vertebral fracture risk associated with antiresorptive therapy could be explained by increases in BMD, other studies reported that <30% of the reduction in vertebral fracture risk after antiresorptive treatment was explained by the increase in BMD. Similar variability has been reported for the association between the change in BMD and reduction in nonvertebral fracture risk after antiresorptive therapy.

Evidence that the ability of changes in BMD to explain antifracture efficacy may depend on the treatment is provided by a recent analysis showing that, for strontium ranelate, the 3-yr change in either femoral neck or total hip BMD explains ∼75% of the observed reduction in vertebral fracture risk. In comparison, changes in BMD after teriparatide treatment explain ∼30–40% of the reduction in vertebral fracture.


Biochemical markers of bone turnover (BTMs)

Biological plausibility:

The activation of bone turnover in postmenopausal women, along with the imbalance in bone remodeling favoring bone resorption over formation, is responsible for accelerated bone loss and deterioration of trabecular architecture, both of which are associated with increased skeletal fragility. BTMs have been shown to reflect the level of bone turnover as measured on iliac crest biopsies and calcium kinetics.

BTMs increase sharply after menopause and then remain stable throughout life. In postmenopausal women, BTMs are negatively correlated with BMD measured by DXA regardless of the skeletal site, BTM used, and time elapsed after menopause. Although assessing the rate of bone loss by DXA in untreated women requires rigorous quality control and a long follow-up, several studies have shown that high BTM levels are significantly associated with subsequent bone loss. In summary, there is good evidence for the biological plausibility of BTMs as indices of the pathophysiology of osteoporosis.

Association with fracture in untreated patients:

Increased BTM levels predict fragility fractures at all sites independently of age, BMD, and prior fractures in postmenopausal women. This association has been assessed prospectively in longitudinal cohort studies and case-control studies. The association with fracture risk is stronger for bone resorption markers than for bone formation markers and is weaker in the frail elderly, in whom incident falls are the strongest predictor of fractures.

BTMs have been suggested to improve the identification of women at high risk of fracture. Indeed, osteopenic women with high BTM levels have a risk of fracture similar to that of osteoporotic women based on BMD, whereas osteopenic women with normal BTM levels have a fracture risk that is comparable to that of postmenopausal women with normal BMD.

Change with therapy in a predictable, dose-dependent fashion:

Antiresorptive drugs rapidly reduce the BTMs that reflect bone resorption, followed by a decrease in the BTMs that reflect bone formation. The onset and magnitude of the decrease depend on the route of administration (e.g., faster for intravenous than for oral bisphosphonate) and on the mechanism of action of the antiresorptive agent, which influences its potency in inhibiting bone resorption (e.g., greater for denosumab and bisphosphonates than for selective estrogen receptor modulators [SERMs]). A dose-dependent decrease of BTMs has been consistently found for HRT, SERMs, bisphosphonates, and denosumab.

The bone-forming agent teriparatide induces a marked increase in all BTMs, reflecting the overall increase in bone turnover seen on bone biopsies. The most sensitive BTM reflecting teriparatide effects on bone is serum N-terminal propeptide of type 1 collagen (PINP), which shows an early, large, and sustained increase under treatment that correlates significantly with the subsequent increase in BMD measured by DXA and by QCT. Strontium ranelate induces a small decrease of serum C-terminal extension peptide of type 1 collagen (CTX) and a small increase in bone alkaline phosphatase.

Explain a substantial proportion of antifracture efficacy:

In retrospective nested case-control studies of phase III trials with fracture as a primary endpoint, the magnitude of the 3- to 12-mo decrease of BTMs has been shown to be significantly associated with the fracture risk reduction in five analyses of trials with raloxifene, alendronate, and risedronate. The proportion of treatment effect (PTE) attributable to the decrease in BTMs is not clearly established and varies according to the antiresorptive agent. For example, the change in bone resorption markers accounted for 50% of risedronate's effects in reducing vertebral fracture risk in the first year and approximately two thirds over 3 yr. This is greater than the PTE attributable to the increase in BMD after therapy (∼28% for the 2-yr change in BMD), but the CIs of these estimates for BTMs are quite large because they were measured only in a subset of the subjects in the trial.

Interestingly, the relationship between BTM changes and fracture risk is similar for placebo and antiresorptive-treated patients, in contrast to the BMD/fracture relationship. There are no studies relating BTM changes and fracture risk reduction with strontium ranelate or with teriparatide.

Bone morphology and vBMD by QCT

Biological plausibility:

BMD, bone size, and bone morphology are important determinants of whole bone strength. Therefore, it is plausible that QCT-derived measurements of trabecular and cortical vBMD, as well as 3D characteristics of bone morphology, will reflect osteoporosis pathophysiology and disease status. In support of this, numerous studies using human cadaver specimens have shown strong relationships between QCT-derived vBMD and morphology and femoral and vertebral strength.

Association with fracture in untreated patients:

Although prospective studies are limited, numerous case-control studies have shown significant differences in QCT-derived vBMD and geometry between individuals with prior vertebral or hip fracture and age-matched controls with no prior history of fracture.

Change with therapy in a predictable, dose-dependent fashion:

Treatment-induced changes in vBMD and morphology assessed by QCT vary with therapy and skeletal site. Teriparatide treatment leads to marked gains in vertebral trabecular BMD that are greater than those observed with alendronate. For example, after 18 mo, vertebral trabecular BMD increased 19% versus 3.8% in postmenopausal women treated with teriparatide or alendronate, respectively. Treatment-induced changes in vBMD and morphology at the hip are generally smaller in magnitude and results are less consistent than at the spine. For example, teriparatide and alendronate had similar effects on increasing femoral neck trabecular vBMD (∼2–5%), whereas cortical vBMD increased more with alendronate than teriparatide. Other studies have shown that, although cortical vBMD at the hip increases more with alendronate than PTH, cortical volume increased to a greater extent with PTH, suggesting the presence of more, but less mineralized bone after PTH compared with alendronate. Raloxifene has been shown to induce small increases in vertebral BMD in comparison with placebo. There are no studies showing dose-related changes of bone density/geometry induced by any anti-osteoporotic drug.

Explain a substantial proportion of antifracture efficacy:

To date, no clinical trials with fracture as an endpoint have included assessment of geometry or vBMD by QCT, and therefore, no studies have reported the proportion of treatment effect explained by a QCT-based measurement.
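As an illustration of what such an analysis would report, the following is a minimal sketch of one commonly used definition of the proportion of treatment effect explained (PTE): one minus the ratio of the treatment coefficient after adjustment for the biomarker to the unadjusted treatment coefficient. The function name and the relative risks used below are hypothetical and are not taken from any trial; other statistical definitions exist and may give different answers.

```python
import math


def proportion_treatment_effect_explained(beta_unadjusted, beta_adjusted):
    """PTE = 1 - beta_adjusted / beta_unadjusted.

    beta_unadjusted: treatment coefficient (e.g., log relative risk of
        fracture) from a model without the biomarker.
    beta_adjusted: treatment coefficient from a model that also includes
        the biomarker (e.g., change in a QCT-based measurement).
    """
    if beta_unadjusted == 0:
        raise ValueError("PTE is undefined when there is no treatment effect")
    return 1.0 - beta_adjusted / beta_unadjusted


# Hypothetical numbers, for illustration only: a relative risk of 0.5
# without adjustment and 0.8 after adjusting for the biomarker.
beta_unadj = math.log(0.5)
beta_adj = math.log(0.8)
pte = proportion_treatment_effect_explained(beta_unadj, beta_adj)
print(f"Proportion of treatment effect explained: {pte:.2f}")
```

Under these assumed values, roughly two thirds of the treatment effect would be attributed to the biomarker. In practice the estimate carries wide confidence intervals unless the treatment effect itself is large and precisely estimated.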

Trabecular microarchitecture by MRI or HR-pQCT

Biological plausibility:

Bone loss along with deterioration of trabecular and cortical bone microarchitecture are hallmarks of osteoporosis. Trabecular bone architecture deteriorates and cortical thickness declines with increased age, because of the imbalance in bone remodeling that favors bone resorption over bone formation. Clinical studies, comparing subjects with and without fracture, have suggested that microarchitectural deterioration, as assessed by iliac biopsy, contributes to fracture risk independent of bone mass.

Association with fracture in untreated patients:

Several case-control studies confirmed that trabecular bone microarchitecture, measured at peripheral skeletal sites either by MRI or HR-pQCT, differs between fracture cases and controls. There are no prospective studies showing an association between microarchitecture assessed by MRI or HR-pQCT and fracture.

Change with therapy in a predictable, dose-dependent fashion:

One study, the QUEST trial of nasal salmon calcitonin, has shown treatment-induced changes in bone microarchitecture in postmenopausal women, as measured in vivo by high-resolution MRI. Consistent with the proposed mechanism of action of this antiresorptive, there were minimal changes in the calcitonin group, but decreases in the placebo group, resulting in significant treatment-induced changes in microarchitecture when the two groups were compared. In another study, significant improvements in trabecular microarchitecture, assessed at the distal tibia by high-resolution MRI, were seen after treatment of hypogonadal men with testosterone for 2 yr. There are no studies showing treatment-related changes in trabecular architecture as measured by HR-pQCT. No studies have shown dose–response relationships for changes in trabecular architecture with treatment.

Explain a substantial proportion of antifracture efficacy:

To date, no clinical trials with fracture as an endpoint have included assessment of trabecular microarchitecture by in vivo methods, and therefore, no studies have reported the proportion of treatment effect explained by a change in microarchitecture.

Bone strength estimates by finite element analysis

Biological plausibility:

The finite element (FE) method was first applied to structural analysis in the 1950s, and it has since been widely used in nearly every engineering and engineering-related field because it can estimate how an object with a complex geometrical shape (e.g., a whole bone) behaves when it is subjected to external loads. Current clinical implementation of FEA is generally based on 3D-QCT scans, where each voxel of the CT scan is converted to an element in the finite element model. This approach should theoretically be able to represent a bone's 3D geometry and the heterogeneous distribution of BMD and material properties, subject, of course, to the limitations associated with the resolution of the image data, the assumptions necessary to convert QCT density data to material properties, and the choice of external loads applied to the model. Laboratory studies using human cadaveric specimens have shown that predictions of whole bone strength using this approach are strongly correlated with vertebral and femoral strength. In summary, QCT-based FEA, because of its ability to reflect bone geometry and bone mass distribution in a biomechanically relevant fashion and its strong association with bone strength in vitro, is considered to have high association with skeletal fragility and disease severity and therefore meets criteria for biological plausibility.
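The voxel-to-element conversion described above can be sketched as follows. This is only an illustrative outline: the power-law form relating density to elastic modulus is one commonly assumed relationship, and the coefficients, the bone threshold, and the function names are hypothetical placeholders rather than values from any published model.

```python
def density_to_modulus(rho, a=1000.0, b=2.0):
    """Assumed power law E = a * rho**b.

    The coefficients a and b are placeholders; actual values depend on
    skeletal site, scanner calibration, and the chosen empirical relation.
    """
    return a * rho ** b


def voxels_to_elements(density_grid, bone_threshold=0.1):
    """Convert each voxel at or above the threshold into one element.

    density_grid: nested list indexed as [z][y][x] of calibrated densities.
    Returns a list of dicts holding the voxel index and assigned modulus.
    """
    elements = []
    for z, plane in enumerate(density_grid):
        for y, row in enumerate(plane):
            for x, rho in enumerate(row):
                if rho >= bone_threshold:  # skip background and marrow-only voxels
                    elements.append(
                        {"index": (x, y, z), "modulus": density_to_modulus(rho)}
                    )
    return elements


# Tiny hypothetical 1 x 2 x 2 scan, for illustration only.
grid = [[[0.0, 0.5], [1.0, 0.05]]]
elements = voxels_to_elements(grid)
for element in elements:
    print(element)
```

The sketch makes the dependence on modeling assumptions explicit: changing the assumed density-modulus relation or the threshold changes every element's stiffness, and therefore the predicted whole bone strength.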

Association with fracture in untreated patients:

Two case-control studies have shown the ability of patient-specific QCT-based FE models of the lumbar spine to discriminate postmenopausal women with vertebral fractures from age-matched controls with no fracture. In contrast, a case-control study comparing stiffness of the proximal femur derived from QCT-based FEA in postmenopausal women with recent hip fracture to controls with no fracture showed no differences according to fracture status. There are no prospective studies testing the ability of QCT-based FEA to predict fracture risk.

Change with therapy in a predictable, dose-dependent fashion:

There are no studies showing a dose-dependent relationship between treatment and FE-predicted bone strength outcomes. However, two studies have shown changes in FE-predicted bone strength after therapeutic intervention. In the first, QCT-based FE models of the lumbar spine were performed at baseline and after 6 and 18 mo of treatment with teriparatide or alendronate in postmenopausal women. Both teriparatide and alendronate were associated with significant increases in vertebral bone strength parameters at 6 mo, although changes in the teriparatide group were greater than those in the alendronate group. QCT-based FEA has also been used to evaluate changes in strength parameters of the proximal femur in a sideways fall configuration after bisphosphonate or teriparatide treatment.

Explain a substantial proportion of antifracture efficacy:

To date, no clinical trials with fracture as an endpoint have included bone strength estimates by FEA as an outcome, and therefore, no studies have reported the proportion of treatment effect explained by a change in FE-based predictions of bone strength.


In this perspective, we argue that there is an urgent need to develop surrogate endpoints for fracture and introduce concepts related to the use of biomarkers as surrogate endpoints in osteoporosis. Whereas many concepts are universal across disease categories, there are several issues specific to the application of surrogate endpoints in osteoporosis that require further consideration.

An important question is whether any biomarker or set of biomarkers will be valid across different classes of drugs, with either the same or differing mechanisms of action. Specifically, after validation of a biomarker against an accepted clinical endpoint (i.e., fracture), could the biomarker be used as an endpoint for registration of a new therapy with the same or a differing mechanism of action? A key consideration is whether the interaction between the potential surrogate endpoint and the mechanism of action of the drug is well understood. This is particularly important for biological markers, such as bone turnover, which decrease, increase, and are relatively unchanged after antiresorptive, teriparatide, and strontium ranelate interventions, respectively. It is possible that an imaging biomarker, such as prediction of bone strength by FEA, which integrates the material and structural effects of the treatment, may be less sensitive to the biological mechanism of action of the intervention.

It is difficult to evaluate existing data to get a sense of whether currently available biomarkers show potential to be validated across different classes of treatments. Most of the existing studies have examined the ability of BMD and/or bone turnover markers to explain fracture reduction after antiresorptive therapies, mainly oral bisphosphonates. Other antiresorptive agents, such as intravenous bisphosphonates, SERMs, and denosumab, may have similar effects, although this needs to be more thoroughly tested. Moreover, there are limited data evaluating the ability of BMD and bone turnover markers to explain the antifracture efficacy of drugs with anabolic or other mechanisms of action. Thus, it is quite plausible that, if a biomarker were validated for a given bisphosphonate, it could be accepted as a surrogate endpoint for trials of a new bisphosphonate. However, it is less likely that the same validated surrogate endpoint would be easily accepted for a trial of a new anabolic or dual-action agent, although this is not impossible depending on the nature of the biomarker.

A second important question is whether a biomarker could be valid across different fracture types or whether different biomarkers will be needed for vertebral, nonvertebral, and hip fractures. Because they reflect bone remodeling activity in the entire skeleton, BTMs are more likely to be able to reflect a variety of clinical endpoints than an imaging endpoint, which may have greater skeletal site specificity. Ideally, a combination of biomarkers might provide a surrogate endpoint that would be valid for all fractures.

Furthermore, because of the multifactorial nature of skeletal fragility, it may be that a set of biomarkers, rather than any individual marker, will be more strongly associated with the clinical outcome and therefore the most likely to be qualified as a surrogate endpoint for fracture. As mentioned, combining an imaging biomarker with a biological marker may allow assessment of both disease severity (the imaging biomarker) and disease activity (e.g., a bone turnover marker). Similarly, such a combination, for example, of BMD change and BTM change with treatment, might provide a better prediction of the antifracture efficacy of a therapeutic agent than either biomarker alone. This possibility could be easily assessed retrospectively from existing trial data, with the caveat that bone turnover markers were generally measured only in subsets of trial subjects. Complications with this approach for new studies are the obvious expense of designing trials with multiple imaging and biological markers and the lack of fully validated statistical methods by which to evaluate the ability of a set of biomarkers to reflect the clinical endpoint.

Although there is no consensus regarding the criteria that should be achieved for statistical validation of biomarkers as surrogate endpoints, a number of approaches, using data from both single trials and multiple trials, have been developed and tested in other fields. An emerging trend for evaluation of biomarkers as surrogate endpoints is to use meta-analysis of different trials. In this case, it is imperative to include analyses of individual patient data, because analyses using only mean values from different trials can be misleading. However, challenges to this approach include the need to obtain patient data that often belong to competitors in the pharmaceutical industry, as well as the consideration as to whether biomarkers have been acquired and analyzed using a standardized approach in the different trials.
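The following toy example illustrates why analyses based only on trial means can mislead. All data are constructed by hand for illustration and do not come from any trial: within each of three hypothetical trials, a larger biomarker change accompanies a lower value of a hypothetical continuous risk score, yet the trial-level means show the opposite relationship.

```python
def least_squares_slope(xs, ys):
    """Ordinary least squares slope of ys regressed on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# Hypothetical individual patient data: (biomarker change, risk score).
trials = {
    "trial_A": ([0.5, 1.0, 1.5], [1.5, 1.0, 0.5]),
    "trial_B": ([1.5, 2.0, 2.5], [2.5, 2.0, 1.5]),
    "trial_C": ([2.5, 3.0, 3.5], [3.5, 3.0, 2.5]),
}

# Association within each trial, using individual patient data.
within_slopes = {
    name: least_squares_slope(xs, ys) for name, (xs, ys) in trials.items()
}

# Association across trials, using only the trial means.
mean_xs = [sum(xs) / len(xs) for xs, _ in trials.values()]
mean_ys = [sum(ys) / len(ys) for _, ys in trials.values()]
between_slope = least_squares_slope(mean_xs, mean_ys)

print("Within-trial slopes:", within_slopes)
print("Slope across trial means:", between_slope)
```

Here every within-trial slope is negative while the slope across trial means is positive, so a meta-analysis restricted to mean values would suggest the wrong direction of association for individual patients.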

Clearly a formidable research agenda lies ahead to validate a biomarker or set of biomarkers as a surrogate endpoint for fracture in osteoporosis, including technical development and testing of biomarkers in trials, as well as advancement of the statistical methodology for their evaluation. However, as we have outlined, there are already a number of biomarkers available that are promising, although currently there are limited data to assess their true potential to serve as surrogate endpoints for fracture. To address this limitation, several ongoing phase III placebo-controlled trials of new therapeutic agents have included some of these new imaging techniques and bone turnover markers as major endpoints. These studies will provide important information about the ability of these techniques to serve as surrogate markers. As has been done in other fields, simulations of trial datasets could be undertaken to probe the utility of different statistical approaches for validation of surrogate endpoints in osteoporosis trials. Additional clinical studies are needed, and sponsors of clinical trials with fractures as endpoints are encouraged to include promising surrogate endpoints as outcomes. Moreover, if meta-analyses are to be undertaken, it is imperative that guidelines for standardized image acquisition and analyses be developed so that data can be combined across trials. Finally, it is time to open discussions with regulatory authorities to present the arguments and strategy for development of surrogate endpoints for fractures in osteoporosis trials. The number of individuals suffering from osteoporosis is growing worldwide, and despite current availability of drugs that reduce fracture risk, there is still a need to develop novel therapeutics that are even more efficacious, are more convenient, and have improved safety and tolerability profiles.


In this perspective, we suggest there is an urgent need to develop surrogate endpoints for fracture and have outlined an approach to validate the use of biomarkers as surrogate endpoints in osteoporosis. To ensure optimal development and best use of biomarkers to accelerate drug development, continuous dialog among the health professionals, industry, and regulators is of paramount importance. Furthermore, completion of intermediate steps to validate a biomarker as a surrogate endpoint for fracture may also have the benefit of showing the biomarker's use in other clinical areas, such as improved identification of patients at highest risk for fracture and enhanced monitoring of treatment response.


Although the ideas presented in this perspective represent personal opinions of the authors, the authors are grateful for the informative discussions held during the NIH-sponsored workshop between industry and academic investigators in December 2005, which provided the inspiration for this perspective. In particular, we acknowledge Dr Gayle Lester of the National Institute of Arthritis and Musculoskeletal Diseases, who organized this meeting. The authors acknowledge funding from NIH AR053986 and INSERM U831.