What Effect Is Really Being Measured? An Alternative Explanation of Paradoxical Phenomena in Studies of Osteoarthritis Progression


  • Yuqing Zhang,

    Corresponding author
    1. Boston University School of Medicine, Boston, Massachusetts
    • Boston University School of Medicine, 650 Albany Street, Clinical Epidemiology Unit, Suite X-200, Boston, MA 02118. E-mail: yuqing@bu.edu

  • Tuhina Neogi,

    1. Boston University School of Medicine, Boston, Massachusetts
  • David Hunter,

    1. Kolling Institute, University of Sydney and Royal North Shore Hospital, Sydney, New South Wales, Australia
  • Frank Roemer,

    1. Boston University School of Medicine, Boston, Massachusetts
    2. University of Erlangen-Nuremberg, Erlangen, Germany
    • Dr. Roemer has received consultant fees, speaking fees, and/or honoraria (less than $10,000 each) from the NIH and Merck Serono, and owns stock and/or holds stock options in Boston Imaging Core Lab.

  • Jingbo Niu

    1. Boston University School of Medicine, Boston, Massachusetts


Many observational studies have found that risk factors for incidence of a disease may differ from risk factors for its sequelae (i.e., recurrence or progression of the disease). For example, several risk factors (such as female sex or obesity) for incident knee osteoarthritis (OA) are not associated with OA progression, and a few risk factors that increase risk of incident OA are inversely associated with OA progression (such as high bone mineral density and low levels of vitamin C) ([1-4]). Such paradoxical phenomena have also been identified in other diseases. One example is the “obesity paradox,” whereby overweight or obese patients with preexisting cardiovascular disease have improved survival and a lower risk of recurrent major cardiovascular events than normal-weight patients ([5]). Another example is the “lipid paradox,” in which high levels of total cholesterol or low-density lipoprotein are associated with increased risk of cardiovascular disease among the general population ([6]), but are not associated, or are even inversely associated, with risk of cardiovascular disease among patients with rheumatic diseases ([7]).

While various explanations have been proposed for such paradoxical phenomena, including that risk factors for incident OA may be biologically different from those for OA progression or that findings from studies of risk factors for disease progression may be susceptible to selection bias (e.g., collider stratification bias) ([8]), we proposed that such paradoxical phenomena can be due to lack of clarity in the research question. Specifically, we posited that lack of clarity regarding the types of effects that can be determined from a study, i.e., total, direct, and indirect effects, may constitute a major reason why research findings for risk factors of OA progression appear paradoxical.

In this work, we demonstrated the importance of carefully considering the actual intent of the research question in OA studies and ensuring that the study design allows the particular question to be answered. In the first section, we used a hypothetical randomized clinical trial (RCT) as a prototype to answer a well-defined research question and emphasize the critical characteristics of an RCT that enable one to make valid inferences. In the second section, we illustrated how an observational study can lead to incorrect inferences in the context of an ill-defined research question. We demonstrated that under such circumstances, the findings and inference may not be relevant to the investigators' intended research question and would fail to provide insight into the prevention and treatment of disease, even if appropriate analytic methods were used. Finally, we offered a few suggestions that may help avoid such potential bias.

A hypothetical RCT as the prototype to assess efficacy of bisphosphonate therapy on risk of progression of radiographic osteoarthritis (ROA) of the knee

We begin with the premise that an RCT is the gold standard to assess the efficacy of a specific intervention on an outcome. We use a hypothetical RCT to demonstrate that several distinguishing characteristics of RCTs allow investigators to make valid and relevant causal inferences based on a well-defined research question.

In this example, we propose to evaluate the efficacy of bisphosphonate use on progression of ROA of the knee among subjects with preexisting knee ROA. Let's assume for illustrative purposes that bisphosphonate use reduces risk of progression by decreasing the occurrence or size of bone marrow lesions (BMLs). Eligible subjects consist of those who have preexisting mild or moderate ROA (i.e., Kellgren/Lawrence [K/L] grade 2 or 3 in at least 1 knee) and who are not current bisphosphonate users. Enrolled subjects are randomly allocated into either the intervention group (receiving a particular bisphosphonate) or the comparison group (placebo). Knee radiographs and magnetic resonance images are obtained for all subjects at baseline, in the middle of the trial, and at the end of the trial. The presence and size of BMLs at each time point are assessed using a validated scoring system and the radiographic severity of knee OA is assessed using K/L criteria. An increase in K/L grade is considered to be progression of knee ROA.

The trial can be depicted using a causal diagram (Figure 1). A causal diagram consists of a set of variables: exposure, confounder(s) (which affect both the exposure and outcome), mediator(s) (which represent the mechanism by which the exposure may have an effect on the outcome), and outcome. The relationships among the variables are depicted by arrows that indicate the direction of effect. In this example, “X” represents bisphosphonate use (intervention or exposure), “M1” represents BMLs (mediator), “CF” represents baseline ROA, and “Y” represents knee ROA progression (outcome). The arrow from X to M1 (X → M1) indicates that bisphosphonate use has an effect on BMLs; M1 → Y indicates that BMLs affect ROA progression. Lack of an arrow between X and CF indicates that baseline ROA is evenly distributed between intervention groups owing to randomization. To simplify the figure, we have excluded other potential confounders and assume that they are appropriately controlled through either randomization or statistical methods.

Figure 1.

Causal diagram of a randomized trial depicting assessment of the effect of bisphosphonate (X) on radiographic osteoarthritis (ROA) progression (Y) and their relationship with bone marrow lesions (BMLs; M1), other unspecified mechanisms, and baseline ROA (CF).

Generally, RCTs assess the total effect of an intervention on an outcome, regardless of the mechanism by which that effect occurs. The total effect refers to all possible causal pathways by which the intervention can have an effect on the outcome, and can be further decomposed into 2 parts based on the mediating factor of interest: the direct and indirect effects. As shown in Figure 1, the “total effect” of bisphosphonate use on knee ROA progression is the net effect through all causal pathways, i.e., either through BMLs (X → M1 → Y) or other pathways (X → Y); the direct effect of bisphosphonate use refers to the causal pathway by which bisphosphonate use has an effect on ROA progression not mediated through BMLs (X → Y), and the indirect effect represents the effect of bisphosphonate use that operates through BMLs (X → M1 → Y). Both direct and indirect effects can be estimated using appropriate statistical methods ([9, 10]).
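
One common way to write this decomposition down uses counterfactual ("natural effect") notation on the odds ratio scale. The notation below is an assumption introduced for illustration and does not appear in the original text: Y(x, m) denotes the ROA progression outcome that would be observed if bisphosphonate use were set to x and BML status to m, and M1(x) denotes the BML status that would be observed under bisphosphonate use x. Other definitions of direct and indirect effects exist, so this is a sketch of one convention rather than the only possible formalization.

```latex
\begin{align}
\mathrm{OR}_{\text{total}}    &= \frac{\operatorname{odds}\{Y(1, M_1(1)) = 1\}}{\operatorname{odds}\{Y(0, M_1(0)) = 1\}}, \\
\mathrm{OR}_{\text{direct}}   &= \frac{\operatorname{odds}\{Y(1, M_1(0)) = 1\}}{\operatorname{odds}\{Y(0, M_1(0)) = 1\}}, \\
\mathrm{OR}_{\text{indirect}} &= \frac{\operatorname{odds}\{Y(1, M_1(1)) = 1\}}{\operatorname{odds}\{Y(1, M_1(0)) = 1\}}, \\
\mathrm{OR}_{\text{total}}    &= \mathrm{OR}_{\text{direct}} \times \mathrm{OR}_{\text{indirect}}.
\end{align}
```

Under these definitions the last line holds by construction, because the term odds{Y(1, M1(0)) = 1} cancels in the product. If the effect of X operates entirely through M1, the direct odds ratio equals 1 and the total effect equals the indirect effect.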

Each of the 3 effect measures (total, direct, and indirect) has a different interpretation and is not directly comparable with the others ([11]). For example, if the total effect of bisphosphonate use on knee ROA progression is entirely mediated through BMLs, the direct effect of bisphosphonate use on knee ROA progression will be null. In such a case, 2 different conclusions can be drawn depending upon the research question asked and the analysis conducted. In one instance, the total effect of bisphosphonate use may be useful if one is interested in whether bisphosphonate use has an effect on ROA progression regardless of the underlying mechanisms. On the other hand, the direct effect of bisphosphonate use is null if one aims to determine whether there are any effects of bisphosphonate use other than that which is mediated through BMLs. Nevertheless, no one would interpret the discrepancy between the total effect and the direct effect as a paradoxical phenomenon, since the intents of the 2 questions and analyses are clearly different.

We conducted a simulation study to demonstrate these differences in direct, indirect, and total effects, using the approach proposed by Lange and colleagues ([9]). Using the causal diagram depicted in Figure 1, we assume that 50% of subjects take bisphosphonates, 40% of subjects' knees have a K/L grade of 3 (the remaining 60% of knees have K/L grade 2), and there is no association between bisphosphonate use and K/L score at baseline owing to randomization. In addition, we assume that the effect (measured by the odds ratio [OR]) of bisphosphonate use on BMLs is 0.29, the effect of BMLs on ROA progression is 4, and the effects of K/L score on BMLs and on ROA progression are both 2. To simplify the example, we also assume that there is no direct effect of bisphosphonate use on ROA progression (OR 1.0). Therefore, one would expect the total effect of bisphosphonate use on ROA progression to be equal to its indirect effect. We simulated 5,000 independent observations and computed estimates of direct, indirect, and total effects, respectively, based on 5,000 replications. The total effect of bisphosphonate use on ROA progression from the simulation study was OR 0.69 (95% confidence interval [95% CI] 0.61–0.79), the indirect effect was OR 0.69 (95% CI 0.66–0.73), and the direct effect was OR 1.00 (95% CI 0.88–1.14), indicating the noncomparability of the total and direct effects.
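
The setup of this simulation can be sketched in a few lines of Python. This is an illustrative sketch only, not the weighting approach of Lange and colleagues: it computes the total, direct, and indirect odds ratios exactly from an assumed logistic data-generating model and then checks the total effect against a crude odds ratio from one simulated trial. The odds ratios and prevalences come from the text; the two intercepts (`A0`, `B0`), the sample size, and the seed are assumptions, since the text does not report them, so the numbers produced need not match the published estimates.

```python
import math
import random

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    return p / (1.0 - p)

# Values stated in the text
OR_X_M = 0.29   # bisphosphonate use -> BMLs
OR_M_Y = 4.0    # BMLs -> ROA progression
OR_CF = 2.0     # K/L grade 3 vs. 2 -> BMLs and -> ROA progression
OR_X_Y = 1.0    # no direct effect of bisphosphonate use
P_X = 0.5       # proportion randomized to bisphosphonate
P_KL3 = 0.4     # proportion of knees with K/L grade 3

# ASSUMPTION: intercepts are not given in the text; chosen for illustration
A0 = 0.0        # log odds of BMLs for placebo, K/L grade 2
B0 = -2.0       # log odds of progression for no BMLs, K/L grade 2

def p_bml(x, cf):
    return expit(A0 + math.log(OR_X_M) * x + math.log(OR_CF) * cf)

def p_prog(x, m, cf):
    return expit(B0 + math.log(OR_X_Y) * x
                 + math.log(OR_M_Y) * m + math.log(OR_CF) * cf)

def p_counterfactual(x, x_star):
    """P(progression) with exposure set to x and BMLs distributed as under x_star."""
    total = 0.0
    for cf, p_cf in ((0, 1.0 - P_KL3), (1, P_KL3)):
        pm = p_bml(x_star, cf)
        total += p_cf * (pm * p_prog(x, 1, cf) + (1.0 - pm) * p_prog(x, 0, cf))
    return total

def natural_effects():
    o11 = odds(p_counterfactual(1, 1))
    o10 = odds(p_counterfactual(1, 0))
    o00 = odds(p_counterfactual(0, 0))
    return {"total": o11 / o00, "direct": o10 / o00, "indirect": o11 / o10}

def simulate_crude_or(n, seed=0):
    """Crude OR of progression, bisphosphonate vs. placebo, in one simulated trial."""
    rng = random.Random(seed)
    counts = [[0, 0], [0, 0]]  # counts[x][y]
    for _ in range(n):
        x = int(rng.random() < P_X)
        cf = int(rng.random() < P_KL3)
        m = int(rng.random() < p_bml(x, cf))
        y = int(rng.random() < p_prog(x, m, cf))
        counts[x][y] += 1
    return (counts[1][1] * counts[0][0]) / (counts[1][0] * counts[0][1])

effects = natural_effects()
crude = simulate_crude_or(200_000)  # ASSUMPTION: large n to limit sampling noise
print({k: round(v, 3) for k, v in effects.items()}, "crude OR:", round(crude, 3))
```

Because the direct odds ratio is fixed at 1.0 in the data-generating model, the exact total and indirect effects coincide, and the crude odds ratio from the randomized comparison estimates the total effect rather than the direct effect.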

Using observational studies to assess total effect of bisphosphonate therapy on risk of progression of knee ROA

In the following example, we demonstrate that restricting a study sample to subjects in a specific stratum (i.e., a particular value) of a mediator or adjusting for a mediator in a statistical model results in determination of the direct effect of an exposure rather than its total effect. Suppose we conduct a hypothetical observational study to assess the same question described above: does bisphosphonate use reduce the risk of knee ROA progression among subjects with mild or moderate knee ROA? Eligible subjects consist of those who have K/L grades 2 or 3 in at least 1 knee. All subjects are asked about their use of a bisphosphonate at the baseline visit “X.” Subjects are followed for 5 years. Knee radiographs are taken and assessed using K/L criteria at baseline “M0” and at the end of followup “Y.” This study is depicted in Figure 2. In the causal diagram, we added a box around M0 (i.e., mild/moderate knee ROA at baseline) indicating that the study is restricted to (or conditioned upon) knees with preexisting ROA (K/L grade 2 or 3), as is typical for studies of ROA progression. In addition, we assume that all stages of ROA progression are observable during the followup. This assumption, however, might not always be true. Depending on the timing of followup radiographs and the severity of ROA at baseline, not all stages of ROA progression may be observable. For example, when followup time is relatively long, a knee with K/L grade 2 ROA at baseline may have developed K/L grade 4 ROA at the followup visit. That does not mean that this knee did not have K/L grade 3 ROA at some point during the followup period.

Figure 2.

Causal diagram depicting assessment of the effect of bisphosphonate (X) on radiographic osteoarthritis (ROA) progression (Y) and their relationship with mediators of baseline ROA (M0) and bone marrow lesions (BMLs; M1) as well as other unspecified mechanisms. Here the study design involves conditioning upon M0 by restricting the study sample to those with a specific value of M0 (box); the same issues arise when variable M0 is simply adjusted for in the analyses.

While the intent of the research question is to assess the total effect of bisphosphonate use on ROA progression, this study design (which is commonly used in observational studies of ROA progression) might actually assess the direct effect of bisphosphonate use depending upon how and when bisphosphonate use was defined. As shown in Figure 2, bisphosphonate use assessed at baseline may have occurred prior to the initial occurrence of ROA. Therefore, bisphosphonate use at baseline could have also been a risk factor for mild/moderate knee ROA at baseline. Since mild/moderate ROA is an intermediate stage prior to progression to later stages of ROA, mild/moderate ROA status at this study's baseline is a mediator in the causal pathway between bisphosphonate use and progression of ROA. Therefore, the research question we are answering based on this causal diagram is whether bisphosphonate use is associated with risk of ROA progression that is not mediated through already having mild/moderate ROA (i.e., the direct effect), rather than its total effect among persons with mild or moderate ROA as in the question addressed by the RCT above. One would not expect such a direct effect to exist since our current understanding of ROA would suggest that a knee must pass through a mild/moderate stage of ROA before it can progress to a later stage.

In general, the main goal of both the RCT and the observational study is to evaluate the total effect of an exposure. In the RCT, the temporality of the intervention, mediators, and outcome is clearly defined; therefore, one would not equate the total effect with the direct effect in an RCT. However, in observational studies, the time sequence of exposure, mediators, and outcome may not be very clear. Consequently, what may be considered a paradoxical finding of bisphosphonate use being protective against incident ROA (total effect) but not against progressive ROA (direct effect) may not be a paradox at all, but rather should be expected because each set of results answers a completely different research question.
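
The consequence of restricting on M0 can be illustrated with a small simulated cohort. This is a sketch under stated assumptions, not an analysis from the original study: every number below is hypothetical, and the model deliberately encodes the premise that a knee must first reach mild/moderate ROA (M0) before it can progress (Y), with bisphosphonate use (X) affecting only the risk of reaching M0.

```python
import random

# ASSUMPTION: all values are hypothetical and chosen only for illustration
P_X = 0.5                    # proportion using bisphosphonates
P_M0 = {0: 0.30, 1: 0.15}    # P(mild/moderate ROA at baseline | X)
P_Y_GIVEN_M0 = 0.40          # P(progression | M0 = 1), the same for X = 0 and X = 1

def odds_ratio(pairs):
    """Crude OR of Y for X = 1 vs. X = 0 from an iterable of (x, y) pairs."""
    n = [[0, 0], [0, 0]]  # n[x][y]
    for x, y in pairs:
        n[x][y] += 1
    return (n[1][1] * n[0][0]) / (n[1][0] * n[0][1])

def simulate(n, seed=1):
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        x = int(rng.random() < P_X)
        m0 = int(rng.random() < P_M0[x])
        # Progression is only possible for knees that already have ROA
        y = int(m0 == 1 and rng.random() < P_Y_GIVEN_M0)
        cohort.append((x, m0, y))
    return cohort

cohort = simulate(200_000)

# Total effect: everyone in the source population, no conditioning on M0
or_everyone = odds_ratio((x, y) for x, m0, y in cohort)

# Typical progression-study design: restrict to knees with M0 = 1
or_restricted = odds_ratio((x, y) for x, m0, y in cohort if m0 == 1)

print("OR, whole population:", round(or_everyone, 2))
print("OR, restricted to baseline ROA:", round(or_restricted, 2))
```

In this hypothetical cohort the exposure is protective in the whole population, yet the odds ratio among knees with baseline ROA is close to 1, because all of the exposure's effect runs through M0 and the restriction removes that pathway.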

Similar issues arise when comparing the relative importance of various risk factors, such as structural lesions, to the progression of OA or its sequelae ([12-16]). For example, suppose we are interested in testing the hypothesis that meniscal tears increase knee pain severity (i.e., total effect of meniscal tears). Since little is known about the natural history or time course of the occurrence of various structural lesions in OA, let's assume for the purpose of this illustration that meniscal tear is an early structural lesion that leads to subsequent BMLs. By adjusting for BMLs in the regression model when assessing the association between meniscal tear and knee pain severity, one actually obtains the direct effect of meniscal tear on knee pain severity that is not mediated through BMLs, which answers a different question than intended. Furthermore, adding all structural lesions into the same multivariable regression model will preclude comparisons of the effect estimates for each structural lesion. In the above scenario, the effect estimate for the structural lesion that occurs immediately prior to the outcome (BMLs) represents its total effect, whereas the effect estimate for the structural lesion that is the cause of the later structural lesion (meniscal tear) represents its direct effect. Any attempt to compare the magnitudes of these effects will lead to inherently flawed inferences.
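
The meniscal tear example can also be sketched numerically. The following is a minimal illustration under assumed values, not an analysis of real data: the prevalences, effect sizes, and pain scale are hypothetical, and adjustment is done by averaging within-stratum mean differences, which stands in for a regression model with no interaction term.

```python
import random

# ASSUMPTION: all values are hypothetical and chosen only for illustration
P_TEAR = 0.3               # prevalence of meniscal tear
P_BML = {0: 0.2, 1: 0.7}   # P(BMLs | tear): tears lead to subsequent BMLs
B_TEAR = 0.5               # effect of tear on pain not mediated through BMLs
B_BML = 2.0                # effect of BMLs on pain

def simulate(n, seed=2):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        tear = int(rng.random() < P_TEAR)
        bml = int(rng.random() < P_BML[tear])
        pain = 3.0 + B_TEAR * tear + B_BML * bml + rng.gauss(0.0, 1.0)
        rows.append({"tear": tear, "bml": bml, "pain": pain})
    return rows

def mean(values):
    return sum(values) / len(values)

def diff_in_means(rows, exposure):
    exposed = [r["pain"] for r in rows if r[exposure] == 1]
    unexposed = [r["pain"] for r in rows if r[exposure] == 0]
    return mean(exposed) - mean(unexposed)

def adjusted_diff(rows, exposure, covariate):
    """Average of within-stratum mean differences, weighted by stratum size."""
    total = 0.0
    for level in (0, 1):
        stratum = [r for r in rows if r[covariate] == level]
        total += len(stratum) / len(rows) * diff_in_means(stratum, exposure)
    return total

rows = simulate(100_000)
tear_unadjusted = diff_in_means(rows, "tear")          # total effect of tear
tear_adjusted = adjusted_diff(rows, "tear", "bml")     # direct effect of tear
bml_adjusted = adjusted_diff(rows, "bml", "tear")      # total effect of BMLs

print("tear, unadjusted:", round(tear_unadjusted, 2))
print("tear, adjusted for BMLs:", round(tear_adjusted, 2))
print("BMLs, adjusted for tear:", round(bml_adjusted, 2))
```

With these assumed values, the expected total effect of a tear is B_TEAR + B_BML × (0.7 − 0.2) = 1.5, whereas the BML-adjusted estimate recovers only the direct component (0.5). The BML estimate from the same adjusted analysis reflects the total effect of BMLs (2.0), so placing the two adjusted estimates side by side compares a direct effect with a total effect.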

Steps to avoid such potential selection bias

Several steps can be taken to avoid the aforementioned problems. First, a well-formed and testable hypothesis should be developed before any design and/or analysis is conducted. Therefore, the intent of the research question and its corresponding effect measures (total, direct, or indirect effects) should be clearly defined. Second, in general one should consider collecting incident instead of prevalent exposure (i.e., new user or initiation of exposure), which ensures that the exposure was present prior to the occurrence of any mediators and the outcome, and that any confounders preceded the occurrence of the exposure. Nevertheless, when the goal is to assess the total effect of chronic or largely time-invariant risk factors, such as body mass index, bone mineral density, or genetic factors, on the risk of ROA progression among knees with an intermediate stage of ROA, this is often not feasible, because the effects of these chronic factors on risk of ROA progression are likely to be blocked by the intermediate stage of ROA. Third, an appropriate study design and analytic method should be implemented to allow testing of the specific research question that was intended. This is particularly critical if one is interested in assessing the direct and indirect effects (i.e., mediating effects), since potential selection bias (e.g., collider stratification bias) is likely to occur, leading to biased effect estimates ([9, 10]). Finally, more studies are needed to better understand the natural history of OA development (i.e., to provide insight into the time sequence of structural lesions and avoid wrongly adjusting for mediators). Without such knowledge, it will be very difficult to assess the total effect of a specific structural lesion on OA progression or its sequelae. Given that such knowledge is often unavailable, investigators may consider constructing plausible causal diagrams, performing sensitivity analyses under various causal assumptions, and examining whether the results change materially under the different assumptions.

In summary, understanding the intent of the research question (i.e., examination of the total, direct, or indirect effects) will help guide implementation of an appropriate study design and analytic approach. Clarity regarding effects obtained from a study as representing total, direct, or indirect effects will provide insights into apparently paradoxical phenomena that may not be paradoxical at all.


All authors were involved in drafting the article or revising it critically for important intellectual content, and all authors approved the final version to be submitted for publication. Dr. Zhang had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study conception and design. Zhang, Neogi, Hunter, Roemer, Niu.

Acquisition of data. Zhang, Neogi.

Analysis and interpretation of data. Zhang, Neogi, Roemer, Niu.


We thank Drs. Hyon Choi, Maureen Dubreuil, Ada Man, Uyea-Sa D. T. Nguyen, Erik Skovenborg, Barton Wise, and Ms Elizabeth Lincoln for their helpful comments.