Avoiding common pitfalls in the analysis of observational studies of new treatments for rheumatoid arthritis


  • Marie Hudson,

    Corresponding author
    1. Jewish General Hospital and McGill University, Montreal, Quebec, Canada
    • Centre for Clinical Epidemiology, Jewish General Hospital, 3755 Côte-Sainte-Catherine Road, Montreal, Quebec, Canada H3T 1E2
  • Samy Suissa

    1. Jewish General Hospital and McGill University, Montreal, Quebec, Canada
    • Dr. Suissa has received honoraria (less than $10,000) from BMS as a member of the scientific advisory board.


New disease-modifying antirheumatic drugs (DMARDs), and in particular biologic agents, have revolutionized the treatment of rheumatoid arthritis (RA) in the last decade and led to significant improvements in patient outcomes. The efficacy of these treatments has been established in well-designed and well-conducted randomized clinical trials (RCTs). However, although RCTs produce good evidence of efficacy, they are expensive to conduct and thus provide data on limited numbers of highly selected patients followed for short periods. RCTs are therefore often unable to provide good estimates of harm, which may require long-term followup of large numbers of patients. In response to this deficiency, several drug and disease registries have been established to provide long-term followup of patients exposed to these new treatments. Administrative databases offer the additional advantage of large study populations and thus the potential to identify rare events (1). Observational (i.e., nonexperimental) studies of registry data and administrative databases are useful sources of information concerning the safety of biologic agents in RA (2, 3).

Bias, defined as “systematic error in the design, conduct or analysis of a study that results in a mistaken estimate of an exposure's effect on the risk of disease” (4), can threaten the validity of epidemiologic studies, including observational studies. Bias can be classified into selection bias, information bias, and confounding (5). The objective of this study was to identify published observational studies reporting on the safety of biologic agents in RA where bias could have arisen, to describe the bias, and to identify possible ways of minimizing it. By using published studies to illustrate bias, we wanted to highlight the fact that biases are not only theoretical but can be real. Our aim was therefore not to perform an exhaustive review of all existing biases, but rather to sensitize readers of the literature to potential bias in observational studies of the safety of biologic agents by illustrating specific biases with concrete examples from the literature.

The Wandering Comparison of Risk

In RCTs, patients are highly selected and randomly allocated to treatment or control groups, with randomization designed to ensure that the study groups are comparable with respect to all variables except the intervention being studied. In observational studies, selection of patients is often study specific and comparison groups are not always readily available. Therefore, the choice of exposed and comparison patients can, at least in part, affect the results of observational studies. For example, in 2 separate studies using the Swedish anti–tumor necrosis factor (anti-TNF) registry, with the second study including data from the first, the relative risks of lymphoma in patients treated with anti-TNF drugs were reported to range from 0.8 (95% confidence interval [95% CI] 0.4–1.4) in the study by Askling et al (6) to 4.9 (95% CI 0.9–26.2) in the study by Geborek et al (7) (Table 1). The investigators invoked random variation and increased precision of the later study as possible explanations for the discrepancy. However, differences in the TNF-exposed and control groups could have also played some role. The rates of lymphoma in the 2 TNF-treated groups were very different (31 of 10,000 in the study by Geborek et al versus 9 of 10,000 in the study by Askling et al). Moreover, in the study by Geborek et al, the comparison group consisted of community RA patients and the rate of lymphoma in that group was reported to be 5 of 10,000. In the study by Askling et al, there were 2 comparison groups: patients with early RA and patients with a hospital discharge diagnosis of RA. In those 2 groups, the rate of lymphoma was reported to be 8 of 10,000 and 11 of 10,000, respectively. Therefore, it is possible that in the later study, the rate of lymphoma in the anti-TNF group fell over time as the concern for the risk of lymphoma was growing, and patients may have been screened more closely prior to treatment. In addition, the control groups were different, with higher rates in the control groups of the study by Askling et al, perhaps because those controls represented more active and sicker RA patients. Therefore, the apparent inconsistencies may relate to a wandering baseline of risk both in the selected patients and in the comparison patients.

Table 1. Results of 2 studies using the Swedish anti-TNF registry designed to assess the risks of lymphoma in patients treated with anti-TNF drugs*

| | Geborek et al, Southern Sweden (1997–2002) (7): TNF-treated group | Geborek et al (7): Controls, community RA cohort | Askling et al, Sweden (1999–2004) (6): TNF-treated group | Askling et al (6): Controls, early RA cohort | Askling et al (6): Controls, inpatient RA cohort |
|---|---|---|---|---|---|
| No. of lymphomas | 5 | 2 | 9 | 11 | 319 |
| Rate per 10,000 | 31 | 5 | 9 | 8 | 11 |
| Rate ratio (95% CI) | | 4.9 (0.9–26.2) | | 0.8 (0.4–1.4) | 1.1 (0.6–2.1) |

* Anti-TNF = anti–tumor necrosis factor; RA = rheumatoid arthritis; 95% CI = 95% confidence interval.

Another striking example of how the choice of controls can influence study results comes from the British (8) and German (9) biologic registries (Table 2). Despite fairly similar absolute risks of serious infections in patients treated with anti-TNF drugs in those 2 registries, the relative risks in the British study suggested no increased risk of serious infections in patients treated with anti-TNF drugs, whereas the German study estimated more than a doubling in risk. This may have resulted from the fact that the estimates of risk in the respective controls were very different, with the British controls having a roughly 2-fold higher risk of serious infections compared with the German controls. This comparison again highlights how the selection of the comparison group can clearly impact the conclusions drawn from otherwise similar data sets.

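Table 2. Results of 2 studies on the risk of infections associated with anti-TNF drugs*

| | Rate of infection per 1,000 person-years (95% CI): etanercept | Rate of infection per 1,000 person-years (95% CI): infliximab | Rate of infection per 1,000 person-years (95% CI): controls | Relative risk (95% CI): etanercept | Relative risk (95% CI): infliximab |
|---|---|---|---|---|---|
| Dixon et al (8) | 51.3 (44.7–58.5) | 55.2 (48.8–62.2) | 41.1 (31.4–53.5) | 0.97 (0.63–1.50) | 1.04 (0.68–1.61) |
| Listing et al (9) | 64 (45–91) | 61 (40–95) | 23 (13–39) | 2.82 (1.4–5.9) | 2.7 (1.3–5.9) |

* Anti-TNF = anti–tumor necrosis factor; 95% CI = 95% confidence interval.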
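
The effect of the comparison group can be seen by dividing the published rates directly. The following is a minimal sketch, not the analysis used in either study: it takes the point estimates from Table 2 (the 2 anti-TNF rate columns in table order, then the control column) and computes crude ratios. The variable and function names are illustrative assumptions, and the crude ratios need not match the reported relative risks, which may reflect adjustment.

```python
# Published rates of serious infection per 1,000 person-years, copied from Table 2.
# "anti_tnf" holds the two anti-TNF rate columns in table order; "controls" is the third.
RATES = {
    "Dixon et al (8)": {"anti_tnf": [51.3, 55.2], "controls": 41.1},
    "Listing et al (9)": {"anti_tnf": [64.0, 61.0], "controls": 23.0},
}


def crude_rate_ratios(study):
    """Divide each anti-TNF rate by the same study's control rate."""
    return [rate / study["controls"] for rate in study["anti_tnf"]]


crude = {name: crude_rate_ratios(study) for name, study in RATES.items()}

# How much higher is the British control rate than the German control rate?
control_ratio = RATES["Dixon et al (8)"]["controls"] / RATES["Listing et al (9)"]["controls"]

for name, ratios in crude.items():
    print(name, [round(r, 2) for r in ratios])
print("British vs German control rate:", round(control_ratio, 2))
```

With these inputs the crude ratios are about 1.25 and 1.34 for the British registry and about 2.78 and 2.65 for the German registry, while the British control rate is about 1.79 times the German one. The treated rates differ far less than the control rates do, which is the pattern described in the text.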

The wandering comparison of risk, insofar as it relates to the comparison group, is in fact a form of selection bias, with the comparison group being different and thus possibly inappropriate for the cases. Identifying an appropriate comparison group may be one of the most difficult tasks in observational studies. Nonetheless, it would be wrong to believe that data from observational studies are not as good as those from RCTs. On the contrary, repeated observational studies of harm using various study patients and controls may provide incremental evidence of harm that may in fact be reconcilable and may in time lead to more complete insight into true risk. Moreover, alternative study designs exist that help minimize selection bias of this type. Nested case–control studies, in which cases and controls are identified from a defined cohort, are one example.

Confounding by Disease Severity

In observational studies, patients selected to receive a given treatment are usually systematically different from those who are not. A particular harm found at an increased rate in the treated patients may be mistakenly attributed to the treatment, when in fact it may be a result of the underlying disease. Confounding by disease severity may therefore occur when a drug is preferentially prescribed to a group of patients with a worse baseline prognosis. Feinstein referred to this as a susceptibility bias, whereby differences in baseline characteristics rather than treatment differences could account for differences in outcomes between groups (10). In RA, for example, patients selected to receive anti-TNF drugs likely have more active disease than those who are not given these drugs. At the same time, there is growing evidence to suggest that the risk of lymphoma in RA is particularly associated with disease activity (11). Therefore, the increased lymphoma rates observed with anti-TNF therapy may reflect, at least in part, confounding by disease severity, whereby patients with the highest risk of lymphoma preferentially receive anti-TNF drugs.

Wolfe and Michaud reported on the risk of lymphoma associated with anti-TNF drugs (12). They reported 14 cases of lymphoma in 10,012 person-years of exposure to anti-TNF drugs compared with 8 cases in 12,147 person-years of exposure to methotrexate only. This results in a crude rate ratio of 1.7. However, they noted that patients treated with anti-TNF drugs had more severe disease than both the methotrexate-treated patients in their database and patients not enrolled in the registry with respect to several baseline characteristics, including Health Assessment Questionnaire, pain, and global severity scores. They therefore considered that their results could reflect, at least in part, some element of confounding whereby patients with the highest risk of lymphoma were preferentially treated with anti-TNF therapy.

Several statistical approaches are available to reduce confounding by disease severity. For example, propensity modeling is a sophisticated method designed to produce “quasi-randomization” (13). The propensity score is defined as the conditional probability of receiving a particular exposure versus another, given a patient's observed characteristics, and can be used to balance the treatment groups and reduce differences between them. Two patients with the same propensity score have an equal estimated probability of exposure.
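
To make the idea concrete, the following is a minimal sketch under strong simplifying assumptions. All counts are hypothetical and invented for illustration, a single severity category stands in for the full set of patient characteristics, and the score is used for inverse-probability-of-treatment weighting, which is only one of several ways to use a propensity score (matching and stratification are others). In practice the score would typically be estimated with a regression model on many covariates.

```python
# Hypothetical counts, invented for illustration: (n patients, n events) by
# disease-severity stratum and treatment. Within each stratum the risk is the
# same for treated and untreated patients, so the true risk ratio is 1.
COHORT = {
    "severe": {"treated": (300, 30), "untreated": (100, 10)},
    "mild": {"treated": (100, 2), "untreated": (500, 10)},
}


def crude_risk_ratio(cohort):
    """Risk ratio (treated vs untreated) ignoring severity."""
    n_t = sum(s["treated"][0] for s in cohort.values())
    e_t = sum(s["treated"][1] for s in cohort.values())
    n_u = sum(s["untreated"][0] for s in cohort.values())
    e_u = sum(s["untreated"][1] for s in cohort.values())
    return (e_t / n_t) / (e_u / n_u)


def propensity_scores(cohort):
    """Estimated probability of treatment given the severity stratum."""
    return {
        name: s["treated"][0] / (s["treated"][0] + s["untreated"][0])
        for name, s in cohort.items()
    }


def weighted_risk_ratio(cohort):
    """Risk ratio after inverse-probability-of-treatment weighting."""
    ps = propensity_scores(cohort)
    wn_t = we_t = wn_u = we_u = 0.0
    for name, s in cohort.items():
        w_t = 1.0 / ps[name]          # weight for treated patients
        w_u = 1.0 / (1.0 - ps[name])  # weight for untreated patients
        wn_t += w_t * s["treated"][0]
        we_t += w_t * s["treated"][1]
        wn_u += w_u * s["untreated"][0]
        we_u += w_u * s["untreated"][1]
    return (we_t / wn_t) / (we_u / wn_u)


print(round(crude_risk_ratio(COHORT), 2))     # crude estimate
print(round(weighted_risk_ratio(COHORT), 2))  # weighted estimate
```

In this constructed example the crude risk ratio is 2.4 even though treatment has no effect within either stratum, because severe patients are both more likely to be treated and more likely to have the event. Weighting by the estimated score brings the ratio back to 1. The sketch balances only the one measured characteristic it is given.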

In the previously mentioned study by Listing et al (9), which investigated the risk of infections in patients treated with anti-TNF drugs in the German biologics register, the crude relative risks of serious infections in patients treated with etanercept and infliximab were 2.8 and 2.7, respectively, compared with the controls (Table 3). However, after adjusting using propensity score methods to make patients similar according to whether they were more or less likely to receive anti-TNF treatments, the relative risks fell to 2.2 and 2.1, respectively. Therefore, nearly one-third of the increase found in the unadjusted results could be attributable to differences in patient characteristics.

Table 3. Results of a study on the risk of infections in patients treated with anti-TNF drugs in the German biologics register (9)*

| | Etanercept | Infliximab |
|---|---|---|
| Unadjusted RR (95% CI) | 2.8 (1.4–5.9) | 2.7 (1.3–5.9) |
| RR adjusted using propensity scores (95% CI) | 2.2 (0.9–5.4) | 2.1 (0.8–5.5) |

* Anti-TNF = anti–tumor necrosis factor; RR = relative risk; 95% CI = 95% confidence interval.
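
The "nearly one-third" figure can be reproduced from the point estimates, assuming it refers to the share of the excess relative risk (RR minus 1) that disappears after adjustment. That reading is an interpretation and the function name is illustrative; the sketch uses point estimates only and ignores the confidence intervals.

```python
def share_of_excess_explained(crude_rr, adjusted_rr):
    """Fraction of the excess relative risk (RR - 1) removed by adjustment."""
    return (crude_rr - adjusted_rr) / (crude_rr - 1.0)


# Point estimates from Table 3 (unadjusted, propensity-adjusted).
etanercept = share_of_excess_explained(2.8, 2.2)
infliximab = share_of_excess_explained(2.7, 2.1)
print(round(etanercept, 2), round(infliximab, 2))
```

Under that reading the shares are about 0.33 for etanercept and about 0.35 for infliximab.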

Despite having unique advantages, propensity score methods are not without limitations and, of particular importance, do not adjust for unmeasured confounding (14). Therefore, careful attention must still be paid to the study design and assessment of other potential confounders.

In time, disease registries may offer their own unique opportunity to address confounding by disease severity. In fact, as new biologic therapies with different targets and mechanisms of action become available, between-drug comparisons may provide better estimates of harms associated with individual drugs. Indeed, if similar patients are started on drugs of different classes (e.g., anti-TNF therapies, CD20, CTLA-4, or interleukin-6 antagonists) based on factors unassociated with their disease status (e.g., formulary availability, patient or physician preference), then harms associated with one therapy but not another may, in that setting, be attributable to the drug.

Channeling Bias

Channeling bias is a form of confounding that occurs when a drug is preferentially prescribed to patients with different baseline characteristics (15). For example, patients at high risk for a given complication are preferentially prescribed or switched to a certain treatment because an alternative treatment may be associated with that particular complication. Therefore, in this case, preferential prescribing because of comorbidity or contraindication to another drug may result in confounding.

In a population-based study using a large claims database, we examined the association between leflunomide and interstitial lung disease (ILD) and found that, in the overall analysis, leflunomide (rate ratio 1.9), but not methotrexate (rate ratio 1.4), was associated with ILD (16) (Table 4). However, in a stratified analysis, we showed that in patients without prior exposure to methotrexate and without a history of ILD, methotrexate (rate ratio 3.1), but not leflunomide (rate ratio 1.2), was associated with ILD, whereas in the subgroup of patients with either a prior exposure to methotrexate or a history of ILD, methotrexate was highly protective against ILD (rate ratio 0.4) and leflunomide was associated with a significant increase (rate ratio 2.6) in the risk of ILD. We therefore concluded that this stratified analysis provided strong evidence that patients with a history of ILD may have been preferentially prescribed leflunomide rather than methotrexate on the assumption that, in contrast to methotrexate, no lung toxicity was known to be associated with leflunomide. Indeed, in a further analysis, we found that patients with a history of ILD were almost twice as likely to have received leflunomide compared with methotrexate (adjusted odds ratio 1.9, 95% CI 1.5–2.3) as a first DMARD. Channeling bias must therefore be considered in studies of harm, and proper analysis of the data is necessary to determine whether this bias may have influenced the results.

Table 4. Rate ratios of interstitial lung disease (ILD) associated with disease-modifying antirheumatic drug use in all subjects and in subjects stratified by previous methotrexate use and previous ILD (16)

| | Adjusted rate ratio (95% confidence interval) |
|---|---|
| All subjects | |
| Methotrexate | 1.4 (0.8–2.3) |
| Leflunomide | 1.9 (1.1–3.6) |
| Subjects with no previous methotrexate use and no previous ILD | |
| Methotrexate | 3.1 (1.5–6.4) |
| Leflunomide | 1.2 (0.4–3.1) |
| Subjects with previous methotrexate use or previous ILD | |
| Methotrexate | 0.4 (0.2–0.9) |
| Leflunomide | 2.6 (1.2–5.6) |

Depletion of Susceptible

Observational studies have provided conflicting results as to whether anti-TNF drugs are associated with serious infections, with some reporting no increase and others reporting a 2-fold or greater increase in risk. A number of reasons have been invoked to explain this discrepancy, including differences in control groups, definitions of serious infections, and duration of followup. However, Askling and Dixon plotted the risk of serious infections in RA patients treated with anti-TNF drugs against treatment duration and found a higher risk early on followed by a normalization of risk with increasing treatment durations (17). Among the possible explanations noted for this finding, the authors included the phenomenon of “depletion of susceptible,” whereby patients at risk of serious infections and started on anti-TNF drugs may have developed serious infections early after the start of treatment, may have stopped their drug, and may not have contributed to the analysis in later time periods. On the other hand, subjects who were unaffected early, and who thus possibly represent a lower-risk group, are followed over time. Therefore, the absence of increased risk of serious infections in patients treated with anti-TNF drugs could have been a reflection of the fact that mostly low-risk patients were followed over time. To the extent that time-dependent risk can result in the early attrition of those individuals most susceptible to the event and in the followup of low-risk individuals, the depletion of susceptible can be construed as a form of selection bias.
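
The mechanism can be illustrated with a small deterministic sketch. It assumes a hypothetical cohort made up of a high-risk and a low-risk subgroup, each with a constant per-period risk, and assumes that every patient who has the event stops treatment and leaves the at-risk group. All numbers and names are invented for illustration and are not taken from any of the cited studies.

```python
def period_rates(n_high, n_low, risk_high, risk_low, periods):
    """Expected event rate among patients still on treatment, by period.

    Patients who have the event are assumed to stop treatment and leave
    the at-risk group, as in the "depletion of susceptible" scenario.
    """
    rates = []
    for _ in range(periods):
        events_high = n_high * risk_high
        events_low = n_low * risk_low
        rates.append((events_high + events_low) / (n_high + n_low))
        n_high -= events_high  # susceptible patients leave fastest
        n_low -= events_low
    return rates


# Hypothetical cohort: 200 high-risk and 800 low-risk patients, with
# per-period risks of 30% and 2% that stay constant within each subgroup.
rates = period_rates(n_high=200, n_low=800, risk_high=0.30, risk_low=0.02, periods=5)
print([round(r, 3) for r in rates])
```

Although neither subgroup's risk changes, the overall rate among patients still receiving treatment falls from period to period, because the remaining group contains a shrinking share of high-risk patients.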

One way to attempt to correct for this is to look at the risk of the adverse event for the entire followup period, regardless of the date of drug discontinuation. Indeed, Dixon et al extended their previously mentioned study on the risks of infections in patients treated with anti-TNF drugs in the British Biologics Register (18). They examined the risks of infections identified during various periods of followup, whether identified “while receiving treatment,” during treatment plus 90 days after discontinuation of treatment, or during all of followup (Table 5). They found that the rate ratios increased as the periods of followup were extended. They concluded that these results were consistent with a depletion of susceptible effect, whereby in an analysis with followup limited to the period of exposure, those at greater risk of infections are excluded from the analysis early and those who continue to receive treatment are really a healthier group at an overall lower risk.

Table 5. Results of a study on the risks of infections in patients treated with anti–tumor necrosis factor drugs in the British Biologics Register (18)

| Followup time | Incidence rate ratio (95% confidence interval) |
|---|---|
| While receiving treatment | 1.22 (0.88–1.69) |
| Duration of treatment plus 90-day lag window | 1.30 (0.93–1.78) |
| Ever received treatment | 1.35 (0.99–1.85) |

Immortal Time Bias

A recent study investigated whether the use of antimalarials in patients with systemic lupus erythematosus (SLE) could be associated with cancer incidence (19). The authors used a cohort of 235 patients with SLE followed for up to 31 years, of whom 13 developed cancer during followup. The analysis of time to cancer incidence compared the 156 patients who had “ever” received antimalarials during followup with the 79 who had not. A Cox proportional hazards model was used to estimate an adjusted hazard ratio of 0.15 (95% CI 0.02–0.99). This result implied that the rate of all cancers could be significantly reduced by 85% in SLE patients receiving antimalarials.

However, this analysis was subject to a bias created by looking at “ever” exposure to antimalarials during followup. Immortal time refers to a time period during cohort followup when, by design, subjects cannot die or have the outcome event under study (20, 21). In this analysis, exposed patients were necessarily “immortal” (in this case, cancer free) during the time span between cohort entry and the first prescription for an antimalarial. On the other hand, the comparison patients who did not receive antimalarials had no such cancer-free period because they could have developed cancer anytime during followup. Therefore, the comparison of time to cancer incidence between these 2 groups provided an advantage to exposed patients because they were guaranteed, by design, a cancer-free period. To the extent that periods of unexposure are misclassified, the immortal time bias is a form of information bias. This type of bias will result in lowering the rate ratio (i.e., closer to the null if the effect is harmful [>1] or away from the null if the effect is protective [<1]).

A time-dependent Cox proportional hazards model or similar approach to data analysis that classifies the person-time from cohort entry until the first prescription as unexposed and the subsequent person-time as exposed is a simple approach to avoid an immortal time bias. We replicated the abovementioned study in a population-based cohort of 23,810 RA patients identified from provincial health care databases between 1980 and 2003 (22). We identified all cancer cases occurring during followup and obtained information on the timing of antimalarial agents, as well as all relevant concomitant medications. The analysis was based on an approach that considered the time-dependent nature of the antimalarial prescriptions and classified the time prior to the first one correctly as unexposed. As a result, the adjusted rate ratio of cancer incidence with antimalarial use was 1.1 (95% CI 0.9–1.3). This is quite different from the very protective effect observed using the approach subject to immortal time bias.
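
The person-time classification described above can be sketched as follows. This is a simplified illustration using crude rates rather than a Cox model, the patients are hypothetical and invented for the example, and the `Patient` class and `rate_ratio` function are illustrative names rather than anything used in the cited studies. Each treated patient is assumed to receive the first prescription before the end of followup.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    followup: float            # years from cohort entry to event or censoring
    first_rx: Optional[float]  # years from entry to first prescription; None if never treated
    event: bool                # True if the outcome occurred at the end of followup


# Hypothetical patients, invented for illustration only.
PATIENTS = [
    Patient(10, 4, False), Patient(8, 6, True), Patient(12, 2, False),
    Patient(9, 5, False), Patient(5, None, True), Patient(10, None, False),
    Patient(3, None, True),
]


def rate_ratio(patients, time_dependent):
    """Exposed vs unexposed event rate ratio under either classification."""
    pt = {"exposed": 0.0, "unexposed": 0.0}
    events = {"exposed": 0, "unexposed": 0}
    for p in patients:
        if p.first_rx is None:
            pt["unexposed"] += p.followup
            group = "unexposed"
        else:
            # Time before the first prescription is unexposed person-time;
            # the "ever exposed" analysis wrongly counts it as exposed.
            pre_rx = p.first_rx if time_dependent else 0.0
            pt["unexposed"] += pre_rx
            pt["exposed"] += p.followup - pre_rx
            group = "exposed"  # events here occur after the first prescription
        events[group] += p.event
    return (events["exposed"] / pt["exposed"]) / (events["unexposed"] / pt["unexposed"])


print(round(rate_ratio(PATIENTS, time_dependent=False), 2))  # "ever exposed"
print(round(rate_ratio(PATIENTS, time_dependent=True), 2))   # time-dependent
```

With these invented patients, the "ever exposed" classification gives a rate ratio of about 0.23, whereas assigning the pre-prescription time to the unexposed group gives about 0.80. The same events are counted in both analyses; only the allocation of person-time changes.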


There are innumerable types of selection, information, and confounding biases, including but by no means limited to referral bias, recall bias, observer bias, reporting bias, or protopathic bias (23). The biases mentioned in this study (Table 6) present a basis for approaching bias using selected studies from the literature on the safety of drugs used to treat RA. Despite the potential for bias, however, one of the major strengths of observational studies relates to the study of drug-related harms, where large sample sizes and long durations of followup are often necessary to identify, in particular, rare harms. In a world where there is growing skepticism about the value of published research and the belief that bias is one of the possible threats to the validity of ongoing scientific research (24), the credibility of future observational studies on the safety of new treatments for RA will depend on the willingness of researchers to actively investigate and attempt to minimize bias, or at the very least recognize it and explain how it could have impacted their results.

Table 6. Summary of biases in observational studies discussed

| | Definition | Examples | Clues for identification | Possible solutions |
|---|---|---|---|---|
| Selection bias | Absence of comparability between groups compared | Wandering comparison of risk | Are the cases and comparisons similar in all important respects except for the drug of interest? | Nested case–control studies |
| | | Depletion of susceptible | Is the risk of the outcome time dependent? | Follow cases and comparisons for similar periods; new user designs (25) |
| Information bias | Also known as observation, classification, or measurement bias, this bias results from incorrect determination of exposure or outcome, or both | Immortal time bias | Is information about exposure classified in the same way for cases and comparisons? | Survival analysis with time-dependent exposures |
| Confounding | An observed association between an exposure and an outcome may be accounted for by a third factor, when that factor is associated with both the exposure and outcome but is not in the causal pathway between the exposure and outcome | Confounding by disease severity | Could the outcome attributed to the drug also be an outcome of more severe disease? | Restriction, matching, stratification; multivariate regression techniques; propensity modeling |
| | | Channeling bias | Could the treatment have been preferentially prescribed to patients with special preexisting morbidity? | Stratification |


All authors were involved in drafting the article or revising it critically for important intellectual content, and all authors approved the final version to be submitted for publication. Dr. Hudson had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study conception and design. Hudson, Suissa.

Acquisition of data. Hudson, Suissa.

Analysis and interpretation of data. Hudson, Suissa.