Editorial: Methotrexate saves lives: A pearl of observational research


  • Robert B. M. Landewé

    Corresponding author
    1. Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands, and Atrium Medical Center, Heerlen, The Netherlands
    • Department of Clinical Immunology and Rheumatology, Academic Medical Center, University of Amsterdam, Meibergdreef 9, 1105AZ Amsterdam, The Netherlands

Randomized clinical trials (RCTs) are the Holy Grail of evidence-based medicine. They are a conditio sine qua non in the process of drug approval. Well-designed RCTs are beautiful research experiments that are supposed to yield “unbiased results.” The essence of an RCT is the process of randomization, which, with a high likelihood, creates similar treatment groups at baseline if the sample is sufficiently large (prognostic similarity). The assumption is that every between-group difference that arises after randomization can be ascribed to the treatment under investigation, and not to something else. In order to preserve prognostic similarity during the trial, designers take formidable protective measures. Short trial duration, concealed treatment allocation, and blinded outcome assessment are examples of measures that help create an environment in which a particular treatment effect can be demonstrated with the least likelihood of bias (high level of internal validity). Including patients with relatively active and severe disease and excluding those with comorbidities further helps to maximize the chance of a successful trial. Unfortunately, the results of such trials are more difficult to translate into clinical practice, because the usual patient in clinical practice simply does not resemble the average patient in the trial (lack of external validity). An RCT therefore provides only limited clinically relevant information.

In addition, many clinically relevant study questions cannot appropriately be addressed in an RCT. Suppose you want to investigate whether methotrexate (MTX) reduces mortality in patients with rheumatoid arthritis (RA). In theory, an RCT could be designed to test this hypothesis, but you will immediately face unsolvable problems. What treatment should be given in the control arm of such a trial? MTX is “standard treatment,” and it is ethically unjustifiable to withhold MTX from patients who need it in the opinion of the rheumatologist. Mortality is a relatively rare “outcome,” so you need many patients and a long followup period for reasons of statistical power. However, can you assume that patients in such a trial will maintain exactly the same treatment over many years, and that they will avoid cointerventions or health behavior that may also be relevant to mortality and may spuriously influence your trial results? And do these results have any practical meaning if they stem from a selection of RA patients who meet your inclusion and exclusion criteria and are willing to take part in such an artificial experiment?

In scenarios such as the one sketched here, with clinically relevant but complicated study questions, RCTs are impractical or infeasible. The only type of study that may help to provide an answer is the long-term observational study.

Observational studies

Observational studies have “bad press,” especially with regard to the interpretation of drug effects. This is understandable to some extent since observational studies differ importantly from RCTs in that a treatment choice is not determined by chance (as after randomization) but rather by the consensual decision of the physician and the patient. In other words, it is the perceived activity, severity, and prognosis of the disease rather than randomization that determines whether a particular patient will be treated with a particular drug. In our example of MTX use reducing mortality, this could imply that the patients with more severe and active disease, who may also intrinsically have a higher likelihood of mortality, will be preferentially treated with MTX (and not with other disease-modifying antirheumatic drugs [DMARDs]). Alternatively, it could be that the drug MTX is reserved for those patients who are younger and have fewer comorbidities, and as such may have a lower likelihood of mortality. Regardless of how these counteracting effects exactly work out, an analysis of the effect of MTX use on mortality can be influenced by prognostic differences between MTX users and MTX nonusers. This important form of bias has been named confounding by indication.
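The first scenario described above, in which patients with more severe disease are preferentially treated with MTX, can be made concrete with a small worked example. The sketch below uses invented numbers only (the strata, their sizes, the share given MTX, and the death risks are all assumptions chosen for illustration, not data from any cohort). It compares the crude difference in death risk between MTX users and nonusers with the difference inside each severity stratum.

```python
# Hypothetical cohort, invented purely for illustration (not data from any study).
# Each stratum: (label, n patients, share given MTX,
#                death risk without MTX, death risk with MTX)
STRATA = [
    ("severe RA", 1000, 0.8, 0.30, 0.20),
    ("mild RA", 1000, 0.2, 0.10, 0.05),
]


def crude_risk_difference(strata):
    """Death risk in MTX users minus death risk in nonusers, ignoring severity."""
    users = user_deaths = nonusers = nonuser_deaths = 0.0
    for _label, n, share_mtx, risk_without, risk_with in strata:
        users += n * share_mtx
        user_deaths += n * share_mtx * risk_with
        nonusers += n * (1 - share_mtx)
        nonuser_deaths += n * (1 - share_mtx) * risk_without
    return user_deaths / users - nonuser_deaths / nonusers


def within_stratum_risk_differences(strata):
    """Risk difference (with MTX minus without MTX) inside each severity stratum."""
    return {
        label: risk_with - risk_without
        for label, _n, _share, risk_without, risk_with in strata
    }


if __name__ == "__main__":
    print(f"crude: {crude_risk_difference(STRATA):+.3f}")
    for label, diff in within_stratum_risk_differences(STRATA).items():
        print(f"{label}: {diff:+.3f}")
```

With these invented numbers, MTX lowers the death risk in both strata, yet the crude comparison shows a higher risk among MTX users, simply because most MTX users have severe disease. Reversing the prescribing pattern (the alternative scenario above) would bias the crude comparison in the opposite direction.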

Over the last 10 years, methodologic research has focused on appropriate adjustments for the spurious effects of confounding by indication, as well as other biases associated with observational research. In this issue of Arthritis & Rheumatism, Wasko et al have provided a beautiful example of how long-term observational cohorts should be analyzed in order to obtain results that may have credibility similar to the results of RCTs (Wasko MC, Dasgupta A, Hubert H, Fries JF, Ward MM. Propensity-adjusted association of methotrexate with overall survival in rheumatoid arthritis. Arthritis Rheum 2013;65:334–42).

Wasko and colleagues investigated the effects of MTX on mortality in RA in the Arthritis, Rheumatism, and Aging Medical Information System study cohort, a well-known, well-documented, and very complete cohort of >5,000 patients followed up for many years. The authors are very well aware of the potential fallacies associated with observational research (among which confounding by indication is an important one), so they have provided a series of analyses. These analyses should be interpreted in close conjunction (which admittedly requires some methodologic expertise) in order to fully appreciate the importance of the results. The applied methodology can be roughly subdivided into 2 parts, which I will briefly discuss here: 1) analyses to adjust for confounding by indication (propensity adjustment) and 2) sensitivity analyses.

Propensity adjustment

In analyses using propensity adjustment, it is assumed that the likelihood that a particular patient will be treated with a particular drug can be estimated (“modeled”) using all kinds of known variables in the cohort, both disease related (e.g., disease activity measures, rheumatoid factor positivity) and non–disease related (e.g., comorbidities, patient's and physician's preferences). This likelihood is expressed as a value between 0 (lowest) and 1 (highest). Let us imagine a scenario of patients with early, untreated RA in which the rheumatologist has a choice between starting treatment with hydroxychloroquine and starting treatment with MTX. In such a scenario, MTX can be considered an effective and appropriate first-choice DMARD for patients with rather active and severe RA, and as such the propensity score of an individual patient to start treatment with MTX can be considered a combined proxy for disease activity, perceived prognosis, and preference for and appropriateness of MTX treatment (in comparison with hydroxychloroquine treatment). A propensity score close to 1 may in fact tell you that the patient not only has relatively active disease (e.g., a high pain score, high joint counts) and a relatively unfavorable prognosis (e.g., positivity for anti–cyclic citrullinated peptide antibodies), but also no important contraindications to MTX use (e.g., absence of hepatic and pulmonary disease). Reasoning along similar lines, however, a propensity score close to 1 may have an entirely different, and partly opposite, connotation if the comparative choice is not hydroxychloroquine but a tumor necrosis factor α–inhibiting biologic agent. The interpretation of a propensity score is entirely context dependent.
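As a minimal sketch of what such a score looks like in practice, the code below turns a handful of patient characteristics into a value between 0 and 1 using a logistic model. This is an assumption made for simplicity: a logistic model is only one of several ways to estimate a propensity score, and the variable names, intercept, and coefficients are hypothetical values invented for this example rather than estimates from any cohort.

```python
import math

# Hypothetical coefficients (log-odds scale), invented for illustration only.
# In a real analysis they would be estimated from the cohort data.
INTERCEPT = -1.0
COEFFICIENTS = {
    "swollen_joint_count": 0.10,  # more active disease -> MTX more likely
    "anti_ccp_positive": 0.80,    # unfavorable prognosis -> MTX more likely
    "hepatic_disease": -2.00,     # contraindication -> MTX less likely
}


def propensity_for_mtx(patient):
    """Return the modeled probability (between 0 and 1) of starting MTX."""
    log_odds = INTERCEPT + sum(
        COEFFICIENTS[name] * patient[name] for name in COEFFICIENTS
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


# Three hypothetical patients choosing between MTX and hydroxychloroquine.
active_patient = {
    "swollen_joint_count": 12, "anti_ccp_positive": 1, "hepatic_disease": 0,
}
same_with_liver_disease = dict(active_patient, hepatic_disease=1)
mild_patient = {
    "swollen_joint_count": 2, "anti_ccp_positive": 0, "hepatic_disease": 0,
}

if __name__ == "__main__":
    for name, patient in [
        ("active", active_patient),
        ("active + hepatic disease", same_with_liver_disease),
        ("mild", mild_patient),
    ]:
        print(f"{name}: {propensity_for_mtx(patient):.2f}")
```

Under these assumed coefficients, the patient with active, anti-CCP-positive disease and no contraindication receives the highest score, and the same patient with hepatic disease receives a much lower one. The coefficients, and therefore the meaning of a high score, would change if the comparator were a biologic agent instead of hydroxychloroquine.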

Wasko et al in fact investigated the effect of MTX use in comparison with the use of any other DMARD on mortality after adjustment for the propensity to be treated with MTX. They tried to disentangle “the drug MTX” and “the MTX-using patient” with respect to their effects on mortality. Their demonstration that the protective effect of MTX remained intact after adjustment for propensity scores suggests that it is “the drug MTX” rather than “the MTX-using patient” that may explain the reduction in mortality. In other words, it makes it less likely that confounding by indication causes the observed reduction in mortality. Confounding by indication can, however, never be completely excluded, simply because there will always be unmeasured and/or intangible factors that are responsible for residual confounding.

In my opinion, the authors have made a fairly strong argument. However, the applied methodology is not entirely beyond discussion.

Wasko and colleagues applied a statistical modeling technique (called “random forests”) with which they could almost perfectly predict the actual use of MTX in their cohort. Looking at the pleiotropic collection of variables that they have taken into consideration, and at the heterogeneity with respect to the contribution of these variables to the overall model, it becomes obvious that not only factors associated with a higher level of severity/activity of the disease, but also many non–severity-related variables, are predictive of MTX use. Some of the variables may be associated with a higher risk of mortality, others with a lower risk. The final propensity estimate is the net result of all these potentially counteracting variables.

The epidemiologic concept of confounding (by indication) requires that the potential confounder (here, the indication to start MTX) has to be associated with both the exposure variable (here, MTX use) and the outcome variable (here, mortality). Wasko et al do not provide data about the association between the propensity estimation and mortality. Since in Wasko and colleagues' study predicted MTX use was in fact almost synonymous with observed MTX use, and they have found a negative (protective) association between MTX use and mortality, this would imply that the propensity score is also negatively (rather than positively) associated with mortality.

In my opinion, the almost perfect prediction of MTX use jeopardizes the rationale of propensity adjustment for suspected confounding by indication. Often, in such circumstances, there are 1 or 2 dominant variables that largely determine the accuracy of the model, and the authors provide supplemental data showing that it is the “phase of the study,” rather than variables associated with the activity and severity of RA, that determines the propensity for MTX use. In other words, the likelihood of receiving a prescription for MTX was inherently low for any patient in the early 1990s and inherently high during the later phases of the study. A propensity score that is dominated by calendar period in this way will not appropriately adjust for confounding by indication, but the information needed to judge this is unfortunately lacking. In conclusion, the authors may have focused on successfully modeling the indication for MTX rather than on operationalizing the indication for MTX as a proxy for disease severity/activity.
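The concern can be illustrated with a deliberately simplified sketch. All numbers below are invented (the two study phases, the two severity levels, the cell sizes, and the shares given MTX are assumptions, not figures from the study under discussion). The propensity is computed from study phase alone, and the code then checks whether MTX users and nonusers with the same propensity are comparable in disease severity.

```python
# Hypothetical cells, invented for illustration only:
# (study phase, severity, n patients, share given MTX)
CELLS = [
    ("early", "mild", 500, 0.05),
    ("early", "severe", 500, 0.15),
    ("late", "mild", 500, 0.85),
    ("late", "severe", 500, 0.95),
]


def phase_only_propensity(cells, phase):
    """Propensity for MTX when study phase is the only predictor."""
    n_total = sum(n for ph, _sev, n, _p in cells if ph == phase)
    n_mtx = sum(n * p for ph, _sev, n, p in cells if ph == phase)
    return n_mtx / n_total


def severe_share(cells, phase, mtx_user):
    """Share with severe RA among MTX users (or nonusers) within one phase."""
    def count(keep):
        return sum(
            n * (p if mtx_user else 1 - p)
            for ph, sev, n, p in cells
            if ph == phase and keep(sev)
        )
    return count(lambda sev: sev == "severe") / count(lambda sev: True)


if __name__ == "__main__":
    for phase in ("early", "late"):
        print(
            f"{phase}: propensity={phase_only_propensity(CELLS, phase):.2f}, "
            f"severe among users={severe_share(CELLS, phase, True):.2f}, "
            f"severe among nonusers={severe_share(CELLS, phase, False):.2f}"
        )
```

In this invented example, study phase alone separates low from high MTX use very well, yet within each phase the MTX users still have severe disease more often than the nonusers. Adjusting for such a score would leave the severity imbalance, and hence any confounding by indication, largely untouched.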

Sensitivity analyses

Wasko et al have acknowledged very well that findings stemming from observational studies are sensitive to various kinds of biases that cannot be controlled for and that may have an effect on the results. The analysis plan of RCTs is usually straightforward, aiming at exploring the primary outcome, rigidly written down in statistical analysis plans, and designed up front. This fits perfectly well with the recognition that in RCTs everything should be kept under control as much as possible.

In observational research there is usually and necessarily far more flexibility in the approach to analyzing the data. This required level of flexibility is based on the recognition that results stemming from observational studies can always be contested regarding their internal validity. In other words, one can always bring up a factor other than the investigated exposure that may also help to explain a demonstrated effect. Appreciating this inherent shortcoming of observational research, investigators increasingly apply sensitivity analyses. The goal of a sensitivity analysis is to “challenge” the main observation. In statistical terms, a sensitivity analysis investigates whether the association of interest (here, MTX use with mortality) is sensitive to interference by other possibly explanatory factors that have (in case of confounding) or do not have a relationship with the exposure variable. Wasko et al performed quite a number of these sensitivity analyses, and all this work, for which they should be commended, has contributed tremendously to the credibility and interpretability of the results.
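One simple way to “challenge” an association, sketched below, is to recompute it after stratifying on each candidate explanatory factor in turn and to see whether it holds up. The example uses the Mantel-Haenszel risk ratio for the stratified estimates. It is not a reconstruction of the analyses that Wasko et al performed, and the counts and the two factors (sex and smoking) are hypothetical values invented for illustration.

```python
# Hypothetical counts, invented for illustration only.
# Each stratum: (MTX users, deaths among users, nonusers, deaths among nonusers)
BY_SEX = [(600, 50, 500, 60), (400, 50, 500, 90)]
BY_SMOKING = [(300, 45, 400, 80), (700, 55, 600, 70)]


def crude_risk_ratio(strata):
    """Risk ratio (MTX users vs nonusers) after pooling all strata."""
    users = sum(s[0] for s in strata)
    user_deaths = sum(s[1] for s in strata)
    nonusers = sum(s[2] for s in strata)
    nonuser_deaths = sum(s[3] for s in strata)
    return (user_deaths / users) / (nonuser_deaths / nonusers)


def mantel_haenszel_risk_ratio(strata):
    """Mantel-Haenszel summary risk ratio across the given strata."""
    numerator = sum(d1 * n0 / (n1 + n0) for n1, d1, n0, d0 in strata)
    denominator = sum(d0 * n1 / (n1 + n0) for n1, d1, n0, d0 in strata)
    return numerator / denominator


if __name__ == "__main__":
    print(f"crude: {crude_risk_ratio(BY_SEX):.2f}")
    for factor, strata in [("sex", BY_SEX), ("smoking", BY_SMOKING)]:
        print(f"adjusted for {factor}: {mantel_haenszel_risk_ratio(strata):.2f}")
```

In this invented data set the crude risk ratio is about 0.67, and the estimates adjusted for either factor stay close to it and below 1, which is the pattern one hopes to see when the main observation is robust. A marked shift toward 1 after adjustment would instead point to the adjusted factor as an alternative explanation.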

By carefully applying sophisticated modern data modeling techniques, and by exploring both the wealth and the potential dangers of observational databases, Wasko et al have contributed importantly to bringing observational research in rheumatology to a higher level. Admittedly, the report of an RCT is easier to read than Wasko and colleagues' sophisticated analysis of an observational database, but the latter is not therefore of less importance or scientific quality.

Does MTX use save lives?

What remains is the question of whether MTX use is truly associated with less mortality in RA. Almost all independently obtained information from observational studies points in the same direction. Wasko and colleagues' work provides convincing evidence that this protective association remains intact after adjustment for a variety of possibly interfering or confounding factors. Given the prominent and established place of MTX treatment in RA, and the feasibility issues surrounding RCTs, a formal RCT to investigate the effects of MTX on mortality will not, and should not, be performed in the future. It would therefore be better to trust that MTX indeed saves lives, and to treat our RA patients accordingly, in recognition that we now have additional evidence in support of prescribing MTX as long as possible for our patients with RA.


Dr. Landewé drafted the article, revised it critically for important intellectual content, and approved the final version to be published.