Since 1989, when the seminal editorial by Wilske and Healey on reversing the pyramid (the step-down bridge concept) was published, the concept of early and aggressive treatment of rheumatoid arthritis (RA) has become the leading paradigm (1). The theory underlying this approach is based on a better understanding of the devastating long-term consequences of RA in most patients, a more realistic and positive assessment of the potential toxicities associated with antirheumatic treatment, and the logical tenet that irreversible damage must be prevented before it occurs. The paradigm has enjoyed increasing backing through evidence from controlled trials and longitudinal studies. An associated concept is that of the window of opportunity, which suggests that, like a primary cancer, early arthritis is less entrenched, has a smaller load of “disease cells,” and is more responsive to treatment. Aggressive treatment during this phase is therefore more likely to succeed than is the same treatment applied later in the course of disease.
In order for these concepts to be sustainable, they must be proved. However, the gold standard for proof (i.e., the randomized, controlled trial) is difficult to apply. The true outcome in RA (e.g., severe disability, joint replacement) may not be revealed for a long time. In fact, one of the reasons for the move away from the traditional pyramid approach to treatment was the appearance in the 1980s of longitudinal studies that documented the dismal outcome of most patients with RA who were followed up for more than 10 years. A measurable proxy outcome, radiographic damage, has only modest correlation with true outcome. Therefore (and assuming that the most feasible way to alter the prognosis of RA is through effects on disease activity), to prove the existence of any window of opportunity, an intervention must be started early, must have sufficient contrast with conventional therapy, and must be maintained for a period of time that is unknown but probably is lengthy.
The best alternative to a controlled trial is the cohort study. Cohort studies can be continued for any period of time, but their less-controlled (or noncontrolled) nature makes them more sensitive to bias, especially confounding by indication, as discussed below. The best and most feasible approach to proving the window of opportunity concept is followup of cohorts that started out as treatment groups in a controlled trial. The initial advantages are obvious: the similarity in prognosis at baseline is generally guaranteed by the randomization process, cointerventions are usually avoided or handled by the protocol, and contamination (crossover of treatment strategy) is prevented. Importantly, controlled trial protocols usually are more explicit in terms of data quality control than are other, less-controlled studies. However, once the trial is concluded, the followup study is no different from any other cohort study: treatment becomes individualized according to the preference of the physician and patient, and any contrast between the groups is likely to decrease. For instance, when physicians are aware of the importance of prognostic indicators, they may actively target the most aggressive treatment toward patients who are most in need of it. This creates a spurious relationship between poor outcome and aggressive treatment or, if results of the trial show that the aggressively treated group fared best, diminishes the apparent benefit of that treatment. This is called confounding by indication (2).
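The mechanism of confounding by indication can be illustrated with a small worked example. The sketch below is purely hypothetical: the prognosis score, the size of the treatment effect, and the allocation rules are assumptions chosen for illustration and are not taken from any of the trials discussed here. Aggressive treatment is assumed to truly reduce damage by 2 units. When allocation is independent of prognosis, as in a randomized trial, the observed difference recovers that benefit. When aggressive treatment is directed to the patients with the worst prognosis, as may happen during followup, the same treatment appears to be associated with more damage.

```python
from statistics import mean

# Hypothetical, illustrative numbers only (not derived from any trial).
TRUE_EFFECT = -2.0          # assumed: aggressive therapy lowers damage by 2 units
DAMAGE_PER_SEVERITY = 3.0   # assumed: each prognosis point adds 3 units of damage


def damage(severity, aggressive):
    """Damage score for one patient under the assumed linear model."""
    return DAMAGE_PER_SEVERITY * severity + (TRUE_EFFECT if aggressive else 0.0)


def observed_difference(assign):
    """Mean damage, aggressive minus conventional, under an allocation rule."""
    treated, untreated = [], []
    for severity in range(10):       # hypothetical prognosis score, 0 (best) to 9 (worst)
        for patient in range(10):    # 10 patients at each prognosis level
            aggressive = assign(severity, patient)
            group = treated if aggressive else untreated
            group.append(damage(severity, aggressive))
    return mean(treated) - mean(untreated)


# Trial phase: allocation is unrelated to prognosis (balanced within each level).
balanced = observed_difference(lambda severity, patient: patient % 2 == 0)

# Followup phase: aggressive therapy is given to the patients with the worst prognosis.
by_indication = observed_difference(lambda severity, patient: severity >= 5)

print("balanced allocation:", balanced)            # -2.0, the assumed true effect
print("allocation by indication:", by_indication)  # 13.0, a spurious harmful association
```

In this toy population the treatment effect is identical in both scenarios; only the allocation rule differs. The reversal in sign comes entirely from the difference in baseline prognosis between the groups.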
The analysis of longitudinal data is also a challenge. In trials lasting ≤1 year, we are satisfied with comparisons between change scores over the year. As the observation period becomes longer, the change from baseline becomes less relevant than the achievable mean disease activity over a certain time period and the changes between time periods. Repeated measurements of both disease activity and outcome markers such as radiographic damage call for statistical techniques that correct for correlation between these measurements within a patient (e.g., a technique called generalized estimating equations [GEE]).
In this issue of Arthritis and Rheumatism, Verstappen et al describe the results of a 5-year followup study that began as a randomized trial in the region of Utrecht, The Netherlands (3). The original trial compared 4 treatment strategies in almost 240 patients with early RA. After 12 months, the investigators concluded that early initiation of antirheumatic therapy with hydroxychloroquine, intramuscular gold, or methotrexate (MTX) led to better clinical results than did use of the pyramid strategy (i.e., beginning treatment with nonsteroidal antiinflammatory drugs [NSAIDs] only and administering disease-modifying antirheumatic drugs [DMARDs] only when unavoidable). Radiographic progression was limited and similar between treatment groups. Four years after the end of the trial, followup of 80% of the patients revealed that the clinical benefit in favor of patients treated with DMARDs had disappeared, and that radiographic damage in the groups was equal. At first sight, these results appear to vindicate the pyramid strategy of watchful waiting. However, the authors conclude that more prolonged and aggressive treatment is necessary to maintain improvement.
The results described by Verstappen et al differ markedly from the followup results of other recent trials in early RA. The Combinatietherapie Bij Reumatoide Arthritis (COBRA) trial compared sulfasalazine (SSZ) monotherapy with combination therapy including step-down high-dose prednisolone, SSZ, and MTX (4). At the end of that trial, a large benefit in terms of radiographic damage was observed in the COBRA group, but the clinical difference between groups mostly disappeared after prednisolone was stopped. After a mean followup of almost 5 years, the benefits of COBRA therapy in terms of radiographic damage had increased despite similar levels of disease activity in both treatment groups (5). Other trials with shorter followup periods also showed results that are more supportive of the window of opportunity concept. The Utrecht group itself performed another trial that almost replicated the first trial but included only 3 strategies: hydroxychloroquine, gold/D-penicillamine, and MTX/SSZ (6). In that trial, a dose-related response in terms of both disease activity and radiographic damage was observed during the 2 followup years. The recently presented followup results of the Enbrel ERA (early rheumatoid arthritis) trial (in which etanercept was compared with high-dose MTX) (7, 8) and those of the Fin-RACo (FINnish Rheumatoid Arthritis Combination therapy) trial (comparing step-down combination-DMARD therapy and single-DMARD therapy) (9, 10) showed advantages of aggressive therapy in terms of progression of radiographic damage at 2 years. Finally, results of a recent cohort study also provided support for use of aggressive initial therapy (11).
How are we to interpret this discrepancy? I think the explanation by Verstappen et al, that in their study treatment was not aggressive enough and was not maintained for a long enough period of time, goes a long way toward accounting for it. In addition, the initial prognosis of the patients enrolled in the study may play a role. Finally, the chosen contrast (i.e., between the pyramid approach and early use of DMARDs) may, in retrospect, not have been ideal. I will discuss each of these explanations in more detail, with special attention to the COBRA experience, with which I am most familiar.
The first possible explanation involves the treatment schedule. The pyramid group was assigned to receive NSAIDs only, with DMARDs administered only when doing so was unavoidable. During the trial, 16 (28%) of 57 patients in the pyramid group started DMARD therapy. Also, use of systemic glucocorticoids was numerically higher in the pyramid group than in the early DMARD group (11% versus 9%), and significantly more patients in the pyramid group than in the early DMARD group received intraarticular injections (41% versus 19%). Patients in the other groups received standard dosages of gold and hydroxychloroquine and low to intermediate dosages of MTX (maximum 15 mg/week). Although these dosages were considered appropriate when the trial was initiated, none can be regarded as aggressive by current standards. Indeed, hydroxychloroquine is now regarded as mild antirheumatic therapy.
During the followup period, the prevalence of glucocorticoid use in the pyramid group remained higher (albeit not significantly) than that in the early DMARD group (30% versus 19%). Treatment switches to other DMARDs more frequently involved a change to more aggressive rather than milder alternatives, especially in the pyramid group. Of note, ∼75% of patients in the pyramid group were receiving DMARD treatment in the year after the trial ended. All of these factors strongly suggest that the initial contrast between the treatments was limited, decreased during the trial period, and was subsequently lost during followup. Because changes in therapy were clinically indicated (e.g., unacceptable level of disease activity, toxicity), confounding by indication is likely. In other words, there was already too little separation between the groups in the first year (as evidenced by a lack of difference in radiographic progression), with almost no separation during followup.
In contrast, all patients in the COBRA trial received active treatment from the start. However, the addition of MTX and especially step-down glucocorticoids to SSZ created a strong contrast to SSZ monotherapy. Such additions resulted in instantaneous and strong clinical responses in the COBRA group, which contrasted with slower and less marked responses in the monotherapy group. Although most of the extra benefit of combination therapy in terms of disease activity observed at 6 months was lost at the end of the 1-year trial period (prednisolone and MTX were tapered and stopped after 28 and 40 weeks, respectively), in both treatment groups the changes in most disease activity parameters were greater and the end result at 1 year was better than what was observed in the study by Verstappen et al. Thus, clinical separation was much greater during the COBRA trial, resulting in a large difference in disease progression by the end of the first year.
In all trials that documented such differences, it is evident that following conclusion of the trial period, treatment groups converged in terms of disease activity and physical disability, although the absolute levels in various trials differed. This suggests that the rheumatologist and the patient agree to use a treatment regimen aimed at achieving and maintaining a certain low level of disease activity and disability. The fact that this level was higher in the Utrecht followup than in the COBRA study (the only directly comparable measure was the Health Assessment Questionnaire score, which was a mean of 1.2 in the Utrecht study and 0.7 in the COBRA study) probably reflects a change in opinion over the years regarding what constitutes low disease activity.
Another possible explanation for the discrepancy between studies is the initial prognosis. At the group level, the most important factors predictive of damage are initial disease activity and physical disability, and the presence at presentation of radiographic damage, rheumatoid factor (RF), and the HLA–DR4 shared epitope. Nevertheless, the appearance and rapidity of progression are highly individualized, with some patients experiencing rapid progression despite a good prognosis, and some showing little progression despite the presence of several adverse prognostic factors. Together with variability between raters, this creates “noise” that may make it difficult to detect a true difference in progression between treatment groups. Therefore, for optimal detection of differences in progression, it is best to perform the trial in patients who are likely to progress quickly. In the Utrecht trial, slightly more than half of the patients were RF positive (55% in the pyramid group, 66% in the DMARD group), disease activity was moderate, and the median score for radiographic damage (Sharp score) was 2. In contrast, in the COBRA trial 75% of patients were RF positive, disease activity was higher for all measures, and the initial median Sharp score was 4. Patients in the COBRA trial, as a group, were therefore more likely to show rapid progression, creating more “room” in which to demonstrate slowing of progression during aggressive treatment. In the Utrecht trial, there was some difference in progression between the groups after 1 year, but the difference was too small to be significant.
If we look at the ERA and the Fin-RACo trials for evidence of a window of opportunity, it appears that treatment contrast and initial prognosis are somewhat interchangeable. Although the ERA patients resembled the COBRA patients in terms of having an unfavorable prognosis, the contrast between treatments was smaller (etanercept versus high-dose MTX), resulting in less initial separation in terms of progression of damage. The Finnish patients resembled the Utrecht patients, in that they had a more moderate prognosis compared with patients in the COBRA study. However, the contrast between treatments was greater, resulting in better separation in terms of disease progression.
The final factor that is helpful in interpreting the Utrecht data concerns the analysis strategy. Although, as explained above, longitudinal studies benefit from more sophisticated statistical analysis techniques such as GEE, it is unlikely that their use would have yielded different conclusions. However, use of such techniques would have allowed for better control for longitudinal explanatory variables such as disease activity. In the COBRA followup study, we observed that disease activity during a 6-month period was strongly predictive of radiographic progression at the end of that period (12); this finding was confirmed in another cohort study (13). The main problem associated with the Utrecht followup study is that, as in the original trial, the authors chose to contrast the pyramid approach with early DMARD treatment. In retrospect, an alternative split could have been made, with pyramid therapy and hydroxychloroquine on one side, and MTX and gold on the other. This might have created a better contrast, as the results from their subsequent 3-group trial confirm. The authors only briefly touch on this possibility, when they tell us that median area under the curve for the erythrocyte sedimentation rate and morning stiffness was higher for patients starting pyramid or hydroxychloroquine therapy than for those starting MTX or gold.
In summary, the Utrecht trial and its followup study are a remarkable achievement in terms of data quality and completeness of followup. The lack of contrast between the pyramid and early DMARD strategies does not disprove the concept of a window of opportunity but can be explained by the treatments chosen, the initial prognosis of the patients, and the analysis strategy. Evidence for the benefits of early therapy that is more aggressive than that used in this trial remains strong.