Continuous indices of core data set measures in rheumatoid arthritis clinical trials: Lower responses to placebo than seen with categorical responses with the American College of Rheumatology 20% criteria
To describe indices that are continuous counterparts of categorical responses to the American College of Rheumatology 20% improvement criteria (ACR20), ACR50, and ACR70, which extend rheumatoid arthritis (RA) clinical trial results and recognize clinical worsening (as well as improvement) with active and placebo treatments.
Data from a clinical trial of leflunomide, methotrexate, and placebo treatment over 1 year were reanalyzed. Percent change was computed for each of the 7 components of the ACR core set of outcome measures. Four continuous indices were computed: 1) ACR-N (lowest of 3 values: number of swollen joints, number of tender joints, and median of the other 5 measures); 2) composite (median of all 7 measures [3 patient and 3 assessor measures plus erythrocyte sedimentation rate]); 3) patient-only (median of physical function, pain, and global status); and 4) assessor-only (median of number of swollen joints, number of tender joints, and global status). Means, medians, categorical 20%, 50%, and 70% responses, and continuous probability plots were computed according to each index for the 3 treatment groups and were compared with one another and with standard ACR20, ACR50, and ACR70 responses.
Mean levels of improvement on the continuous indices, in patients taking leflunomide, placebo, and methotrexate, respectively, were as follows: ACR-N 20%, −12%, and 13%; composite 43%, 9%, and 33%; patient-only 36%, 0%, and 26%; and assessor-only 50%, 20%, and 44%. The corresponding percentages of patients with ACR20 responses were 52%, 26%, and 46%. Differences between leflunomide and placebo on the continuous indices were 30–36%, and differences between methotrexate and placebo were 24–26%.
Continuous indices may be an informative addition to categorical ACR 20%, 50%, or 70% responses to compare efficacies of various treatments in RA, and to describe lower responses to placebo by recognizing worsening as well as improvement.
The American College of Rheumatology (ACR) core data set of outcome measures (hereinafter referred to as “core data set”) has been an important advance in rheumatoid arthritis (RA) clinical trials (1, 2). Improvement of at least 20% in both tender and swollen joint counts, as well as in at least 3 of the 5 additional measures (i.e., ACR20), is designated as the ACR preliminary definition of improvement (3). Higher thresholds for improvement, such as ACR50 and ACR70, have also been described (4). The ACR measures and improvement criteria are now used in most clinical trials in the US and many in Europe.
The assessment of new therapies has been substantially improved through the ACR20, ACR50, and ACR70. The convenience of reporting an underlying continuous index in several categories has proven satisfactory for recognizing significant differences in efficacy of active versus placebo treatments in recent clinical trials. Nonetheless, placebo treatment results in apparent improvement, recorded as an ACR20 response, in 15–30% of patients in most RA clinical trials (4, 5). Therefore, continuous indices that include possible deterioration may be an informative addition to the basis for comparing active versus placebo treatment with categorical ACR20, ACR50, and ACR70 scales. One continuous index, termed the ACR-N (6), which reports the lowest of 3 values (percent change in the number of swollen joints, percent change in the number of tender joints, and median of the percent change in the other 5 core data set measures), has been used to advantage in several clinical trials (7).
In this study, we reanalyzed the results of a 1-year clinical trial of leflunomide, methotrexate, and placebo treatment to compute the ACR-N and 3 other continuous indices of 3 or 7 of the core data set measures, termed the “composite,” “patient-only,” and “assessor-only” indices. We evaluated results obtained using these indices relative to one another, and clarified how they are continuous counterparts to ACR20, ACR50, or ACR70 responses for recognition of differences between active and placebo treatments.
PATIENTS AND METHODS
Data from a randomized controlled clinical trial to compare leflunomide, methotrexate, and placebo over 12 months (the US301 trial) (8) were reanalyzed. Information on all ACR core data set measures (swollen joints, tender joints, assessor estimate of global status, physical function, pain, patient estimate of global status, and erythrocyte sedimentation rate [ESR]) was available for nearly all patients. Physical function was assessed according to the modified Health Assessment Questionnaire (9). For inclusion in the analyses, patients were required to have sufficient post-baseline data to determine ACR20, ACR50, or ACR70 response; this specification led to exclusion of 4 of 480 randomized patients.
The percent change was computed for each of the 7 core data set measures. Measures scored as 0 at baseline were managed as missing values when they did not change from 0, and as 0% change when they were increased (i.e., worsened). Percent changes more negative than −100% were recoded to −100% so as to avoid extreme outliers of worsening relative to a low baseline. Such recoding was performed for <5% of patients, other than 10% for the ESR; the recoding generally involved clinically unimportant changes, e.g., a “worsening” of a pain score from 0.2 to 0.6 on a scale of 0–10, indicating 200% worsening, was recoded to −100%. No recoding was needed if a baseline value was 50% of the maximum or more. Otherwise, missing data during the 12-month followup were managed with the last observation carried forward method, and percent change in C-reactive protein level replaced missing ESR values.
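The recoding rules above can be sketched in a few lines of Python. This is a minimal illustration, not the trial's analysis code: the function name `percent_improvement` is hypothetical, and the sketch assumes that a lower raw score means better status, so that improvement is reported as a positive percentage.

```python
from typing import Optional


def percent_improvement(baseline: float, followup: float) -> Optional[float]:
    """Percent change from baseline, signed so that improvement is positive.

    Assumption for this sketch: lower raw scores indicate better status.
    """
    if baseline == 0:
        # Baseline of 0: treated as missing if unchanged,
        # and as 0% change if the score increased (worsened).
        return None if followup == 0 else 0.0
    change = (baseline - followup) / baseline * 100.0
    # Worsening beyond -100% is recoded to -100% to avoid extreme
    # outliers relative to a low baseline.
    return max(change, -100.0)
```

With these rules, the pain score example in the text (0.2 at baseline, 0.6 at followup) gives a raw change of −200%, which is recoded to −100%.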
Four continuous indices were computed: 1) the ACR-N (lowest of 3 values, i.e., percent change in swollen joints, percent change in tender joints, and median of the percent change in the other 5 measures in the core data set [or the second largest percent change of 2, 3, or 4 measures when 1 or more was missing], which essentially assigns greater priority to swollen and tender joints, as in the ACR20); 2) composite (median percent change in all 7 ACR core data set measures if all were available, or fourth largest among 4–6, for which missing values are regarded conservatively as not better than any observed measure); 3) patient-only (median [or second largest if 1 measure was missing] percent change among the 3 patient-derived measures, i.e., physical function, pain, and global status); and 4) assessor-only (median [or second largest] percent change among the 3 assessor-derived measures, i.e., number of swollen joints, number of tender joints, and assessor estimate of global status).
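The four definitions can be illustrated with a short Python sketch. The function names are hypothetical, missing values are represented as `None`, and the fallback ranks are taken as stated in the text. Returning `None` when too few measures are observed, or when a joint count is missing for the ACR-N, is an assumption of this sketch rather than a documented convention of the study.

```python
from typing import Optional, Sequence

Values = Sequence[Optional[float]]


def kth_largest(values: Values, k: int) -> Optional[float]:
    """Return the k-th largest observed value, or None if fewer than k exist."""
    observed = sorted((v for v in values if v is not None), reverse=True)
    return observed[k - 1] if len(observed) >= k else None


def composite(all7: Values) -> Optional[float]:
    # Median of 7 is the 4th largest; with 4-6 observed, still the 4th largest.
    return kth_largest(all7, 4)


def median_of_three(three: Values) -> Optional[float]:
    # Patient-only and assessor-only indices: median of 3 is the 2nd largest;
    # with 1 measure missing, the 2nd largest of the remaining 2.
    return kth_largest(three, 2)


def acr_n(swollen: Optional[float], tender: Optional[float],
          other5: Values) -> Optional[float]:
    n_observed = sum(v is not None for v in other5)
    # Median of the 5 other measures, or the 2nd largest of 2-4 observed.
    other = kth_largest(other5, 3 if n_observed == 5 else 2)
    if swollen is None or tender is None or other is None:
        return None  # assumption: no ACR-N without both joint counts
    return min(swollen, tender, other)
```

For example, `acr_n(50, 40, [10, 20, 30, 40, 50])` takes the lowest of 50, 40, and the median 30, so the joint counts act as a ceiling on the result even when other measures improve more.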
The distributions of the 4 continuous measures, with the corresponding means and medians, were described for each of the 3 treatment groups. Pairwise comparisons between the means were made using analysis of variance t-tests. Probability plots are displayed graphically for the 3 treatment groups, highlighting the percentages of patients at the 0%, 20%, 50%, and 70% improvement levels, for clarification of the relationship of the continuous indices to their ACR20, ACR50, and ACR70 counterparts. These probability plots are similar in concept to Kaplan-Meier curves, in that they indicate the cumulative percentage of patients with outcomes at least as good as a designated value on the horizontal axis.
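The points on such a probability plot can be computed as follows. This is a minimal sketch with hypothetical function names, and the index values in the usage note are made up for illustration rather than taken from the trial.

```python
from typing import Dict, Sequence


def percent_at_or_above(index_values: Sequence[float], threshold: float) -> float:
    """Percentage of patients whose index value is at least `threshold`."""
    if not index_values:
        raise ValueError("need at least one patient")
    hits = sum(1 for v in index_values if v >= threshold)
    return 100.0 * hits / len(index_values)


def probability_curve(index_values: Sequence[float],
                      thresholds: Sequence[float] = (0, 20, 50, 70)) -> Dict[float, float]:
    """Cumulative percentage of patients at or above each threshold."""
    return {t: percent_at_or_above(index_values, t) for t in thresholds}
```

For a hypothetical group with index values `[-30, -10, 5, 25, 55, 80, 20, 0]`, the curve passes through 75% at the 0% level, 50% at the 20% level, 25% at the 50% level, and 12.5% at the 70% level. Evaluating the same function over a fine grid of thresholds from −100 to 100 would trace the full curve.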
RESULTS
Mean and median scores, respectively, according to the ACR-N index were 20% and 20% for leflunomide, 13% and 10% for methotrexate, and −12% and −10% for placebo (Table 1). The median values indicate that at least half of the patients had improvement of at least 20% with leflunomide, compared with at least 10% with methotrexate and at least −10% with placebo, in the number of swollen joints, number of tender joints, and 3 of 5 other core data set measures.
Table 1. Mean (median) percent change in 4 continuous indices, versus percent of patients showing improvement by the ACR20, among rheumatoid arthritis patients treated with leflunomide, methotrexate, or placebo*
Index | Leflunomide (n = 178) | Placebo (n = 118) | Methotrexate (n = 180) | Difference, leflunomide vs. placebo, mean ± SEM %† | Difference, methotrexate vs. placebo, mean ± SEM %†
ACR-N | 20 (20) | −12 (−10) | 13 (10) | 31.8 ± 5.6 | 24.5 ± 5.6
ACR core data set composite median (or fourth largest) | 43 (50) | 9 (3) | 33 (33) | 34.1 ± 4.9 | 24.8 ± 4.9
Patient-only median (or second largest) | 36 (43) | 0.4 (2) | 26 (27) | 35.7 ± 5.7 | 25.9 ± 5.7
Assessor-only median (or second largest) | 50 (61) | 20 (11) | 44 (51) | 29.7 ± 4.9 | 23.9 ± 4.9
ACR20, % of patients | 52 | 26 | 46 | 26.0 ± 5.5 | 19.3 ± 5.5
* Patient data were derived from ref. 8. ACR20 = American College of Rheumatology 20% response criteria. See text for descriptions of the continuous indices.
† P < 0.001 for all comparisons of leflunomide versus placebo or methotrexate versus placebo (by analysis of variance for continuous indices; by Fisher's exact test for ACR20).
Respective mean and median scores for the composite of all 7 ACR core data set measures (or fourth highest among 4–6 if 1 or more measures were missing) were 43% and 50% for leflunomide, 33% and 33% for methotrexate, and 9% and 3% for placebo (Table 1). The ACR composite index indicates a higher level of favorable responses than the ACR-N index, since it equitably involves 4 of 7 ACR core data set measures, whereas the ACR-N involves 5 of 7, with a requirement that tender and swollen joint counts show improvement.
Mean and median scores according to a patient-only continuous index of 3 self-report measures (Table 1) were 36% and 43% in patients treated with leflunomide, 26% and 27% in patients treated with methotrexate, and 0.4% and 2% in patients treated with placebo. The median values indicate that at least half of the patients treated with leflunomide had at least 43% improvement in at least 2 of the patient-derived measures, compared with at least 27% improvement in half of those treated with methotrexate and at least 2% improvement in half of those receiving placebo.
Similar trends were seen for the assessor-only index (Table 1), with mean and median scores of 50% and 61%, respectively, in patients treated with leflunomide, 44% and 51% in patients treated with methotrexate, and 20% and 11% in patients treated with placebo. The higher scores in all groups on the assessor-only index reflect the fact that tender and swollen joint counts tended to indicate more improvement than patient self-report measures in patients who received either active or placebo treatment in this clinical trial (10). These scores on the patient and assessor indices were higher than those on the ACR-N index, which is defined more stringently to require specific improvement in swollen and tender joint counts (and 3 of the 5 other ACR core data set measures), as in the ACR20.
Differences in mean continuous index scores between the leflunomide and the placebo groups (Table 1) were in a similar range, from 36% for the patient-only index to 30% for the assessor-only index. Differences in mean continuous index scores between the methotrexate and placebo groups were also similar, with a range of 24–26%. All differences between leflunomide and placebo and between methotrexate and placebo were statistically significant (P < 0.001). The greatest differences were seen with the patient-only index.
Continuous index results were also analyzed for 20%, 50%, and 70% response levels for each index, and compared with ACR20, ACR50, and ACR70 results (Table 2). By construction, results for 20%, 50%, and 70% improvement according to the ACR-N index are essentially identical to those from the ACR20, ACR50, and ACR70, since both require improvement of tender and swollen joint counts and a majority of all other measures (except for nearly negligible variations in conventions for managing missing data in this study relative to the original conventions for the US301 clinical trial). The ACR core data set composite index had higher proportions of patients with 20%, 50%, and 70% responses than the ACR-N or ACR20, since it does not prioritize joint count measures as requirements. The proportions of patients meeting 20%, 50%, and 70% improvement criteria were highest with the assessor-only index, reflecting higher improvement with active and placebo treatments, and were also higher with the patient-only index than with the ACR20, ACR50, and ACR70.
Table 2. Percent of patients in the leflunomide (LEF), placebo, and methotrexate (MTX) groups who improved at the 20%, 50%, and 70% levels according to 4 continuous indices and the American College of Rheumatology (ACR) core data set*
Probability plot displays of the distributions of the respective continuous indices (Figure 1) indicated that negative results were seen with placebo in ∼60% of patients according to the ACR-N (∼40% had a positive response), ∼50% with the composite or patient self-report index, and ∼40% with the assessor-derived index (∼60% had a positive response). These curves highlight the 0%, 20%, 50%, and 70% levels, but allow recognition of any level of response. For example, with the patient-only index, 50% responses were seen in 45% of patients treated with leflunomide, 36% treated with methotrexate, and 13% treated with placebo.
DISCUSSION
The continuous indices presented in this report may be an informative addition to categorical ACR20, ACR50, and ACR70 responses to assess differences from baseline to end point in RA clinical trials, by allowing estimation of improvement (or worsening) at any level over a range of −100% to +100%, rather than at arbitrary 20%, 50%, and 70% levels. The indices document mean differences between active and placebo treatment ranging from 24% to 36%, with sensitivity similar to that of the ACR20. Furthermore, because these indices take into account possible worsening in any single measure, lower median placebo responses are seen, which may present an advantage in RA clinical trials, most of which show 15–30% of placebo-treated patients as having response according to the ACR20 (4, 5). Although responses to placebo may differ significantly from responses to active treatment in groups of patients, a possible inference that 25–30% of individual patients treated with placebo may achieve a satisfactory response may be overstated.
In chronic diseases that can be characterized with one quantitative measure such as blood pressure or serum cholesterol level, results incorporate worsening as well as improvement. Therefore, if placebo treatment results in half the patients showing 25% improvement and half showing 25% worsening, results are reported as “no change.” In RA, no single measure can serve as a gold standard for all individual patients, and indices are used to assess clinical status. An RA clinical trial which indicates that placebo treatment results in half the patients showing 25% improvement in core data set measures (including swollen and tender joint counts) and half showing 25% worsening would be reported as “50% of patients met ACR20 response criteria,” rather than a net mean improvement of 0. Categorical reporting of this kind may overstate the efficacy of placebo for individual patients in computation of group results; in the trial reanalyzed here, 26% of placebo-treated patients had ACR20 responses. In contrast, in the present analyses, the median change in status among placebo-treated patients was −10% with the ACR-N index, 3% with the composite index, 2% with the patient-only index, and 11% with the assessor-only index.
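The hypothetical example of half the patients improving by 25% and half worsening by 25% can be worked through numerically. The sketch below uses made-up data and a hypothetical function name, and it simplifies the ACR20 to a single percent change per patient; the actual criteria also require improvement in both joint counts and in 3 of the 5 other measures.

```python
from statistics import mean
from typing import Sequence, Tuple


def categorical_vs_continuous(changes: Sequence[float],
                              threshold: float = 20.0) -> Tuple[float, float]:
    """Return (percent of patients at or above threshold, mean percent change)."""
    responders = 100.0 * sum(c >= threshold for c in changes) / len(changes)
    return responders, mean(changes)


# Hypothetical placebo group: 50 patients improve 25%, 50 worsen 25%.
placebo_changes = [25.0] * 50 + [-25.0] * 50
responder_pct, mean_change = categorical_vs_continuous(placebo_changes)
```

Here the categorical summary reports 50% of patients as responders at the 20% level, whereas the mean change is 0, which is the contrast the paragraph above describes.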
Improvement at the 20% level (by the ACR20) in 15–30% of placebo-treated patients in RA clinical trials may be explained in part by 3 additional reasons beyond not “penalizing” a therapy when a patient has poorer status at the end of a clinical trial compared with baseline: 1) most clinical interventions result in some placebo effect (11); 2) almost all patients classified as receiving placebo in recent clinical trials actually received some therapies, including nonsteroidal antiinflammatory drugs, glucocorticoids, and/or even methotrexate (although required in the latter case to have incomplete responses to methotrexate), albeit balanced by similar therapies in the active treatment group; and 3) swollen and tender joint counts, for which improvement must occur in order for improvement to be recorded according to the ACR20, ACR50, and ACR70, appear more likely than other measures in the core data set to improve with placebo treatment (10, 12–14). Nonetheless, placebo response levels in the US301 clinical trial reanalyzed in the studies reported herein suggest that a major contributor to the apparent benefit of placebo according to ACR20 is the scoring of patients with poorer status as “<20%” (or <50% or <70%), rather than a negative number to offset positive results.
This report documents that results with the patient-only continuous index are similar to ACR20, ACR50, and ACR70 results, extending our previous report concerning the value of a patient-only index (13). The patient-only measures are easily ascertained in standard clinical care (15) and have been documented to provide the most effective measures to predict most important long-term outcomes of RA, including mortality, costs, and work disability (16). Most rheumatologists do not perform quantitative joint counts in most patients at most visits (17), and therefore would likely have only laboratory data on ESR or C-reactive protein to provide possible quantitative documentation of clinical improvement.
The data presented here also may suggest consideration of whether prioritizing swollen and tender joints may detract from, rather than add to, the capacity to distinguish between active and placebo treatment in an RA clinical trial. Another consideration may be inclusion of calculations for a type of patient-only index in reporting trial results, so that rheumatologists may compare data from clinical trials with easily obtained data from patient questionnaires in actual clinical care. Further analyses of other clinical trials are needed to illuminate these matters. Nonetheless, these studies, as well as our previous report (13), indicate that patient questionnaire data alone are comparable with joint count and laboratory data to distinguish changes in clinical status with active treatment versus placebo treatment in patients with RA.