Drs. Shahouri and Michaud contributed equally to this work.
Remission of rheumatoid arthritis in clinical practice: Application of the American College of Rheumatology/European League Against Rheumatism 2011 remission criteria
Version of Record online: 28 OCT 2011
Copyright © 2011 by the American College of Rheumatology
Arthritis & Rheumatism
Volume 63, Issue 11, pages 3204–3215, November 2011
How to Cite
Shahouri, S. H., Michaud, K., Mikuls, T. R., Caplan, L., Shaver, T. S., Anderson, J. D., Weidensaul, D. N., Busch, R. E., Wang, S. and Wolfe, F. (2011), Remission of rheumatoid arthritis in clinical practice: Application of the American College of Rheumatology/European League Against Rheumatism 2011 remission criteria. Arthritis & Rheumatism, 63: 3204–3215. doi: 10.1002/art.30524
- Issue online: 28 OCT 2011
- Accepted manuscript online: 7 JUL 2011 12:31PM EST
- Manuscript Accepted: 23 JUN 2011
- Manuscript Received: 17 JAN 2011
- VA Health Services Research & Development Service
- Arthritis Foundation New Investigator Award
- NIH (American Recovery & Reinvestment Act grant). Grant Number: 1RC1AR058601-01
- VA Merit grant
- VA Career Development Award. Grant Number: CDA 07-221
To describe use of the American College of Rheumatology/European League Against Rheumatism (ACR/EULAR) rheumatoid arthritis (RA) remission criteria in clinical practice.
Remission was examined using data on 1,341 patients with RA (91% men) from the US Department of Veterans Affairs RA (VARA) registry (total of 9,700 visits) and 1,153 patients with RA (25.8% men) in a community rheumatology practice (Arthritis and Rheumatology Clinics of Kansas [ARCK]) (total of 6,362 visits). Cross-sectional and cumulative probabilities were studied, and agreement between the various remission criteria was assessed. Aspects of reliability of the criteria were determined using Boolean-based definitions, as well as the Clinical Disease Activity Index (CDAI) and Simplified Disease Activity Index (SDAI) scoring methods proposed by the ACR/EULAR joint committee.
When the 3-variable ACR/EULAR definition of remission recommended for use in community practice (swollen and tender joint counts ≤1, and visual analog scale score for patient's global assessment of disease activity ≤1) was applied, cross-sectional remission was 7.5% (95% confidence interval [95% CI] 6.4, 8.7%) for ARCK and 8.9% (95% CI 7.9, 9.9%) for VARA, and cumulative remission (remission at any observation) was 18.0% (for ARCK) and 24.4% (for VARA), over a mean followup of ∼2.2 years. Addition of the erythrocyte sedimentation rate or C-reactive protein level to the criteria set reduced remission to 5.0–6.2%, and use of the CDAI/SDAI increased the proportions to 6.9–10.1%. Moreover, 1.8–4.6% of the patients met remission criteria at ≥2 visits. Agreement between criteria definitions was good, as assessed by kappa statistics and Jaccard coefficients. Among patients in remission, the probability of a remission lasting 2 years was 6.0–14.1%. Among all patients, the probability of a remission lasting 2 years was <3%. Remission status and examination results for each patient varied substantially among physicians, as determined by multilevel analyses.
Cross-sectional remission occurred in 5.0–10.1% of the patients in these cohorts, with cumulative remission being 2–3 times greater; however, long-term remission was rare. Problems with reliability and agreement limit the usefulness of these criteria in the individual patient. However, the criteria can be an effective method for measuring clinical status and treatment effect in groups of patients in the community.
Remission in rheumatoid arthritis (RA) was first described and quantified by Pemberton and Pierce in 1927 (1), followed by Thompson et al in the next decade (2). In 1981, Pinals et al presented the preliminary criteria for remission in RA (3). These criteria, which became the official American College of Rheumatology (ACR) criteria, were difficult to use and did not have a clear basis in scientific measurement. Over the years, a series of different, often ad hoc, criteria were proposed or used in published studies. These additional criteria have been described in detail in a number of important publications (4–6).
In 2011, the ACR and European League Against Rheumatism (EULAR) jointly presented the ACR/EULAR provisional definition of remission in RA for clinical trials (6). In their report, the authors also suggested “that a definition of remission be developed for clinic-based practice that would not require an acute-phase reactant, as long as it would capture remission as stringently as the measure employed for clinical trials,” and furthermore that “… core set measures should be used to define remission and that any definition of remission in clinical trials should look toward and make possible a similar definition in clinical practice” (6).
Remission in clinical practice is an important issue. For groups of patients assessed in observational studies, remission can be a marker of disease severity and treatment response. The ACR/EULAR recommendation for the use of remission criteria in clinical practice suggests, in addition, that the determination of remission be extended to the individual patient. If applied to the individual patient, remission, or the lack of it, could serve as a measure of treatment success that could be used by the patient, as well as third-party payers and regulatory authorities, to characterize the quality of health care, dictate access to care, or govern the use of specific therapies.
A recent large cross-sectional study of remission by Sokka et al included 5,848 patients with RA from 67 sites in 24 countries (4). The authors evaluated 8 different criteria for defining remission. Three of these criteria are of particular interest in the current study: the Clinical Disease Activity Index (CDAI) (7), the Disease Activity Score in 28 joints (DAS28) (8), and the Routine Assessment of Patient Index Data 3 (RAPID-3). Of these, only remission assessed by the CDAI is recognized in the newly revised ACR/EULAR criteria. In the previous study by Sokka et al (4), the proportion of patients classified as in remission by the CDAI criteria varied strikingly among the different countries, ranging from 0% to 35.3%, with a value of 18.5% in the US.
In the current study, we obtained data on all patients and all clinic visits from sites with multiple physicians, including a private practice rheumatology specialty group and 9 US Department of Veterans Affairs outpatient rheumatology clinics that included 38 physicians. We used multilevel, repeated-measures methods to examine the probability of remission at a given clinic visit, the cumulative probability of remission, the probability of a second remission, and the duration of remission among patients with RA in the 2 cohorts after each of the ACR/EULAR criteria definitions had been applied. In addition, we evaluated the degree of physician bias, the effect of the patient's global assessment of disease activity, and the agreement between the various criteria definitions.
PATIENTS AND METHODS
Patients and variables.
From October 5, 2006 to November 16, 2010, 1,435 patients with RA (total of 15,152 clinic visits) were seen at the Arthritis and Rheumatology Clinics of Kansas (ARCK), a 5-physician rheumatology specialty clinic (9). All patients underwent assessments that included the swollen joint count (SJC) (28 joints assessed), the tender joint count (TJC) (28 joints assessed), visual analog scale (VAS) scores for the physician's global (PhGlobal) and patient's global (PtGlobal) assessments of disease activity, and the disability index (DI) of the Health Assessment Questionnaire II (HAQ-II) (10). The erythrocyte sedimentation rate (ESR) was not systematically obtained at all visits, for clinical and insurance reasons. In this study, we evaluated the 1,153 patients (total of 6,362 visits) for whom complete data on the SJC, TJC, PtGlobal VAS score, and ESR were available. All patients were seen as part of routine medical care.
We also evaluated patients with RA who were part of the US Department of Veterans Affairs RA (VARA) registry, which contains data from a consortium of 9 sites (Dallas, Washington, DC, Omaha, Salt Lake City, Denver, Jackson, Portland, Brooklyn, and Iowa City) and 38 physicians, during the period December 11, 2002 through September 24, 2010 (11, 12). Data collected included the SJC, TJC, PtGlobal VAS score, PhGlobal VAS score, ESR, C-reactive protein (CRP) level, and the Multidimensional HAQ (MDHAQ) score (13). Among the 1,510 patients in the registry during this time (total of 11,915 visits), we studied 1,341 patients (total of 9,700 visits) by restricting our analyses to those with complete data on the SJC, TJC, PtGlobal VAS score, and ESR. All patients were seen as part of routine medical care.
Using the above-mentioned variables, we calculated indices of RA disease activity, including the DAS28 (8), the Patient Activity Scale II (PAS-II) (14), the RAPID-3 (15), the Simplified Disease Activity Index (SDAI) (7), and the CDAI (7). The PAS-II and RAPID-3 are essentially the same scale, except that the PAS-II uses the HAQ-II DI and RAPID-3 uses the MDHAQ. The results of both scales are equivalent. The scales for these measures are calculated as follows: (PtGlobal VAS score + VAS score for patient's assessment of pain + [3 × HAQ-II DI or MDHAQ score])/3. The SDAI is the sum of the TJC (on a 0–28 scale), SJC (on a 0–28 scale), PtGlobal VAS score (on a 0–10 scale), PhGlobal VAS score (on a 0–10 scale), and CRP level (in mg/dl). The CDAI is the sum of the TJC (on a 0–28 scale), SJC (on a 0–28 scale), PtGlobal VAS score (on a 0–10 scale), and PhGlobal VAS score (on a 0–10 scale).
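As a minimal sketch of the arithmetic just described, the functions below compute the CDAI, SDAI, and PAS-II/RAPID-3 scores. The function and argument names are hypothetical. The CRP must be supplied in mg/dl, as in the text (a value reported in mg/liter would first be divided by 10), and the PAS-II formula is reproduced exactly as written above.

```python
def cdai(tjc28: int, sjc28: int, pt_global: float, ph_global: float) -> float:
    """Clinical Disease Activity Index: TJC28 + SJC28 + PtGlobal + PhGlobal (VAS 0-10)."""
    return tjc28 + sjc28 + pt_global + ph_global


def sdai(tjc28: int, sjc28: int, pt_global: float, ph_global: float,
         crp_mg_dl: float) -> float:
    """Simplified Disease Activity Index: the CDAI components plus CRP in mg/dl."""
    return cdai(tjc28, sjc28, pt_global, ph_global) + crp_mg_dl


def pas2(pt_global: float, pain: float, haq: float) -> float:
    """PAS-II / RAPID-3 as written in the text: (PtGlobal + pain VAS + 3 x HAQ) / 3.

    `haq` is the HAQ-II DI (PAS-II) or the MDHAQ score (RAPID-3), on a 0-3 scale.
    """
    return (pt_global + pain + 3 * haq) / 3
```

For example, a visit with a TJC of 1, an SJC of 0, a PtGlobal VAS score of 1.0, a PhGlobal VAS score of 0.5, and a CRP level of 0.3 mg/dl gives a CDAI of 2.5 and an SDAI of 2.8.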
From these scales, we created the following RA remission criteria, as also defined in the recent ACR/EULAR joint committee report (6):
- the 3-variable ACR/EULAR remission criteria (AE-3): SJC ≤1, TJC ≤1, and PtGlobal VAS score ≤1;
- the 3-variable ACR/EULAR remission criteria plus ESR (AE-3 plus ESR): SJC ≤1, TJC ≤1, PtGlobal VAS score ≤1, and ESR <20 mm/hour (in men) or <30 mm/hour (in women);
- the 3-variable ACR/EULAR remission criteria plus CRP (AE-3 plus CRP): SJC ≤1, TJC ≤1, PtGlobal VAS score ≤1, and CRP level ≤1 mg/dl;
- the 4-variable ACR/EULAR remission criteria (AE-4): SJC ≤1, TJC ≤1, PtGlobal VAS score ≤1, and PhGlobal VAS score ≤1;
- the ACR/EULAR definition of remission based on the CDAI (AE-CDAI): CDAI ≤2.8; and
- the ACR/EULAR definition of remission based on the SDAI (AE-SDAI): SDAI ≤3.3.

We also evaluated criteria that were not included in the ACR/EULAR recommendations. These criteria classified remission based on a DAS28 score of <2.6 (4) or a PAS-II or RAPID-3 score ≤1 (4).
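The ACR/EULAR definitions above can be expressed as simple predicates on a single clinic visit. The following is a minimal sketch, assuming a hypothetical `Visit` record whose field names are illustrative rather than the registries' variable names. Because the study analyzed only visits with complete data, a missing laboratory value raises an error here rather than being counted as "not in remission".

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Visit:
    """One clinic visit. Field names are hypothetical, not registry variable names."""
    sjc28: int                    # swollen joint count, 0-28
    tjc28: int                    # tender joint count, 0-28
    pt_global: float              # patient's global VAS, 0-10
    ph_global: float              # physician's global VAS, 0-10
    male: bool
    esr: Optional[float] = None   # mm/hour
    crp: Optional[float] = None   # mg/dl


def _require(value: Optional[float], name: str) -> float:
    # Missing laboratory data is an error, mirroring the complete-data restriction.
    if value is None:
        raise ValueError(f"{name} is required for this criteria set")
    return value


def cdai(v: Visit) -> float:
    return v.tjc28 + v.sjc28 + v.pt_global + v.ph_global


def ae3(v: Visit) -> bool:
    return v.sjc28 <= 1 and v.tjc28 <= 1 and v.pt_global <= 1


def ae3_plus_esr(v: Visit) -> bool:
    esr = _require(v.esr, "ESR")
    cutoff = 20 if v.male else 30  # sex-specific cutoffs; note the strict "<"
    return ae3(v) and esr < cutoff


def ae3_plus_crp(v: Visit) -> bool:
    return ae3(v) and _require(v.crp, "CRP") <= 1


def ae4(v: Visit) -> bool:
    return ae3(v) and v.ph_global <= 1


def ae_cdai(v: Visit) -> bool:
    return cdai(v) <= 2.8


def ae_sdai(v: Visit) -> bool:
    return cdai(v) + _require(v.crp, "CRP") <= 3.3


def classify(v: Visit) -> Dict[str, bool]:
    """Apply every criteria set whose inputs are available for this visit."""
    result = {"AE-3": ae3(v), "AE-4": ae4(v), "AE-CDAI": ae_cdai(v)}
    if v.esr is not None:
        result["AE-3 plus ESR"] = ae3_plus_esr(v)
    if v.crp is not None:
        result["AE-3 plus CRP"] = ae3_plus_crp(v)
        result["AE-SDAI"] = ae_sdai(v)
    return result
```

For a hypothetical male patient with SJC 1, TJC 0, PtGlobal 1.0, PhGlobal 2.0, ESR 18 mm/hour, and CRP 0.4 mg/dl, the AE-3, AE-3 plus ESR, and AE-3 plus CRP definitions are met, but the AE-4, AE-CDAI, and AE-SDAI definitions are not, because the PhGlobal VAS score enters only the latter 3. This is one way in which criteria with similar overall remission rates can still identify different patients.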
Furthermore, for comparison with the ACR/EULAR remission criteria, we evaluated the Minimal Disease Activity (MDA) criteria for RA (16). MDA was considered present if the patient satisfied at least 5 of the following 7 conditions: VAS pain score ≤2 (on a scale of 0–10), SJC ≤1 (on a scale of 0–28), TJC ≤1 (on a scale of 0–28), HAQ score ≤0.5 (on a scale of 0–3), PtGlobal VAS score ≤2 (on a scale of 0–10), PhGlobal VAS score ≤1.5 (on a scale of 0–10), and ESR ≤20 mm/hour. Alternatively, MDA was considered present if the patient had no swollen joints, no tender joints, and an ESR ≤10 mm/hour.
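A minimal sketch of the MDA rule follows, under our reading of the text: MDA is present if at least 5 of the 7 core conditions hold, or, alternatively, if there are no swollen joints, no tender joints, and an ESR ≤10 mm/hour. The function name and argument names are hypothetical.

```python
def meets_mda(pain: float, sjc28: int, tjc28: int, haq: float,
              pt_global: float, ph_global: float, esr: float) -> bool:
    """Minimal Disease Activity: >=5 of 7 core conditions, or the alternative route."""
    conditions = [
        pain <= 2,         # VAS pain, 0-10
        sjc28 <= 1,        # swollen joints, 0-28
        tjc28 <= 1,        # tender joints, 0-28
        haq <= 0.5,        # HAQ, 0-3
        pt_global <= 2,    # patient's global VAS, 0-10
        ph_global <= 1.5,  # physician's global VAS, 0-10
        esr <= 20,         # mm/hour
    ]
    if sum(conditions) >= 5:
        return True
    # Alternative route: no swollen joints, no tender joints, and ESR <= 10 mm/hour.
    return sjc28 == 0 and tjc28 == 0 and esr <= 10
```

Because MDA needs only 5 of 7 conditions, a patient with a PtGlobal VAS score of 2 can satisfy MDA while failing every ACR/EULAR Boolean definition, which is consistent with the much higher MDA proportions reported in the Results.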
The ARCK and VARA cohorts were compared using 1 randomly selected observation per patient. We tested for between-group differences in individual variables by t-tests and chi-square tests. The joint counts (SJC and TJC), and separately the other disease activity variables (PtGlobal VAS score, PhGlobal VAS score, HAQ-II DI score, and PAS-II score), were each tested jointly with multivariate tests of means (using the Stata MVTEST procedure).
To determine the probability of remission, we used all observations from each patient who had complete data for the AE-3 plus ESR criteria (the study entry requirement). Clinical characteristics of these patients are shown in Table 1. We determined the probability of remission at any given observation using the Stata population-averaged XTREG procedure, together with the Margins procedure. Separate evaluations were performed with the data stratified by cohort (ARCK versus VARA) and by sex (male versus female) (Table 2). We also determined cumulative probabilities of remission (remission at any observation time during followup), probabilities of a second remission, and probabilities of remaining in remission for 3, 12, and 24 months. To determine the marginal probability of 1 or more remissions, we used the Stata random-effects XTREG procedure and determined the intraclass correlations and marginal probabilities (17). Additional probabilities were calculated for remission classifications based on non-ACR/EULAR remission criteria (Table 2). We determined the durability of remission in the study cohorts using Kaplan-Meier life-table analyses (18) (see Figure 1).
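The durability analyses rely on the Kaplan-Meier product-limit estimator, which the study computed in Stata. As a minimal, hedged illustration of the estimator itself (not the study's code), the Python sketch below treats loss of remission as the event and treats patients whose followup ended while still in remission as censored. The durations and flags in the example are hypothetical; in clinic data, durations are necessarily measured between visits.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(durations: Sequence[float],
                 lost: Sequence[bool]) -> List[Tuple[float, float]]:
    """Product-limit estimate of the probability of remaining in remission.

    durations: months from the start of remission to loss of remission or censoring.
    lost: True if remission was lost at that time, False if censored
          (for example, followup ended while the patient was still in remission).
    Returns (time, S(time)) at each distinct time at which a remission was lost.
    """
    if len(durations) != len(lost):
        raise ValueError("durations and lost must have the same length")
    data = sorted(zip(durations, lost))
    at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = removed = 0
        # Group tied times; patients censored at t are still counted at risk at t.
        while i < len(data) and data[i][0] == t:
            events += int(data[i][1])
            removed += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def survival_at(curve: List[Tuple[float, float]], t: float) -> float:
    """Step-function lookup: S(t) is the last estimate at or before t."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s
```

With 8 hypothetical remissions lasting 3, 3, 6, 12, 12, 18, 24, and 30 months, of which those at 3, 6, 12, and 18 months ended in loss of remission (one of each tied pair), the estimated probability of remaining in remission is about 0.58 at 12 months and about 0.39 at 24 months.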
Table 1. Clinical characteristics of the patients with complete data on the AE-3 plus ESR criteria

| Variable | ARCK cohort | VARA cohort |
|---|---|---|
| No. of patients | 1,153 | 1,341 |
| Age, years | 59.3 ± 13.8 | 65.1 ± 11.3 |
| Sex, % male | 25.8 | 90.9 |
| Disease duration, years | 10.0 ± 9.8 | 13.2 ± 11.5 |
| Ever rheumatoid factor positive, % | 79.0 | 84.4 |
| HAQ-II DI/MDHAQ score (scale 0–3) | 1.1 ± 0.7 | 1.0 ± 0.6 |
| VAS pain score (scale 0–10) | 4.8 ± 2.8 | 4.3 ± 2.9 |
| Patient global VAS score (scale 0–10) | 4.4 ± 2.7 | 4.0 ± 2.5 |
| Physician global VAS score (scale 0–10) | 3.6 ± 2.1 | 3.3 ± 2.3 |
| Swollen joint count (total of 28 joints) | 3.0 ± 3.0 | 3.2 ± 4.7 |
| Tender joint count (total of 28 joints) | 3.5 ± 5.2 | 4.1 ± 6.2 |
| PAS-II/RAPID-3 score | 4.2 ± 2.4 | 3.9 ± 2.1 |
| ESR, mm/hour | | |
| All | 22.4 ± 20.5 | 26.4 ± 23.0 |
| Women | 23.0 ± 19.5 | 30.2 ± 25.1 |
| Men | 19.9 ± 22.5 | 26.1 ± 22.8 |
| CRP, mg/dl | Not available | 1.24 ± 1.97 |
| Current medication use, % | | |
Table 2. Probabilities (%) of remission defined by the ACR/EULAR criteria sets and other criteria

| Cohort, criteria | Cross-sectional probability, all patients (95% CI) | Cross-sectional probability, women (95% CI) | Cross-sectional probability, men (95% CI) | Cumulative probability of remission (95% CI) | Probability of second remission at any observation (95% CI) |
|---|---|---|---|---|---|
| ARCK, ACR/EULAR criteria | | | | | |
| AE-3 plus ESR | 6.2 (5.2, 7.3) | 5.7 (4.6, 6.8) | 7.9 (5.5, 10.3) | 14.8 (12.8, 16.9) | 2.4 (1.4, 3.9) |
| AE-3 | 7.5 (6.4, 8.7) | 7.1 (5.8, 8.3) | 9.1 (6.5, 11.7) | 18.0 (15.8, 20.2) | 3.0 (1.9, 4.5) |
| AE-4 | 5.0 (4.1, 5.9) | 4.5 (3.5, 5.4) | 6.8 (4.6, 9.1) | 13.0 (11.1, 15.0) | 1.8 (0.9, 3.1) |
| AE-CDAI | 6.9 (5.9, 8.0) | 6.1 (5.0, 7.3) | 9.8 (7.2, 12.4) | 17.8 (15.6, 20.0) | 2.2 (1.4, 3.5) |
| VARA, ACR/EULAR criteria | | | | | |
| AE-3 plus ESR | 5.0 (4.3, 5.8) | 6.4 (3.9, 9.0) | 4.9 (4.2, 5.7) | 16.5 (14.6, 18.4) | 1.5 (0.9, 2.4) |
| AE-3 plus CRP | 7.0 (5.9, 8.0) | 9.1 (5.3, 13.0) | 6.8 (5.7, 7.8) | 20.9 (18.5, 23.3) | 2.8 (1.8, 4.1) |
| AE-3 | 8.9 (7.9, 9.9) | 11.6 (7.9, 15.3) | 8.7 (7.7, 9.8) | 24.4 (22.2, 26.6) | 3.3 (2.4, 4.4) |
| AE-4 | 7.2 (6.2, 8.2) | 7.3 (4.0, 10.5) | 7.2 (6.2, 8.3) | 17.8 (15.8, 19.9) | 2.8 (1.8, 4.2) |
| AE-CDAI | 10.1 (8.9, 11.3) | 10.3 (6.4, 14.2) | 10.0 (8.8, 11.3) | 22.5 (20.3, 24.8) | 4.6 (3.3, 6.2) |
| AE-SDAI | 9.0 (7.7, 10.3) | 9.1 (4.9, 13.4) | 9.0 (7.6, 10.3) | 21.9 (19.4, 24.5) | 4.2 (2.8, 5.9) |
| ARCK, other criteria | | | | | |
| DAS28 | 28.3 (26.3, 30.4) | 24.9 (22.7, 27.2) | 39.4 (34.9, 43.8) | 48.1 (45.2, 51.0) | |
| PhGlobal VAS 0 | 4.7 (3.9, 5.6) | 4.0 (3.1, 4.8) | 7.8 (5.5, 10.1) | 13.8 (11.8, 15.8) | |
| PhGlobal VAS ≤1 | 19.1 (15.1, 17.7) | 17.3 (15.5, 19.1) | 25.1 (21.3, 29.0) | 40.1 (37.4, 42.8) | |
| PAS-II ≤1 | 9.2 (7.7, 10.6) | 8.5 (6.8, 10.1) | 11.5 (8.3, 14.6) | 17.0 (14.9, 19.2) | |
| MDA | 22.9 (20.9, 24.8) | 20.7 (18.5, 22.8) | 29.9 (25.6, 34.1) | 41.9 (39.0, 44.7) | |
| VARA, other criteria | | | | | |
| DAS28 | 24.0 (22.4, 25.6) | 19.4 (14.8, 24.1) | 24.4 (22.7, 26.1) | 48.4 (45.8, 51.0) | |
| PhGlobal VAS 0 | 2.1 (1.6, 2.6) | 1.9 (0.5, 3.3) | 2.2 (1.6, 2.7) | 6.7 (5.4, 8.1) | |
| PhGlobal VAS ≤1 | 20.8 (19.2, 22.4) | 24.8 (18.8, 30.7) | 20.5 (18.8, 22.1) | 43.4 (40.7, 46.1) | |
| PAS-II ≤1 | 9.1 (8.0, 10.2) | 11.7 (7.7, 15.7) | 8.9 (7.8, 10.0) | 21.8 (19.7, 23.9) | |
| MDA | 21.3 (19.6, 23.0) | 23.2 (17.0, 29.3) | 21.1 (19.3, 22.9) | 39.6 (37.0, 42.2) | |
To explore whether the same patients would be classified as in remission by the different criteria, we assessed agreement between remission measures using kappa statistics and Jaccard coefficients (19). We used the interpretation of Landis and Koch (20) for thresholds of kappa values, in which <0 indicates no agreement, 0–0.20 slight agreement, 0.21–0.40 fair agreement, 0.41–0.60 moderate agreement, 0.61–0.80 substantial agreement, and 0.81–1.0 almost perfect agreement. Both statistics can be interpreted as proportions. Kappa represents the agreement between the classifications of remission by 2 criteria after correction for chance, while the Jaccard coefficient represents the proportion of agreement among remission-positive classifications, after excluding pairs in which both criteria classify the patient as remission negative. Because remission is uncommon, such jointly remission-negative pairs make up most of the data and inflate simple percent agreement; by excluding them, the Jaccard coefficient shows in clinically understandable terms how often 2 criteria identify the same patients as being in remission.
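Both agreement statistics can be computed from the 2 × 2 cross-classification of 2 criteria sets applied to the same observations. The sketch below uses Cohen's kappa for 2 raters and the Jaccard coefficient of the remission-positive sets, together with the Landis and Koch labels given above; the function names and the example data are hypothetical.

```python
from typing import Sequence, Tuple


def kappa_and_jaccard(a: Sequence[bool], b: Sequence[bool]) -> Tuple[float, float]:
    """Cohen's kappa and the Jaccard coefficient for 2 remission classifications."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty classifications of equal length")
    n = len(a)
    both = sum(1 for x, y in zip(a, b) if x and y)
    only_a = sum(1 for x, y in zip(a, b) if x and not y)
    only_b = sum(1 for x, y in zip(a, b) if y and not x)
    neither = n - both - only_a - only_b

    observed = (both + neither) / n
    p_a, p_b = (both + only_a) / n, (both + only_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)   # chance agreement
    kappa = (observed - expected) / (1 - expected) if expected < 1 else float("nan")

    positives = both + only_a + only_b              # drops jointly negative pairs
    jaccard = both / positives if positives else float("nan")
    return kappa, jaccard


def landis_koch(kappa: float) -> str:
    """Landis and Koch interpretation of a kappa value."""
    if kappa < 0:
        return "no agreement"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"
```

For 100 hypothetical patients in whom 2 criteria both indicate remission in 8, disagree in 4 (2 each way), and both indicate no remission in 88, simple agreement is 96%, yet kappa is about 0.78 ("substantial") and the Jaccard coefficient is 8/12, or about 0.67. This illustrates why the jointly negative cases are excluded from the Jaccard coefficient.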
To examine the heterogeneity of physician assessments, we determined the median odds ratio (MOR) by performing multilevel analyses using the Stata XTMELOGIT procedure, with physicians' and patients' assessments modeled in separate random-effects equations (Table 3). We used the MOR to express examiner variance in applying the remission criteria (21, 22). The MOR quantifies variance between examiners by comparing patients with the same covariates who are assessed by 2 randomly chosen examiners. Repeating this comparison for every pair of examiners yields a distribution of ORs, 1 for each pair, in which the examiner who classifies a higher proportion of patients as in remission is always compared with the one who classifies a lower proportion. The MOR is the median of this distribution of pairwise ORs; i.e., it expresses how much, in median, a patient's odds of being classified as in remission would increase if the patient were evaluated by the examiner who classifies a higher proportion of patients as in remission, with covariates held constant. If the MOR is 1, there are no differences in the prevalence of remission between examiners; the larger the examiner differences, the larger the MOR. Because the MOR is directly comparable to fixed-effects ORs, examiner variance can be appreciated in terms of familiar ORs (23). Roughly, if 8% of patients are in remission, MORs of 1.5 and 2.0 translate to probabilities of remission of ∼12% and ∼15%. MOR analyses have not been used previously to assess variance among groups of physicians.
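For a random-intercept logistic model, the MOR is commonly obtained in closed form from the between-examiner variance σ² on the log-odds scale as MOR = exp(√(2σ²) × Φ⁻¹(0.75)), where Φ⁻¹(0.75) ≈ 0.6745. The sketch below assumes that expression and a single examiner-level variance (a simplification of the study's model, which also included patient-level random effects); it also converts an OR into a shifted probability. The function names are hypothetical. Converting the odds exactly, an 8% remission probability becomes about 11.5% under an OR of 1.5 and about 14.8% under an OR of 2.0.

```python
import math
from statistics import NormalDist

_Z75 = NormalDist().inv_cdf(0.75)  # 75th percentile of the standard normal, ~0.6745


def median_odds_ratio(variance: float) -> float:
    """MOR from the between-examiner random-intercept variance (log-odds scale)."""
    if variance < 0:
        raise ValueError("variance must be non-negative")
    return math.exp(math.sqrt(2 * variance) * _Z75)


def variance_from_mor(mor: float) -> float:
    """Inverse of median_odds_ratio: the variance implied by a given MOR."""
    if mor < 1:
        raise ValueError("the MOR is at least 1 by construction")
    return (math.log(mor) / _Z75) ** 2 / 2


def shift_probability(p: float, odds_ratio: float) -> float:
    """Probability obtained by multiplying the odds corresponding to p by odds_ratio."""
    odds = p / (1 - p) * odds_ratio
    return odds / (1 + odds)
```

An examiner variance of 0 gives an MOR of exactly 1 (no examiner differences), and a variance of 1.0 on the log-odds scale already corresponds to an MOR of about 2.6, which is of the same order as the largest MORs reported for the SJC and PhGlobal VAS score.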
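Table 3. Agreement between remission criteria (kappa statistics and Jaccard coefficients) and variation among physicians (median odds ratios)

| Cohort, comparison | Kappa: AE-3 plus ESR | Kappa: AE-3 plus CRP | Kappa: AE-3 | Kappa: AE-4 | Kappa: AE-CDAI | Jaccard: AE-3 plus ESR | Jaccard: AE-3 plus CRP | Jaccard: AE-3 | Jaccard: AE-4 | Jaccard: AE-CDAI | Median odds ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ARCK | | | | | | | | | | | |
| AE-3 plus ESR | 1.00 | | | | | 1.00 | | | | | 1.0 |
| PtGlobal VAS ≤1 | | | | | | | | | | | 1.0 |
| PhGlobal VAS ≤1 | | | | | | | | | | | 2.7 |
| VARA | | | | | | | | | | | |
| AE-3 plus ESR | 1.00 | | | | | 1.00 | | | | | 1.8 |
| AE-3 plus CRP | 0.64 | 1.00 | | | | 0.50 | 1.00 | | | | 2.1 |
| PtGlobal VAS ≤1 | | | | | | | | | | | 1.5 |
| PhGlobal VAS ≤1 | | | | | | | | | | | 2.4 |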
In addition, we calculated Harrell's c statistic to determine the predictor strength and discriminatory ability of individual and combined predictors for each set of criteria (see Appendix A, available on the Arthritis & Rheumatism web site at http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1529-0131). For a binary outcome such as remission status, Harrell's c is equivalent to the area under the receiver operating characteristic curve (AUC).
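For a binary outcome, Harrell's c is the proportion of (remission, non-remission) patient pairs in which the predictor ranks the remission patient as more likely to be in remission, with ties counted as one-half. The following minimal sketch computes it directly from a single predictor; the function name, the orientation flag, and the example data are hypothetical, and the study's appendix values came from Stata rather than from this code. Because a lower PtGlobal VAS score points toward remission, the default orientation treats lower scores as predicting remission.

```python
from typing import Sequence


def harrell_c_binary(scores: Sequence[float], in_remission: Sequence[bool],
                     higher_predicts_remission: bool = False) -> float:
    """Harrell's c for a binary outcome (equal to the ROC AUC).

    Every (remission, non-remission) pair is comparable; a pair is concordant
    when the remission patient's score points toward remission.
    Tied scores count as half-concordant.
    """
    pos = [s for s, y in zip(scores, in_remission) if y]
    neg = [s for s, y in zip(scores, in_remission) if not y]
    if not pos or not neg:
        raise ValueError("need at least one patient in each outcome group")
    concordant = 0.0
    for p in pos:
        for q in neg:
            if p == q:
                concordant += 0.5
            elif (p > q) == higher_predicts_remission:
                concordant += 1.0
    return concordant / (len(pos) * len(neg))
```

For hypothetical PtGlobal VAS scores of 0.5 and 1.0 in 2 patients in remission and 1.0, 3.0, 5.0, and 0.5 in 4 patients not in remission, 6 of the 8 comparable pairs are concordant (counting the 2 ties as half), so c = 0.75.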
P values less than 0.05 were considered statistically significant. All analyses were performed with Stata version 11.1.
RESULTS
There were important and statistically significant differences in the clinical characteristics of the RA patients between the ARCK and VARA groups (Table 1). The VARA group consisted of a much higher proportion of men (90.9% versus 25.8%), and patients in this group were older (mean 65.1 years versus 59.3 years), had a longer duration of RA (mean 13.2 years versus 10.0 years), had higher ESR values, were more often rheumatoid factor positive, and were more often treated with prednisone (42.1% versus 26.0%), as compared with patients in the ARCK group. In addition, the VARA group had higher swollen and tender joint counts. In contrast, clinical disease activity measures, including the HAQ-II DI/MDHAQ scores, VAS pain scores, PtGlobal VAS scores, PhGlobal VAS scores, and PAS-II scores were higher in the ARCK group. Moreover, values for the combined tender and swollen joint counts and combined disease activity measures were significantly different between the groups.
Probabilities of remission defined by the ACR/EULAR criteria sets and other criteria.
The probability of remission classified using the AE-3 at a given clinic visit was 7.5% (95% confidence interval [95% CI] 6.4, 8.7%) in the ARCK group and 8.9% (95% CI 7.9, 9.9%) in the VARA group. When the other ACR/EULAR remission criteria definitions were applied, probabilities of remission varied in both groups, from 5.0% to 6.9% in ARCK and 5.0% to 10.1% in VARA (Table 2). In the VARA data set, we were also able to examine probabilities of remission classified by the AE-3 plus CRP, which yielded a probability of 7.0% (95% CI 5.9, 8.0%), and the AE-SDAI, which yielded a probability of 9.0% (95% CI 7.7, 10.3%); both of these measures were recommended by the ACR/EULAR joint committee for use in clinical trials.
In addition, the probability of remission generally increased over the course of the study. Adjusted for age, sex, and use of prednisone, methotrexate, and biologic agents, the annual increase in the probability of remission classified by each criteria set was as follows: in the ARCK group, a 0.9% increase per year (95% CI 0.2, 1.6%) using the AE-3, a 0.9% increase per year (95% CI 0.02, 1.5%) using the AE-3 plus ESR, and a 1.3% increase per year (95% CI 0.6, 1.9%) using the AE-CDAI; in the VARA group, a 0.7% increase per year (95% CI 0.2, 1.1%) using the AE-3, a 0.1% increase per year (95% CI −0.1, 1.3%) using the AE-3 plus ESR, a 0.7% increase per year (95% CI 0.2, 1.1%) using the AE-3 plus CRP, a 1.1% increase per year (95% CI 0.5, 1.7%) using the AE-CDAI, and a 0.6% increase per year (95% CI −0.06, 1.2%) using the AE-SDAI. Neither methotrexate use nor use of biologic agents was significantly associated with an increase in remission in any model examined.
Regardless of which remission criteria set was selected, the cumulative probability of remission, i.e., the probability of ever meeting the definition of remission, was considerably higher than the cross-sectional probability in both groups, ranging from 13.0% to 18.0% in the ARCK patients and from 16.5% to 24.4% in the VARA patients (Table 2). The cumulative probabilities were determined over a mean followup of 2.2 years for the ARCK group, with a mean interval between visits of 2.7 months (interquartile range [IQR] 1.1–3.1 months), and a mean followup of 2.1 years for the VARA group, with a mean interval between visits of 3.8 months (IQR 2.1–4.7 months). During this period, the mean number of clinic visits was 10.6 for ARCK and 7.2 for VARA.
Although the probability of ever meeting the definitions of remission by any of the criteria was greater than the probability of meeting the criteria at a given clinic visit, the probability of any patient being in remission at 2 or more visits (not necessarily contiguous) was considerably smaller. The probability of such events ranged from 1.8% to 3.0% in the ARCK patients and from 1.5% to 4.6% in the VARA patients. To the extent that meeting criteria at least twice defines a meaningful remission, these probabilities might be used to further clarify the probability and meaning of remission in RA.
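The 3 quantities contrasted in the preceding paragraphs (remission at a given visit, remission at any visit, and remission at 2 or more visits) can be illustrated with simple unadjusted proportions. The sketch below is not the study's method, which used population-averaged and random-effects models with marginal probabilities; it only shows, on hypothetical visit-level data, why the cumulative probability grows with the number of visits while the probability of repeated remission stays small.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def remission_summaries(visits: Iterable[Tuple[Hashable, bool]]) -> Dict[str, float]:
    """Unadjusted per-visit and per-patient remission proportions.

    visits: (patient_id, met_criteria) pairs, one per clinic visit.
    """
    per_patient: Dict[Hashable, List[bool]] = defaultdict(list)
    for patient_id, met in visits:
        per_patient[patient_id].append(bool(met))
    if not per_patient:
        raise ValueError("no visits supplied")
    n_patients = len(per_patient)
    n_visits = sum(len(v) for v in per_patient.values())
    remission_visits = sum(sum(v) for v in per_patient.values())
    return {
        # share of all visits at which the criteria were met
        "cross_sectional": remission_visits / n_visits,
        # share of patients meeting the criteria at least once during followup
        "cumulative": sum(any(v) for v in per_patient.values()) / n_patients,
        # share of patients meeting the criteria at 2 or more (not necessarily
        # consecutive) visits
        "two_or_more_visits": sum(sum(v) >= 2 for v in per_patient.values()) / n_patients,
    }
```

With 4 hypothetical patients contributing 15 visits, of which 3 met the criteria (1 visit for patient A, 2 for patient C), the per-visit proportion is 20%, the cumulative proportion is 50% (2 of 4 patients), and only 25% of patients (patient C) met the criteria at 2 or more visits.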
For those patients achieving remission, we assessed the probability of remaining in remission (Table 4). At 12 months after the start of remission, 19.7–33.8% of the patients in both groups remained in remission. At 2 years, the probability of remaining in remission ranged from 6.0% to 14.1%. In contrast, the probability of remaining in remission according to the MDA classification was 42.9% at 12 months and 24.8% at 24 months in the ARCK group, and 43.9% at 12 months and 22.0% at 24 months in the VARA group. To put the remission data into perspective, fewer than 3% of all RA patients can be expected to experience a remission lasting 2 years or longer. Figure 1 demonstrates representative Kaplan-Meier survival curves for remaining in remission.
Table 4. Probability (%) of remaining in remission among patients who achieved remission

| Cohort, criteria | At 3 months (95% CI) | At 12 months (95% CI) | At 24 months (95% CI) |
|---|---|---|---|
| ARCK | | | |
| AE-3 plus ESR | 82.2 (74.7, 87.7) | 33.8 (25.8, 42.0) | 10.8 (5.8, 17.5) |
| AE-3 | 82.0 (75.3, 87.1) | 33.2 (26.0, 46.6) | 14.1 (9.0, 20.5) |
| AE-4 | 79.2 (72.6, 84.4) | 22.0 (16.2, 28.5) | 7.0 (3.8, 11.7) |
| AE-CDAI | 71.3 (65.5, 76.4) | 21.8 (16.8, 27.1) | 6.0 (3.3, 10.0) |
| MDA | 83.8 (79.7, 87.2) | 42.9 (37.7, 48.0) | 24.8 (20.1, 29.8) |
| VARA | | | |
| AE-3 plus ESR | 85.0 (79.3, 89.3) | 19.7 (14.4, 25.7) | 6.6 (3.5, 11.1) |
| AE-3 plus CRP | 86.3 (80.9, 90.3) | 24.7 (18.9, 30.9) | 8.1 (4.6, 12.9) |
| AE-3 | 85.5 (81.0, 89.0) | 24.2 (19.3, 29.4) | 8.3 (5.3, 12.3) |
| AE-4 | 90.7 (85.6, 94.0) | 23.8 (17.8, 30.4) | 9.6 (5.6, 14.9) |
| AE-CDAI | 89.0 (84.2, 92.4) | 27.1 (21.3, 33.3) | 13.5 (9.0, 18.8) |
| AE-SDAI | 89.6 (84.2, 93.3) | 31.3 (24.3, 38.4) | 13.3 (8.4, 19.4) |
| MDA | 89.5 (86.1, 92.1) | 43.9 (38.7, 49.0) | 22.0 (17.5, 26.8) |
We also calculated probabilities of remission classified by non-ACR/EULAR definitions (Table 2). Most of these definitions classified substantially more patients as in remission than the ACR/EULAR criteria did. In particular, DAS28-defined remission was observed in 28.3% of patients in the ARCK group and 24.0% in the VARA group. PAS-II–defined remission, which was determined by patient self-report, was observed in 9.2% of patients in ARCK and 9.1% of patients in VARA. Finally, the MDA criterion for remission was satisfied by 22.9% of patients in ARCK and 21.3% of patients in VARA.
Agreement among criteria.
Similar cross-sectional probabilities of remission do not necessarily mean that the same patients are identified by the different criteria. To investigate agreement between criteria, we selected a random observation for each patient and then applied kappa and Jaccard statistics. In the VARA cohort, the AE-SDAI and AE-3 plus CRP criteria had a Jaccard coefficient of agreement of 0.66 (Table 3). The best Jaccard agreement was between the AE-SDAI and the AE-CDAI, with a coefficient of 0.80 (Table 3), as expected given the similarity of these criteria. The best Jaccard agreement with the AE-3 plus CRP was for the AE-3 criteria (coefficient of 0.75). In the ARCK cohort, which lacked data on the CRP, the Jaccard coefficient of agreement between the AE-3 and AE-3 plus ESR was 0.80, while that for the AE-4 and AE-3 was 0.64 and that for the AE-CDAI and AE-3 was 0.45 (Table 3).
The same pattern was noted when agreement between the criteria was assessed using kappa statistics, with generally moderate or substantial agreement beyond chance. In the VARA group, the kappa value for agreement between the AE-SDAI and AE-3 plus CRP was 0.77, that between the AE-SDAI and AE-3 was 0.75, and that between the AE-3 and AE-3 plus CRP was 0.84. Because of the interest in purely patient-based criteria, we also evaluated agreement of PAS-II/RAPID-3–defined remission, as well as DAS28-defined remission, with the ACR/EULAR definitions. In the VARA group, the kappa and Jaccard coefficients of agreement were 0.46 and 0.35, respectively, for RAPID-3–defined versus AE-SDAI–defined remission; 0.40 and 0.29 for RAPID-3–defined versus AE-3 plus CRP–defined remission; 0.40 and 0.32 for DAS28-defined versus AE-SDAI–defined remission; and 0.33 and 0.25 for DAS28-defined versus AE-3 plus CRP–defined remission.
We also examined the relationship between the various remission criteria and the MDA definition of remission, by determining the percent of patients given a positive classification by the ACR/EULAR criteria and a positive classification by the MDA criterion (double-positive), and the percent of patients given a positive classification by the ACR/EULAR criteria but a negative classification by the MDA criterion (positive–negative). For each ACR/EULAR criteria set, these percentages of double-positive/positive–negative patients were as follows: in the ARCK group, 32.2%/1.1% using the AE-3, 26.7%/0.6% using the AE-3 plus ESR, and 33.2%/3.0% using the AE-CDAI; in the VARA group, 36.6%/3.9% using the AE-3, 22.9%/2.0% using the AE-3 plus ESR, 27.4%/3.2% using the AE-3 plus CRP, 42.3%/1.7% using the AE-CDAI, and 37.0%/1.4% using the AE-SDAI.
Importance of individual predictors.
We calculated Harrell's c to determine the predictor strength and discriminatory ability of individual and combined predictors of remission for each set of criteria (see Appendix A, available on the Arthritis & Rheumatism web site at http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1529-0131). The PtGlobal VAS score was the variable with the best discriminatory ability, with an AUC as high as 0.97. For criteria that contained the PhGlobal VAS score, the AUC for that variable ranged from 0.86 to 0.93. Together, these 2 global assessments dominated the predictors in terms of discriminatory ability. In contrast, the TJC and SJC had AUC values between 0.74 and 0.77. Even when considered simultaneously, the TJC and SJC had lower Harrell's c statistics than the global assessments of disease activity.
Effect of patient's global assessment on remission-positive and -negative states.
Since the PtGlobal VAS score was the strongest contributor to criteria positivity, we examined the distribution of PtGlobal VAS scores graphically in patients who met the joint count and ESR components of the AE-3 plus ESR remission criteria (Figure 2A). Among these patients, who would otherwise have been classified as in remission by these ACR/EULAR criteria, PtGlobal VAS scores were widely distributed, and many were within 1 point of the PtGlobal VAS score cutoff for remission. As shown in Figure 2B, many of the patients who were classified as in remission by these ACR/EULAR criteria at the previous clinic visit no longer met the criteria because of slight increases in the PtGlobal VAS score (to a score >1.0).
Effect of physician differences on criteria positivity.
We addressed the issue of whether physicians differed in their examinations and ratings by performing multilevel analyses and calculating MORs to assess the degree of variation between examiners (Table 3). In this analysis, patients are nested within physicians. Among the criteria components, the highest MORs were found for the SJC (MORs 2.0–2.7) and the PhGlobal VAS score (MORs 2.4–2.7), indicating considerable heterogeneity in physician assessments. Slightly less heterogeneity was seen for the TJC (MORs 1.7–2.0). When variance was assessed according to specific criteria, physician bias was generally noted for each component in the VARA data set, whereas in the ARCK data set, bias was noted only when the AE-4 and AE-CDAI criteria sets were applied. These findings indicate that physician differences influence the classification of remission.
DISCUSSION
The ACR/EULAR recently established criteria for remission in RA that were “stringent but achievable and could be applied uniformly in clinical trials” (6). The 2 definitions put forth were 1) scores of ≤1 for the TJC, SJC, CRP level (mg/dl), and PtGlobal VAS score, and 2) an SDAI ≤3.3. The group also suggested possibilities for criteria that might be used in clinical practice. The basis of this suggestion required “that a definition of remission be developed for clinic-based practice that would not require an acute-phase reactant, as long as it would capture remission as stringently as the measure employed for clinical trials” (6). Accordingly, the ACR/EULAR joint committee suggested that Boolean-based definitions of remission comprising the TJC, SJC, and patient's global assessment of disease activity could provide statistical results for classification of remission similar to those encompassing the CRP and the CDAI. The committee indicated that such definitions of remission could be applied in clinical practice “until better measures for that purpose become available.” In addition, the committee suggested cutoff points for the ESR (<20 mm/hour for men and <30 mm/hour for women) for defining remission in clinical practice, in the event that laboratory tests are performed in the clinical practice setting.
The central difference between remission in clinical trials (and observational research) and remission in clinical practice is that trial data refer to a group of patients, whereas clinical practice remission refers to an individual patient and an individual examiner. In a trial in which different definitions provide similar proportions of patients in remission, it does not matter substantially which valid definition is used. In clinical practice, however, if different patients are identified by different criteria, it may matter a great deal.
It is not surprising that the probability of remission differs according to the definition applied. However, the differences generally appear small and are in accord with the probabilities from clinical trials noted in the ACR/EULAR report on remission criteria in RA (6). Where our results differ from the ACR/EULAR results, perhaps conceptually, is in our observation of the tenuous and sporadic nature of remission. The ACR/EULAR joint committee regarded the duration of remission as worthy of separate study and noted that it was not addressed in the primary report (6). We observed that within 12 months, 65–80% of patients who had experienced remission no longer met remission criteria, and at 24 months, only 6–14% still met remission criteria (Table 4 and Figure 1). If, as indicated by the data in Table 2, the probability of ever being in remission is 13.0–24.4%, then the probability of being in remission for as long as 2 years is between 1.0% and 3.0%. These results are remarkably similar to those of Wolfe and Hawley in 1985 (24), who found that 18.1% of 458 patients in a clinical practice were classified as being in remission by the ACR 1981 remission criteria for RA (3). In addition, they found that “only 15% of remissions lasted longer than 24 months.” Thus, only about 3% of patients (15% of 18.1%) had a remission that lasted as long as 2 years.
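The persistence estimates combine 2 proportions:

- the share of patients ever in remission;
- the share of remissions that last a given duration.

The sketch below shows that arithmetic using the Wolfe and Hawley figures quoted above. The helper name is hypothetical. The calculation assumes that the persistence proportion applies uniformly to every patient who ever remitted.

```python
def sustained_remission_share(p_ever: float, p_persist: float) -> float:
    """Share of all patients with a remission lasting the stated duration.

    p_ever: proportion of patients ever classified as in remission.
    p_persist: proportion of those remissions that last the duration
    (assumed to apply uniformly to everyone who remitted).
    """
    return p_ever * p_persist


# Wolfe and Hawley (1985): 18.1% ever in remission; 15% of remissions > 24 months.
share = sustained_remission_share(0.181, 0.15)
print(f"{share:.1%}")  # 2.7%
```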
Another indication of the potentially tenuous nature of remission is our observation that only 1.5–4.6% of RA patients had 2 or more physician visits in remission, compared with 13.0–24.4% of patients who had at least 1 remission visit (Table 2).
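Visit data make the difference concrete between visit-level (cross-sectional) remission proportions and patient-level (cumulative) ones. The sketch below is minimal and assumes a hypothetical dict of per-visit remission flags for each patient. It counts any 2 remission visits, not necessarily consecutive ones. That is an assumption about how Table 2 was tabulated.

```python
from typing import Dict, List


def remission_summary(visits_by_patient: Dict[str, List[bool]]) -> Dict[str, float]:
    """Visit-level and patient-level remission proportions.

    visits_by_patient maps a patient ID to that patient's per-visit remission
    flags (True = criteria met at that visit).
    """
    all_visits = [flag for visits in visits_by_patient.values() for flag in visits]
    n_patients = len(visits_by_patient)
    ever = sum(1 for visits in visits_by_patient.values() if any(visits))
    # Any 2 remission visits, not necessarily consecutive.
    two_or_more = sum(1 for visits in visits_by_patient.values() if sum(visits) >= 2)
    return {
        "cross_sectional": sum(all_visits) / len(all_visits),
        "ever": ever / n_patients,
        "two_or_more": two_or_more / n_patients,
    }


# Hypothetical toy data for 4 patients.
toy = {
    "p1": [False, True, False],
    "p2": [True, True],
    "p3": [False, False],
    "p4": [False],
}
print(remission_summary(toy))
# {'cross_sectional': 0.375, 'ever': 0.5, 'two_or_more': 0.25}
```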
We addressed several issues related to misclassification. First, we examined the degree of agreement among the different criteria. The AE-SDAI and AE-3 plus CRP criteria are the 2 criteria sets recommended for use in clinical trials. In the VARA group, the Jaccard coefficient of agreement between them was 0.66. When the ACR/EULAR-recommended clinical criteria (AE-3) were assessed for agreement with the AE-SDAI and AE-3 plus CRP criteria, the Jaccard coefficients were 0.63 and 0.75, respectively. These levels of agreement, as well as the kappa values given in Table 3, are sufficient for clinical trials. At the level of the individual patient, however, clinically significant misclassification can occur. This underscores that criteria applied to groups and criteria applied to individual patients require different levels of reliability.
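The 2 agreement measures work differently:

- The Jaccard coefficient counts only visits that at least 1 criteria set classifies as remission. It is therefore not inflated by the many visits that both sets classify as non-remission.
- Cohen's kappa corrects overall agreement for chance.

The sketch below computes both. Function names and data are hypothetical (1 = remission).

```python
from typing import Sequence


def jaccard(a: Sequence[int], b: Sequence[int]) -> float:
    """Visits classified as remission by both sets / visits classified by either."""
    if len(a) != len(b):
        raise ValueError("classifications must cover the same visits")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else float("nan")


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Chance-corrected agreement for two binary classifications."""
    if len(a) != len(b):
        raise ValueError("classifications must cover the same visits")
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if bool(x) == bool(y)) / n
    p_a = sum(1 for x in a if x) / n
    p_b = sum(1 for y in b if y) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1:
        return float("nan")
    return (observed - expected) / (1 - expected)


# Hypothetical classifications of 8 visits by two criteria sets.
crit_1 = [1, 1, 0, 0, 1, 0, 0, 0]
crit_2 = [1, 0, 0, 0, 1, 1, 0, 0]
print(jaccard(crit_1, crit_2))  # 0.5
print(round(cohen_kappa(crit_1, crit_2), 3))  # 0.467
```

Remission is uncommon at any single visit (5.0–10.1% in this study). Simple percent agreement would therefore be high even for poorly agreeing criteria. That is one reason a Jaccard-type index is informative here.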
Misclassification will also occur if physician examiners differ in their ratings. Reliability has been defined as “the degree to which patients can be distinguished from each other, despite measurement error” (25). High reliability matters for discriminative purposes such as diagnostic applications (e.g., distinguishing patients with more severe disease from those with less severe disease). In general, reliability coefficients of ≥0.90 are required to make decisions about individual patients. Values of 0.80–0.89 represent good reliability, suitable for research and for use in groups of patients. However, there is substantial evidence that interrater reliability is poor for the examination of tender and swollen joints (26–29). Using the MOR in multilevel analyses, we also found evidence of important physician heterogeneity in the tender and swollen joint examination and in the physician global health assessment rating. For example, consider an MOR of 2.0 (Table 3) and the same patient seen by 2 randomly selected physician examiners. In the median case, the odds of that patient being classified as in remission would differ 2-fold between the 2 examiners, irrespective of the degree of disease activity. Such rater variability is unlikely to be a problem in clinical trials unless there is a systematic bias. In clinical practice, however, physician differences can lead to misclassification.
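A simulation can check this interpretation of the MOR. The sketch assumes that physician random intercepts follow a normal distribution on the log-odds scale, as in a random-intercept logistic model. Under that assumption, the median odds ratio between 2 randomly drawn physicians recovers the MOR. The second function shows what doubling the odds means on the probability scale when baseline remission is uncommon. The variance of 0.528 and the 8% baseline are illustrative values, not estimates from the paper. Function names are hypothetical.

```python
import math
import random
import statistics


def simulated_median_or(sigma2: float, n_pairs: int = 100_000, seed: int = 1) -> float:
    """Median odds ratio between two randomly drawn physicians, by simulation.

    Assumes physician intercepts u ~ Normal(0, sigma2) on the log-odds scale.
    For the same patient, the odds ratio comparing the physician more likely to
    classify remission with the less likely one is exp(|u1 - u2|).
    """
    rng = random.Random(seed)
    sd = math.sqrt(sigma2)
    ratios = [math.exp(abs(rng.gauss(0.0, sd) - rng.gauss(0.0, sd)))
              for _ in range(n_pairs)]
    return statistics.median(ratios)


def shift_probability(p: float, odds_ratio: float) -> float:
    """Probability after multiplying the odds of p by odds_ratio."""
    odds = p / (1.0 - p) * odds_ratio
    return odds / (1.0 + odds)


# Illustrative physician-level variance that corresponds to an MOR of about 2.0.
print(round(simulated_median_or(0.528), 2))  # close to 2.0
# Doubling the odds of an 8% baseline remission probability.
print(round(shift_probability(0.08, 2.0), 3))  # 0.148
```

When remission is uncommon, a 2-fold difference in odds is close to a 2-fold difference in probability (8% versus about 15%). This is why the 2 readings are often used loosely as equivalents.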
The only patient-reported measure used in the ACR/EULAR criteria sets is the PtGlobal VAS score. As shown in Appendix A (on the Arthritis & Rheumatism web site at http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1529-0131), the PtGlobal VAS score has the highest Harrell's c statistic and is the best discriminatory variable among the components of the various remission definitions. Lassere et al have shown that the PtGlobal VAS score has poor test–retest reliability at the level of the individual patient (intraclass correlation coefficient 0.75) (27). This finding is consistent with our data. Figure 2A suggests that when remission is first identified, many patients with a PtGlobal VAS score close to the remission cutoff do not satisfy the remission definition. Moreover, we examined the next visit of patients who had been classified as in remission (Figure 2B). Many of them were no longer classified as in remission because of changes in the PtGlobal VAS score. Thus, remission in this setting depends on the patient's global assessment. That dependence may reflect true sensitivity to changes in RA activity. Alternatively, it may reflect poor reliability of the measure, with remission status changing while RA activity actually remains the same.
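Harrell's c statistic for a binary outcome is computed over all (event, non-event) pairs. A pair counts as concordant when the event has the higher score, and ties count as one-half. The c statistic is the proportion of concordant pairs, which equals the area under the ROC curve.

This section does not describe the exact outcome or model behind Appendix A. The sketch below therefore only illustrates the concordance calculation, and the function name and data are hypothetical. For a variable such as the PtGlobal VAS score, where lower values indicate remission, the scores are negated.

```python
from itertools import product
from typing import Sequence


def c_statistic(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """Concordance (c) statistic for a binary outcome.

    Over all (event, non-event) pairs, count 1 when the event has the higher
    score and 0.5 for a tie; divide by the number of pairs.
    """
    events = [s for s, y in zip(scores, outcomes) if y]
    non_events = [s for s, y in zip(scores, outcomes) if not y]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for event_score, other_score in product(events, non_events):
        if event_score > other_score:
            concordant += 1.0
        elif event_score == other_score:
            concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical PtGlobal VAS scores (0-10) and remission status (1 = remission).
ptglobal = [0.5, 1.0, 2.0, 4.0, 6.5, 0.8]
remission = [1, 1, 0, 0, 0, 0]
# Lower PtGlobal indicates remission, so negate the scores.
print(c_statistic([-v for v in ptglobal], remission))  # 0.875
```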
One approach that avoids physician bias is to assess remission with the RAPID-3 (or PAS-II) score (4), which relies only on patient-reported measures. However, the individual components of these scales (VAS pain score, global VAS score, and HAQ score) also have poor reliability (27). In addition, agreement between remission defined by these scores and remission defined by the ACR/EULAR-recommended measures was unsatisfactory. The Jaccard coefficient was 0.35 for SDAI-based remission and 0.29 for AE-3 plus CRP–defined remission.
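RAPID-3 uses only patient-reported inputs, so no physician term enters the score. The sketch below rests on 2 assumptions:

- Each component (MDHAQ physical function, pain VAS, patient global VAS) has already been scaled to 0–10, giving a 0–30 total.
- A total of ≤3 counts as remission. This cutoff is taken from commonly used RAPID-3 categories and is not a value given in this section.

The PAS-II is not implemented here. Function names are hypothetical.

```python
RAPID3_REMISSION_MAX = 3.0  # assumed near-remission cutoff on the 0-30 scale


def rapid3(function_0_10: float, pain_0_10: float, ptglobal_0_10: float) -> float:
    """RAPID-3 total (0-30): sum of three patient-reported components.

    Assumes each component is already on a 0-10 scale; no physician
    examination or laboratory value enters the score.
    """
    components = (function_0_10, pain_0_10, ptglobal_0_10)
    if any(not 0 <= c <= 10 for c in components):
        raise ValueError("each component must be on a 0-10 scale")
    return sum(components)


def rapid3_remission(function_0_10: float, pain_0_10: float, ptglobal_0_10: float,
                     cutoff: float = RAPID3_REMISSION_MAX) -> bool:
    """True when the RAPID-3 total is at or below the (assumed) cutoff."""
    return rapid3(function_0_10, pain_0_10, ptglobal_0_10) <= cutoff


print(rapid3_remission(1.0, 1.0, 0.5))  # True (total 2.5)
print(rapid3_remission(2.0, 2.0, 2.0))  # False (total 6.0)
```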
We note a number of potential limitations to our study. Among the possible methodologic concerns is that we chose to analyze individual physicians in the MOR analyses in the VARA group. We did this to be consistent with analyses of ARCK patients. Another approach would have been to analyze VARA sites rather than physicians. We found no substantial difference when we substituted sites for physicians in sensitivity analyses (results not shown).
We did not attempt to identify the reasons for the differences in results between the VARA and ARCK data sets, as that was not the purpose of our study. Patient mix and sociodemographic characteristics might explain some of the differences. In multivariable analyses of remission criteria, men were more likely than women to achieve remission when the AE-CDAI definition was applied in the ARCK cohort. This was not the case in either cohort when any of the other criteria (as described in Table 2) were applied.
Although we attributed MORs greater than 1 to physician differences, it is possible that some physicians were assigned patients with greater disease activity. This does not appear to have been a matter of policy in either the ARCK or the VARA registry, and we found no evidence to support that possibility.
In summary, the proportion of patients classified as being in remission at a given visit ranged from 5.0% to 10.1% across criteria sets. It ranged from 7.5% to 8.9% when the 3-variable criteria recommended by the ACR/EULAR joint committee were applied. During the ∼2.2 years of followup, 18.4–24.4% of patients achieved remission according to the AE-3 criteria. Prolonged remission was rare: <3% of patients had a remission lasting as long as 2 years.
All authors were involved in drafting the article or revising it critically for important intellectual content, and all authors approved the final version to be published. Dr. Wolfe had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study conception and design. Shahouri, Michaud, Wolfe.
Acquisition of data. Shahouri, Michaud, Mikuls, Shaver, Anderson, Weidensaul, Busch, Wang, Wolfe.
Analysis and interpretation of data. Shahouri, Michaud, Caplan, Wolfe.
We thank Grant Cannon, MD, for his helpful and thoughtful comments.
- 1. Clinical and statistical study of chronic arthritis based on 1100 cases. Am J Med Sci 1927; 173: 31–6.
- 2. Chronic atrophic arthritis. Ann Intern Med 1938; 11: 1792–805.
- 13. Toward a multidimensional Health Assessment Questionnaire (MDHAQ): assessment of advanced activities of daily living and psychological status in the patient-friendly health assessment questionnaire format. Arthritis Rheum 1999; 42: 2220–30.
- 15. An index of only patient-reported outcome measures, routine assessment of patient index data 3 (RAPID3), in two abatacept clinical trials: similar results to disease activity score (DAS28) and other RAPID indices that include physician-reported measures. Rheumatology (Oxford) 2008; 47: 345–9.
- 17. Intra-class correlation in random-effects models for binary data. Stata Journal 2003; 3: 32–46.
- 19. Comparative de la distribution florale dans une portion des Alpes et des Jura. Bulletin de la Société Vaudoise des Sciences Naturelles 1901; 37: 547–79. [In French]
- 26. Limitations of a quantitative swollen and tender joint count to assess and monitor patients with rheumatoid arthritis. Bull Hosp Jt Dis 2008; 66: 216.
- 27. Reliability of measures of disease activity and disease damage in rheumatoid arthritis: implications for smallest detectable difference, minimal clinically important difference, and analysis of treatment effects in randomized controlled trials. J Rheumatol 2001; 28: 892–903.