Dr. Raza has received speaking fees and honoraria from Wyeth and UCB Celltech (less than $10,000 each).
Rheumatoid Arthritis Clinical Studies
Validation of a prediction rule for disease outcome in patients with recent-onset undifferentiated arthritis: Moving toward individualized treatment decision-making
Article first published online: 30 JUL 2008
Copyright © 2008 by the American College of Rheumatology
Arthritis & Rheumatism
Volume 58, Issue 8, pages 2241–2247, August 2008
How to Cite
van der Helm-van Mil, A. H. M., Detert, J., le Cessie, S., Filer, A., Bastian, H., Burmester, G. R., Huizinga, T. W. J. and Raza, K. (2008), Validation of a prediction rule for disease outcome in patients with recent-onset undifferentiated arthritis: Moving toward individualized treatment decision-making. Arthritis & Rheumatism, 58: 2241–2247. doi: 10.1002/art.23681
- Issue published online: 30 JUL 2008
- Manuscript Accepted: 14 APR 2008
- Manuscript Received: 31 OCT 2007
- European Community 6th Framework Programme, AutoCure Project
- Karl-Wilder-Stiftung, Germany
- Arthritis Research Campaign, UK
The decision to start disease-modifying antirheumatic drugs in patients with recent-onset undifferentiated arthritis (UA) is complicated by a varied natural disease course in which the disease in one-third of patients progresses to rheumatoid arthritis (RA), whereas 40–50% of patients experience spontaneous remission. Recently, a prediction rule was developed to estimate the chance of progression to RA in individual patients presenting with UA. This study investigates the accuracy of this prediction rule in independent cohorts of patients with UA.
In 3 cohorts of patients with recent-onset UA, from the UK, Germany, and The Netherlands, the prediction score and the corresponding chance of developing RA were calculated. These data were compared with the observed disease outcome after ≥1 year of followup. Positive predictive values (PPVs) and negative predictive values (NPVs) were calculated and the overall discriminative ability of the prediction rule was assessed using area under the receiver operating characteristic curves (AUCs).
Because data on the severity of morning stiffness were not available in all validation cohorts, the prediction rule was rederived with the duration of morning stiffness as a substitute. The AUC for this rederived rule in the derivation cohort was 0.88 (SEM 0.015). In the British, German, and Dutch validation cohorts, the AUCs were 0.83 (SEM 0.041), 0.82 (SEM 0.037), and 0.95 (SEM 0.031), respectively. The NPV (for a prediction score ≤6) in these 3 cohorts was 83%, 83%, and 86%, respectively; the PPV (for a prediction score ≥8) was 100%, 93%, and 100%, respectively.
The recently derived prediction rule, when applied to 3 independent cohorts of patients with UA, has an excellent discriminative ability for assessing the likelihood of progression to RA. Application of this rule will allow individualized treatment decision-making for patients with UA.
The effective management of patients with recent-onset undifferentiated arthritis (UA) is difficult. Early initiation of methotrexate has been shown to be effective in slowing progression to rheumatoid arthritis (RA) and in reducing the level of joint damage at a group level (1). However, the rate of spontaneous remission in early UA is considerable (40–50%) and only one-third of patients with UA will develop RA (2–4). Thus, the ability to accurately predict outcome at an individual patient level, to allow accurate individualized treatment decision-making, is an important goal.
Recently, a prediction rule was developed as a way to address the problems of undertreatment (delayed treatment in patients with UA whose disease will progress to RA) and overtreatment (treatment with potentially toxic drugs in patients whose synovitis will remit spontaneously) (5). This prediction rule can be used to estimate the chance of progression to RA in individual patients with recent-onset UA (5). Currently, application of this prediction rule in clinical practice is hampered by a lack of validation. Therefore, this study was undertaken to assess the accuracy of the recently developed prediction rule in independent cohorts of patients with recent-onset UA.
PATIENTS AND METHODS
Rederiving the prediction rule.
Three different cohorts of patients with recent-onset UA were used for validation. In 2 of these cohorts, data on the baseline parameter of severity of morning stiffness (measured on a visual analog scale [VAS]) were not available, but the duration of morning stiffness (in minutes) was recorded in all 3 cohorts. Therefore, the prediction rule was rederived, using data from the original derivation cohort (the Leiden Early Arthritis Clinic [EAC] cohort), with the duration of morning stiffness as a substitute. The negative predictive values (NPVs) and positive predictive values (PPVs) as well as the area under the receiver operating characteristic (ROC) curves (AUCs) for this adjusted rule were assessed.
Patients with recent-onset UA from 3 separate cohorts were studied. The first cohort consisted of 99 patients with UA recruited to the Birmingham, UK Early Arthritis cohort. For this cohort of patients with very early arthritis, patients were included if they had synovitis in at least 1 joint and a duration of symptoms (inflammation-related joint pain, swelling, or morning stiffness) of ≤3 months. This British cohort has been described in detail previously (6). Patients were followed up for at least 18 months and were classified as having RA if they fulfilled the American College of Rheumatology (ACR; formerly, the American Rheumatism Association) 1987 criteria for RA (7).
The second cohort consisted of 155 patients from Berlin, Germany recruited to the Berlin EAC. This clinic was started in January 2004, and patients were included if they had synovitis in at least 2 joints and a duration of symptoms of between 4 weeks and 12 months. This Berlin cohort has been described previously (8). Fulfillment of the ACR criteria for RA (7) was assessed after 1 year of followup.
The third validation cohort consisted of 34 Dutch patients included in the placebo arm of the PRObable Rheumatoid Arthritis: Methotrexate Versus Placebo Treatment (PROMPT) trial (a double-blind, placebo-controlled, randomized trial in which patients with recent-onset UA were treated with either methotrexate or placebo) (1), for whom followup data were available. These patients were not included in the initial Leiden EAC cohort that had been used to derive the original prediction rule (5). However, these placebo-treated patients with UA were used in a prior study to validate the originally derived prediction rule. Because the prediction rule was rederived, with duration of morning stiffness replacing severity of morning stiffness, this group of patients from the placebo arm of the PROMPT trial was used again to validate the rederived rule.
All studies were approved by the local ethics committees. All patients gave their written informed consent to participate in the studies.
Results are reported as the mean ± SD or, in situations in which the distributions were skewed, as the median and interquartile range. Differences in mean values between groups were analyzed with the Mann-Whitney U test for comparison of 2 groups and the Kruskal-Wallis test for comparison of 3 groups. Proportions were compared using the chi-square test. The derivation of the prediction rule with the duration of morning stiffness as a substitute for the severity of morning stiffness was performed with logistic regression analysis. To get a simplified prediction rule, the regression coefficients of the predictive variables were rounded to the nearest number ending in .5 or .0, which resulted in a weighted score.
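The rounding step described above can be sketched as follows. This is only an illustration: the coefficient values and predictor names are hypothetical placeholders, not the fitted values from the Leiden EAC cohort, and the helper name `round_to_half` is an assumption. Python's built-in `round` rounds ties to the nearest even number, so the sketch rounds ties upward explicitly.

```python
import math


def round_to_half(coefficient: float) -> float:
    """Round a value to the nearest multiple of 0.5 (ties round up)."""
    return math.floor(coefficient * 2 + 0.5) / 2


# Hypothetical regression coefficients, for illustration only.
example_coefficients = {
    "predictor_a": 0.93,
    "predictor_b": 1.62,
    "predictor_c": 0.26,
}

# Each rounded coefficient becomes the weight of that predictor in the score.
weights = {name: round_to_half(beta) for name, beta in example_coefficients.items()}
print(weights)
```

With these placeholder inputs, the weights come out as 1.0, 1.5, and 0.5; a patient's score would then be the sum of the weights for the characteristics that patient has.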
For each patient in the 3 cohorts, a prediction score was calculated using the patients' characteristics at baseline as parameters (which were recorded at the time of patient recruitment to the cohorts, before the investigators were aware of which parameters would constitute the prediction rule under investigation). The prediction score and actual outcomes were compared.
The PPVs and NPVs were determined for several cutoff values of the prediction score. An ROC curve was constructed to evaluate the diagnostic performance of the prediction rule, and the AUC provided a measure of the overall discriminative ability of the prediction rule. This was done for the 3 validation cohorts separately and for all 288 patients pooled together (combined cohort). SPSS software, version 12.0 (Chicago, IL), was used for all data analyses. P values less than 0.05 were considered significant.
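As a rough sketch of how PPVs and NPVs can be computed for a pair of cutoff values, the function below takes per-patient scores and outcomes. The function name, the toy data, and the return format are assumptions made for illustration; the study itself used SPSS, and the default cutoffs of 6.0 and 8.0 are the ones reported in the results.

```python
from typing import Optional, Sequence


def predictive_values(
    scores: Sequence[float],
    developed_ra: Sequence[bool],
    low_cutoff: float = 6.0,
    high_cutoff: float = 8.0,
) -> dict[str, Optional[float]]:
    """NPV for scores <= low_cutoff, PPV for scores >= high_cutoff,
    and the fraction of patients falling between the two cutoffs."""
    low = [ra for s, ra in zip(scores, developed_ra) if s <= low_cutoff]
    high = [ra for s, ra in zip(scores, developed_ra) if s >= high_cutoff]
    npv = sum(not ra for ra in low) / len(low) if low else None
    ppv = sum(high) / len(high) if high else None
    intermediate = (len(scores) - len(low) - len(high)) / len(scores)
    return {"npv": npv, "ppv": ppv, "intermediate_fraction": intermediate}


# Toy data (hypothetical patients, not taken from any cohort).
toy_scores = [3.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 9.5, 10.0, 4.0]
toy_outcomes = [False, False, True, False, True, True, True, True, False, False]

result = predictive_values(toy_scores, toy_outcomes)
print(result)
```

Passing other cutoff pairs to the same function shows the trade-off between higher predictive values and a larger intermediate group.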
RESULTS
Validation of the rederived prediction rule.
The prediction rule was rederived using data from the Leiden EAC cohort, with the parameter of duration of morning stiffness substituting for severity of morning stiffness. Duration of morning stiffness was found to be a less powerful predictor than severity of morning stiffness, and the maximal prediction score for duration of morning stiffness was adjusted to 1 (compared with a maximal score of 2 in the original prediction rule). Consequently, the maximal total prediction score would be 13 instead of 14 (Figures 1A and B).
The observed frequency of progression to RA in relation to the calculated prediction scores (rounded to the nearest integer) is shown in Table 1. In the initial study, the NPVs and PPVs were calculated for several cutoff values of the prediction score (5). A prediction score with a cutoff ≤6.0 had a high NPV, while a cutoff value ≥8.0 had a high PPV; 25% of patients were in the intermediate group (scores between 6.0 and 8.0) for whom no accurate prediction could be made. Using lower and higher cutoff values resulted in increased NPVs and PPVs, respectively, but also resulted in a larger intermediate group. Therefore, the NPVs and PPVs of the rederived prediction score were assessed with the cutoff values ≤6.0 and ≥8.0.
Table 1. Observed progression to RA by prediction score (values are no./total (%))
| Prediction score | Initial cohort: no RA | Initial cohort: RA | Combined validation cohort: no RA | Combined validation cohort: RA |
|---|---|---|---|---|
| 1 | 20/20 (100) | 0 | 12/12 (100) | 0 |
| 2 | 47/47 (100) | 0 | 19/21 (90) | 2/21 (10) |
| 3 | 74/77 (96) | 3/77 (4) | 34/36 (94) | 2/36 (6) |
| 4 | 75/82 (91) | 7/82 (9) | 36/43 (84) | 7/43 (16) |
| 5 | 60/71 (85) | 11/71 (16) | 30/45 (67) | 15/45 (33) |
| 6 | 58/90 (64) | 32/90 (36) | 27/39 (69) | 12/39 (31) |
| 7 | 24/60 (40) | 36/60 (60) | 19/41 (46) | 22/41 (54) |
| 8 | 19/58 (33) | 39/58 (67) | 6/20 (30) | 14/20 (70) |
| 9 | 6/26 (23) | 20/26 (77) | 1/17 (6) | 16/17 (94) |
| 10 | 3/20 (15) | 17/20 (85) | 0 | 10/10 (100) |
| 11 | 0 | 9/9 (100) | 0 | 3/3 (100) |
Eighty-nine percent of patients with a score ≤6.0 did not develop RA (compared with 91% from the original prediction rule). Eighty-two percent of patients with a score ≥8.0 had early UA that progressed to RA (compared with 84% from the original prediction rule). The AUC after application of the rederived prediction rule was 0.88 (SEM 0.015), which was slightly lower than that from the original prediction rule (AUC 0.89, SEM 0.014). Thus, the rederived prediction rule had a lower diagnostic performance, but the difference compared with that of the originally derived rule was marginal.
Application of the rederived prediction rule in the validation cohorts.
Baseline characteristics of the patients with early UA are presented in Table 2. Consistent with the different inclusion criteria used in the cohorts, the duration of symptoms differed between the 3 cohorts, with the shortest symptom duration in the Birmingham cohort (mean 41 days) and the longest symptom duration in the Dutch cohort (mean 327 days). The 3 cohorts differed with regard to the patient characteristics that constituted the prediction rule; consequently, the total prediction scores were different for the 3 groups (P = 0.016 by Kruskal-Wallis test). The percentage of patients whose disease progressed to RA was 31% in the Birmingham cohort, 37% in the Berlin cohort, and 44% in the Dutch cohort.
Table 2. Baseline characteristics of the patients with early UA (values are the mean ± SD unless indicated otherwise)
| Characteristic | Birmingham, UK (n = 99) | Berlin, Germany (n = 155) | Dutch PROMPT (n = 34) |
|---|---|---|---|
| Age, years | 48.2 ± 16.4 | 50.8 ± 14.8 | 51.6 ± 12.4 |
| Female, no. (%) | 60 (61) | 113 (73) | 28 (82) |
| Symptom duration, days | 41 ± 25 | 131 ± 96 | 327 ± 198 |
| No. of tender joints | 5.4 ± 6.8 | 7.6 ± 7.6 | 6.8 ± 6.1 |
| No. of swollen joints | 3.3 ± 3.5 | 3.5 ± 5.2 | 3.1 ± 6.7 |
| Involved joints, no. (%) | | | |
| Symmetric | 40 (40) | 89 (57) | 12 (35) |
| Small joints | 49 (49) | 110 (71) | 28 (82) |
| Upper extremities | 40 (40) | 110 (71) | 28 (82) |
| Upper and lower extremities | 16 (16) | 97 (63) | 13 (38) |
| Duration of morning stiffness, minutes | 66.0 ± 76.4 | 24.2 ± 45.5 | 44.4 ± 57.0 |
| CRP, median (IQR) mg/liter | 23.0 (7.0–54.0) | 6.8 (2.1–18.3) | 3.0 (3.0–6.0) |
| RF positive, no. (%) | 17 (17) | 72 (46) | 11 (32) |
| Anti-CCP positive, no. (%) | 12 (12) | 35 (23) | 8 (24) |
| Total prediction score | 4.7 ± 2.3 | 5.6 ± 2.3 | 5.7 ± 2.2 |
| Progression to RA, no. (%) | 31 (31) | 58 (37) | 15 (44) |
In the Birmingham cohort, 54 of 65 patients with a prediction score ≤6.0 (83%) did not develop RA, whereas all 7 patients with a score ≥8.0 (100%) had disease that progressed to RA. Twenty-seven of the 99 patients in the Birmingham cohort (27%) had a score in the intermediate range, between 6.0 and 8.0. The AUC for this cohort was 0.83 (SEM 0.041) (Table 3).
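Table 3. Performance of the rederived prediction rule in the derivation cohort and the validation cohorts
| | Leiden EAC (n = 570)† | Validation cohort: Birmingham, UK (n = 99) | Validation cohort: Berlin, Germany (n = 155) | Validation cohort: Dutch PROMPT (n = 34) | Validation cohorts combined (n = 288) |
|---|---|---|---|---|---|
| NPV of score ≤6.0, %‡ | 89 | 83 | 83 | 86 | 83 |
| PPV of score ≥8.0, %‡ | 82 | 100 | 93 | 100 | 97 |
| Patients with score 6.0–8.0, % | 24 | 27 | 22 | 24 | 24 |
| AUC (SEM) | 0.88 (0.015) | 0.83 (0.041) | 0.82 (0.037) | 0.95 (0.031) | 0.84 (0.024) |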
In the Berlin cohort, 78 of 94 patients with a score ≤6.0 (83%) did not show progression to RA, whereas 25 of 27 patients with a score ≥8.0 (93%) were diagnosed as having RA, and 34 of the 155 patients (22%) had a score between 6.0 and 8.0. The AUC for this cohort was 0.82 (SEM 0.037) (Table 3).
In the Dutch validation cohort, the NPV was 86% (i.e., 18 of 21 patients with a score ≤6.0 showed no progression to RA), the PPV was 100% (all 5 patients with a score ≥8.0 developed RA), and 8 of the 34 patients in the Dutch cohort (24%) had an intermediate score. The AUC for this cohort was 0.95 (SEM 0.031) (Table 3).
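The per-cohort percentages reported for the 3 validation cohorts follow directly from the counts given in the text. The sketch below recomputes them; the class and function names are illustrative assumptions, and only the counts themselves come from the article.

```python
from dataclasses import dataclass


@dataclass
class CohortCounts:
    name: str
    n_total: int       # all patients in the cohort
    n_low: int         # patients with score <= 6.0
    n_low_no_ra: int   # of those, patients who did not develop RA
    n_high: int        # patients with score >= 8.0
    n_high_ra: int     # of those, patients who developed RA


# Counts as reported in the text for each validation cohort.
cohorts = [
    CohortCounts("Birmingham", 99, 65, 54, 7, 7),
    CohortCounts("Berlin", 155, 94, 78, 27, 25),
    CohortCounts("Dutch PROMPT", 34, 21, 18, 5, 5),
]


def summarize(c: CohortCounts) -> dict[str, int]:
    """Return NPV, PPV, and intermediate-group size as whole percentages."""
    n_intermediate = c.n_total - c.n_low - c.n_high
    return {
        "npv_pct": round(100 * c.n_low_no_ra / c.n_low),
        "ppv_pct": round(100 * c.n_high_ra / c.n_high),
        "intermediate_n": n_intermediate,
        "intermediate_pct": round(100 * n_intermediate / c.n_total),
    }


summaries = {c.name: summarize(c) for c in cohorts}
for name, summary in summaries.items():
    print(name, summary)
```

The intermediate group is obtained here by subtraction, which also confirms that the three score ranges account for every patient in each cohort (27, 34, and 8 patients, respectively).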
Combining all patients from the 3 cohorts (n = 288) resulted in a PPV of 97% for patients with a prediction score ≥8.0, and yielded an NPV of 83% for patients with a score ≤6.0 (Table 3). The relationship between the prediction scores and the percentages of patients in the combined validation cohort who developed RA is shown in Table 1. The diagnostic performances of the prediction rule, visualized as ROC curves, in the derivation cohort as well as in the 3 validation cohorts combined are presented in Figure 2. The AUC calculated on the pooled validation cohorts was 0.84 (SEM 0.024).
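An AUC can be read as the probability that a randomly chosen patient who developed RA has a higher score than a randomly chosen patient who did not, with ties counted as one-half. The sketch below applies that definition to the combined validation counts in Table 1. It is an approximation under two stated limitations: Table 1 rounds scores to the nearest integer, and its rows account for 287 of the 288 patients, so the result is not expected to reproduce the published 0.84 exactly. The function name is an assumption.

```python
# Combined validation cohort counts from Table 1, for scores 1 through 11.
non_ra_counts = [12, 19, 34, 36, 30, 27, 19, 6, 1, 0, 0]
ra_counts = [0, 2, 2, 7, 15, 12, 22, 14, 16, 10, 3]


def auc_from_counts(neg_counts: list[int], pos_counts: list[int]) -> float:
    """AUC from outcome counts per ordered score level.

    Each positive patient is credited with every negative patient at a
    lower score level, plus half of the negatives at the same level.
    """
    credited_pairs = 0.0
    negatives_below = 0
    for neg, pos in zip(neg_counts, pos_counts):
        credited_pairs += pos * (negatives_below + 0.5 * neg)
        negatives_below += neg
    return credited_pairs / (sum(neg_counts) * sum(pos_counts))


auc = auc_from_counts(non_ra_counts, ra_counts)
print(f"Approximate AUC from Table 1: {auc:.3f}")
```

The value obtained this way is about 0.83, close to the reported 0.84, which was calculated on the unrounded scores of all 288 patients.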
We also compared the predictive performance of the rederived prediction rule with the predictive performance of seropositivity for antibodies against cyclic citrullinated peptide (anti-CCP) and positivity for rheumatoid factor (RF) (both of which are characteristics that are part of the prediction rule). The chance of developing RA in the presence of these autoantibodies was determined in the 3 validation cohorts combined (n = 288).
Of the 47 patients who were seropositive for both anti-CCP and RF, 45 subjects developed RA (PPV 95.7%). Twenty-six of the 66 patients who were seropositive for anti-CCP or RF (39.4%) showed progression to RA, and the chance of developing RA in patients who were negative for both autoantibodies was 18.9% (33 of 175 patients). The overall discriminative ability of the test findings for the 2 autoantibodies was determined using an ROC curve, which yielded an AUC of 0.73 (SEM 0.033). This ROC curve as well as the ROC curve for the prediction rule are presented in Figure 3.
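The proportions above, and their consistency with the cohort totals, can be checked with a few lines. The dictionary labels are illustrative; the counts are those quoted in the text.

```python
# (patients who developed RA, patients in group), as reported in the text.
autoantibody_groups = {
    "anti-CCP and RF positive": (45, 47),
    "one autoantibody positive": (26, 66),
    "both negative": (33, 175),
}

risk_pct = {
    label: round(100 * ra / n, 1) for label, (ra, n) in autoantibody_groups.items()
}
total_patients = sum(n for _, n in autoantibody_groups.values())
total_ra = sum(ra for ra, _ in autoantibody_groups.values())

for label, pct in risk_pct.items():
    print(f"{label}: {pct}% developed RA")
print(f"{total_ra} of {total_patients} patients developed RA")
```

The group sizes add up to the 288 patients of the combined validation cohort, and the 104 patients who developed RA match the sum of the per-cohort figures in Table 2 (31 + 58 + 15).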
DISCUSSION
The present study assessed the predictive accuracy of a recently derived prediction rule that estimates the chance of progression to RA in 3 independent cohorts of patients with early UA. In all 3 validation cohorts, the PPVs and NPVs, as well as the AUCs, were only marginally lower than those in the derivation cohort. The accurate prediction of the development of RA in several independent cohorts of patients with early UA, originating from different countries, demonstrates the discriminative ability and the validity of the prediction rule and provides the foundation for the use of this rule in clinical practice.
One of the 2 most predictive variables in the original prediction rule was the severity of morning stiffness, as measured on a VAS. The severity of morning stiffness was not recorded in either the Birmingham or the Berlin cohort, but the duration of morning stiffness was measured in all 3 cohorts. In the derivation cohort, data on the duration and the severity of morning stiffness were both available, and the severity of morning stiffness was a better predictor than its duration, a finding that is consistent with that reported in the literature (9–11). To enable validation, the prediction rule was rederived using the duration of morning stiffness. The accuracy of the original prediction rule and that of the rederived prediction rule were only slightly different (AUCs 0.89 and 0.88, respectively).
In addition to morning stiffness, the autoantibodies anti-CCP and RF are important predictors of outcome. Anti-CCP, in particular, is the most heavily weighted predictor in the rule. To assess the additional benefit gained by incorporating data on all of the variables in the prediction rule, the discriminative ability of the prediction rule and that of the autoantibody status were compared in the validation cohorts. Consistent with the findings in the literature (12), the presence of both antibodies conferred a high risk of progression to RA. This risk decreased considerably when only 1 or none of these antibodies was present. To compare the overall diagnostic performance, the AUC for the prediction rule and that for the data on anti-CCP and RF were evaluated. The AUC for the prediction rule was higher than that for the data on anti-CCP and RF (AUC 0.84 [SEM 0.024] versus 0.73 [SEM 0.033], respectively). The prediction rule thus provides a higher discriminative ability than does the autoantibody status alone. The fact that the prediction score is easily determined may facilitate application of the rule in clinical practice.
Unfortunately, with the current prediction rule, no adequate estimation of risk could be made in one-quarter of the patients (the patients with a score between 6 and 8). Importantly, the proportion of these patients was comparable in the derivation cohort and in all 3 validation cohorts. Data on radiologic joint destruction and on genetic risk factors for RA (HLA–DRB1 shared epitope alleles, PTPN22, C5-TRAF1) were studied in the derivation cohort and were not independent predictors of RA development in logistic regression analysis. Therefore, these variables were of no additive value in assessing the patients with a score between 6 and 8. The identification of markers to facilitate the prediction of outcome in this group of patients remains an important research goal.
Misclassification may have occurred when patients who presented with UA were treated with a drug that may have slowed the rate of progression to RA. Patients whose natural history would have been disease progression to RA may, with treatment, not have accrued sufficient features to allow the classification of RA. Disease-modifying antirheumatic drugs (DMARDs) were started in 22% of the patients in the Birmingham cohort and 25% in the Berlin cohort whose disease did not progress to RA. In the Dutch replication cohort, no DMARDs were used. Such patient misclassification would mean that the predictive values of the current model and the AUC of this model were underestimates.
The different baseline characteristics of patients in the 3 cohorts may be due to different inclusion criteria, in particular the maximum permissible symptom duration at entry. However, these patient cohorts represent a broad cross-section of patients with early UA, and it is noteworthy that the prediction rule accurately estimated the disease outcome in all 3 cohorts.
The current prediction rule is, to our knowledge, the first validated rule for application in patients with early UA, and it should facilitate the development of personalized medicine in this clinical context. There is widespread interest in the development of predictive tools in other clinical situations. For example, instruments that predict the development of cardiovascular disease, diabetes, osteoporotic fractures, and hepatitis B virus–induced liver fibrosis have recently been described (13–16). Intriguingly, the discriminative ability, as measured by the AUC, of the prediction rule described in the present study is better than that of the predictive tools referred to in these other studies, most of which required additional or invasive measurements. In contrast, the information needed to use our prediction rule for early UA is easily and regularly collected at the first visit to the clinic.
In conclusion, a prediction rule for the development of RA in patients with UA has now been validated. It accurately estimates the risk of developing RA in more than 75% of individual patients with recent-onset UA. We hope that it will be of use in daily practice, leading to reductions in the under- and overtreatment of patients with UA.
AUTHOR CONTRIBUTIONS
Dr. van der Helm-van Mil had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study design. Van der Helm-van Mil, Burmester, Raza.
Acquisition of data. Van der Helm-van Mil, Detert, Filer, Bastian, Burmester, Raza.
Analysis and interpretation of data. Van der Helm-van Mil, Detert, le Cessie, Burmester, Huizinga, Raza.
Manuscript preparation. Van der Helm-van Mil, Filer, Burmester, Huizinga, Raza.
Statistical analysis. Van der Helm-van Mil, le Cessie.