To develop a clinical model for the prediction, at the first visit, of 3 forms of arthritis outcome: self-limiting, persistent nonerosive, and persistent erosive arthritis.
A standardized diagnostic evaluation was performed on 524 consecutive, newly referred patients with early arthritis. Potentially diagnostic determinants obtained at the first visit from the patient's history, physical examination, and blood and imaging testing were entered in a logistic regression analysis. Arthritis outcome was recorded at 2 years' followup. The discriminative ability of the model was expressed as a receiver operating characteristic (ROC) area under the curve (AUC).
The developed prediction model consisted of 7 variables: symptom duration at first visit, morning stiffness for ≥1 hour, arthritis in ≥3 joints, bilateral compression pain in the metatarsophalangeal joints, rheumatoid factor positivity, anti–cyclic citrullinated peptide antibody positivity, and the presence of erosions (hands/feet). Application of the model to an individual patient resulted in 3 clinically relevant predictive values: one for self-limiting arthritis, one for persistent nonerosive arthritis, and one for persistent erosive arthritis. The ROC AUC of the model was 0.84 (SE 0.02) for discrimination between self-limiting and persistent arthritis, and 0.91 (SE 0.02) for discrimination between persistent nonerosive and persistent erosive arthritis, whereas the discriminative ability of the American College of Rheumatology 1987 classification criteria for rheumatoid arthritis was significantly lower, with ROC AUC values of 0.78 (SE 0.02) and 0.79 (SE 0.03), respectively.
A clinical prediction model was developed with an excellent ability to discriminate, at the first visit, between 3 forms of arthritis outcome. Validation in other early arthritis clinics is necessary.
There is growing evidence that therapeutic intervention early in the disease course of rheumatoid arthritis (RA) leads to earlier disease control and less joint damage (1–6). Since treatment with disease-modifying antirheumatic drugs (DMARDs) is only justified when the risk:benefit or cost-effectiveness ratios are favorable, it is mandatory to be able to differentiate between RA and other forms of arthritis early after symptom development (7). The American College of Rheumatology (ACR; formerly, American Rheumatism Association) 1987 classification criteria for RA (8) have often been used as a diagnostic tool in patients with recent-onset arthritis. However, these criteria were developed in a population of patients selected according to their disease status, to classify rather than diagnose RA (8). Therefore, their diagnostic ability in early arthritis is probably not optimal.
A major problem to be dealt with in the diagnostic research of RA is the lack of an independent gold standard for the disease. In most studies, either the physician's clinical diagnosis or the disease classification according to the ACR criteria has been used as the gold standard for RA; however, these are dependent on the diagnostic tests that are evaluated. This leads to circularity and overestimation of the diagnostic properties of these tests (9). Defining the gold standard in terms of clinical outcome is a way of avoiding this circularity (10). Moreover, predicting the outcome of arthritis in a patient is more relevant for therapeutic decision-making than predicting whether an arthritis syndrome will ever satisfy a set of classification criteria.
The aim of the present prospective study was to develop diagnostic criteria for RA that are maximally accurate in a population of patients with early arthritis and, at the same time, usable in everyday clinical practice. Arthritis outcome recorded at 2 years' followup was used as the gold standard. A clinical model was developed, using logistic regression techniques, for discrimination at the first visit between 3 forms of arthritis outcome: self-limiting arthritis, persistent nonerosive arthritis, and persistent erosive arthritis.
Patients. In 1993, a special Early Arthritis Clinic (EAC) was started at the Department of Rheumatology of the Leiden University Medical Center, the only center for rheumatology in a health care region of >300,000 inhabitants. The general practitioners in the region were motivated to refer patients if at least 2 of the following features were present: joint pain, joint swelling, or reduction in joint mobility. All patients referred to the EAC were seen within 2 weeks by a rheumatologist. Patients were included in an early arthritis cohort if a rheumatologist confirmed the presence of arthritis in at least 1 joint and if the symptoms lasted <2 years. Second opinions were excluded. For the purpose of the present study, the 566 consecutive patients who were included between 1993 and December 1996 were used.
Baseline assessment. A standard diagnostic evaluation was performed at the first visit. The data recorded were potentially diagnostic variables obtained from the patient's history, physical examination, and laboratory and radiologic examinations. The variables selected were age, sex, duration of symptoms at first visit, duration of morning stiffness, family history of RA, Ritchie articular index (11) score, total number of swollen joints, arthritis in at least 3 joint areas, arthritis in the hand joints, arthritis in the large joints (shoulder, elbow, knee), symmetric arthritis, rheumatoid nodules, bilateral compression pain in the metacarpophalangeal (MCP) and/or metatarsophalangeal (MTP) joints, erythrocyte sedimentation rate (ESR), levels of C-reactive protein (CRP), IgM rheumatoid factor (RF) positivity (≥5 IU), antinuclear antibody (ANA) positivity, anti–cyclic citrullinated peptide antibody (anti-CCP) positivity (≥92 IU), shared epitope hetero- or homozygosity, DQRA hetero- or homozygosity, and the presence of erosions on radiographs of the hands and/or feet.
For the analysis, the duration of symptoms at the first visit was divided into 3 categories: shorter than 6 weeks, between 6 weeks and 6 months, and longer than 6 months. A total of 54 joints were assessed for the presence of arthritis, with a maximum swollen joint count of 22, since the MCP, the proximal and distal interphalangeal, and the MTP joints were each counted as 1 joint. Arthritis was defined as symmetric when at least 1 of the joint groups was affected bilaterally. IgM-RF and anti-CCP antibodies were measured by enzyme-linked immunosorbent assays, as described previously (12, 13). ANA were determined by immunofluorescence (HEp-2 cell substrate). DNA isolation and HLA–DQ and DR typing were performed as described previously (14). The presence of HLA class II regions with RA predisposition, i.e., the shared epitope and DQRA, was determined as described previously (14, 15). The radiographs of the hands and feet were scored for the presence of erosions using the modified Sharp score (16, 17) by an experienced rheumatologist who was blinded to the patient's other data.
Followup assessment. All patients were followed up for at least 2 years, with 2 exceptions. Patients with transient arthritis based on crystal-induced conditions were not followed up on a routine basis. Patients with transient arthritis classified as reactive arthritis, septic arthritis, sarcoid arthritis, or undifferentiated arthritis who were free of symptoms and in natural remission at 1 year of followup were not followed up beyond 1 year; they were instructed to contact the outpatient clinic in the case of relapse of symptoms. Patients in these 4 categories who were not symptom free or in remission at 1 year of followup were followed up for at least 2 years.
At the 1- and 2-year followup, the patients were evaluated by a rheumatologist, who took a structured history and performed a physical examination. The variables recorded were the number and location of swollen and tender joints (as at baseline) and treatment with DMARDs or corticosteroids. Radiographs of the hands and feet were obtained at yearly intervals in the patients with persistent arthritis, and were scored for the presence of erosions as described above.
Statistical analysis. Diagnostic criteria were developed to discriminate, at the first visit, between self-limiting arthritis, persistent nonerosive arthritis, and persistent erosive arthritis. The arthritis outcome recorded at 2 years' followup thereby represented the gold standard, as shown in Figure 1. A patient had self-limiting arthritis when a natural remission was present at 2 years' followup. Natural remission was defined as no arthritis on examination in a patient who had not taken DMARDs or steroids in the preceding 3 months (18). The patients with transient arthritides consistent with crystal-induced conditions, reactive arthritis, septic arthritis, sarcoid arthritis, or undifferentiated arthritis who were in natural remission at 1 year of followup were classified as having self-limiting arthritis. Persistent arthritis was defined as the presence of arthritis in at least 1 joint and/or treatment with DMARDs or steroids within the previous 3 months at 2 years' followup. Erosive disease was defined as the presence of erosions (modified Sharp erosion score ≥1) on radiographs of the hands and/or feet.
The diagnostic variables recorded at the first visit were entered into a logistic continuation ratio model, an extension of the logistic regression model for ordinal outcomes (19, 20). It implies that the probability of persistent arthritis after 2 years and the probability of erosive arthritis given persistence after 2 years are both modeled with logistic regression. The odds ratios (ORs) for the 2 probabilities, persistent versus self-limiting arthritis and erosive versus nonerosive arthritis given persistence, are thereby assumed to be equal, provided that this is confirmed by statistical testing (19, 20). Three different models or criteria sets were developed.
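As an illustration of how the two conditional logistic models combine into 3 outcome probabilities, the following is a minimal sketch and not the fitted model of this study. The function names and the example linear predictors are hypothetical; only the structure (probability of persistence, then probability of erosions given persistence) is taken from the description above.

```python
import math


def logistic(z: float) -> float:
    """Standard logistic function: maps a log odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def outcome_probabilities(lp_persistent: float, lp_erosive: float) -> dict:
    """Combine the two conditional models into 3 outcome probabilities.

    lp_persistent: log odds of persistent versus self-limiting arthritis.
    lp_erosive:    log odds of erosive versus nonerosive arthritis,
                   given that the arthritis is persistent.
    """
    p_persistent = logistic(lp_persistent)
    p_erosive_given_persistent = logistic(lp_erosive)
    return {
        "self_limiting": 1.0 - p_persistent,
        "persistent_nonerosive": p_persistent * (1.0 - p_erosive_given_persistent),
        "persistent_erosive": p_persistent * p_erosive_given_persistent,
    }


# Hypothetical linear predictors, chosen only to illustrate the arithmetic.
probs = outcome_probabilities(lp_persistent=0.0, lp_erosive=0.0)
print(probs)
```

By construction the 3 probabilities sum to 1, which is what allows a single patient to receive one predictive value per outcome category.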
Model 1 was based on all diagnostic variables, except the genetic typing which is not generally available in clinical practice. The model was developed in 2 steps. First, the diagnostic variables obtained from the patient history and physical examination were entered in the analysis to obtain the most efficient “patient history and physical examination model.” Then, laboratory and radiologic variables that independently contributed to an increase in its predictive value were added. This procedure is in accordance with the phased diagnostic evaluation commonly performed in clinical practice.
Model 2 was developed by adding the results of genetic typing to model 1. This was done in order to evaluate the additional effect of the genetic typing variables.
Model 3 was obtained by including only variables with no or minimal interobserver variability, such as age, sex, and laboratory and radiologic tests, in the analysis. In all analyses, a backward variable selection procedure was performed, with a significance level of 0.10 to remove the nonsignificant variables.
A simplified version of model 1 was constructed for clinical use by substituting the ORs with weighted scores. Score 1 was for ORs between 1.5 and 2, score 2 for ORs between 2 and 4, and score 3 for ORs between 4 and 6.
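The scoring rule can be sketched as a small function. This is an illustration under stated assumptions: the text does not say how boundary values (exactly 2 or 4) are handled, so half-open intervals are assumed here; a score of 0 for ORs below 1.5 and an infinite score for an infinite OR are inferred from Table 3 rather than from the rule itself. The function name is hypothetical.

```python
import math


def or_to_score(odds_ratio: float) -> float:
    """Map an odds ratio to the simplified weighted score.

    Assumptions (not stated in the text): intervals are [1.5, 2), [2, 4)
    and [4, 6]; ORs below 1.5 score 0; an infinite OR keeps an infinite score.
    """
    if math.isinf(odds_ratio):
        return math.inf
    if odds_ratio < 1.5:
        return 0
    if odds_ratio < 2:
        return 1
    if odds_ratio < 4:
        return 2
    if odds_ratio <= 6:
        return 3
    # Finite ORs above 6 are not covered by the published rule.
    raise ValueError("no score defined for finite odds ratios above 6")


# Odds ratios taken from Table 3.
for value in (0.96, 1.65, 1.96, 2.49, 3.78, 4.58):
    print(value, or_to_score(value))
```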
To evaluate the diagnostic performance of the different models, receiver operating characteristic (ROC) curves were constructed for discrimination between self-limiting and persistent arthritis and for discrimination between erosive and nonerosive arthritis given persistence. ROC curves plot the relationship between sensitivity, on the y-axis, and 1 − specificity, on the x-axis, for different cutoff levels of test positivity. The area under the ROC curve (ROC AUC) provides a measure of the overall discriminative ability of a model. The ROC area and its standard error (SE) were estimated using the nonparametric approach (21). The ROC curves were constructed by applying the models to each individual patient. In doing this, the models can be considered to be overall diagnostic tests, and the estimated probabilities for each patient are then the test results. The discriminative abilities of the different diagnostic models were compared with each other and with a diagnostic model formed by the list version of the ACR 1987 classification criteria (22). In the diagnostic model formed by the ACR 1987 classification criteria, all individual criteria had the same weight. The presence of symptom duration of at least 6 weeks at the first visit was included in the model as the eighth criterion.
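A common nonparametric estimate of the ROC AUC is the proportion of patient pairs (one with the outcome, one without) in which the patient with the outcome has the higher predicted probability, with ties counted as one-half. The sketch below illustrates that estimate on made-up values; whether it matches the exact procedure of reference 21 is an assumption, and the standard error is not computed here.

```python
def roc_auc(scores_positive, scores_negative):
    """Nonparametric AUC: share of (positive, negative) pairs ranked correctly.

    scores_positive: predicted probabilities for patients with the outcome.
    scores_negative: predicted probabilities for patients without the outcome.
    Tied pairs contribute 0.5.
    """
    if not scores_positive or not scores_negative:
        raise ValueError("both groups must contain at least one patient")
    concordant = 0.0
    for sp in scores_positive:
        for sn in scores_negative:
            if sp > sn:
                concordant += 1.0
            elif sp == sn:
                concordant += 0.5
    return concordant / (len(scores_positive) * len(scores_negative))


# Hypothetical predicted probabilities, for illustration only.
persistent = [0.9, 0.8, 0.4]
self_limiting = [0.7, 0.3, 0.4]
auc = roc_auc(persistent, self_limiting)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect discrimination, which is the scale on which the models and the ACR criteria are compared.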
Data analysis was performed with the standard software packages, SPSS (Chicago, IL) and SAS (Cary, NC).
Patient characteristics. Of the 566 patients included in the study, 19 (3.4%) died before 2 years' followup and 23 (4.1%) were lost to followup. The baseline characteristics of the 524 patients analyzed are shown in Table 1. They were classified according to international criteria (23) as follows: 156 (30%) with RA, 137 (26%) with undifferentiated arthritis, 58 (11%) with crystal-induced arthritis, 32 (6%) with osteoarthritis, 27 (5%) with sarcoidosis, 18 (4%) with spondylarthropathy, 16 (3%) with reactive arthritis, and the remaining group of 80 (15%) with other causes of inflammatory arthropathy. The 524 patients with early inflammatory arthritis had the following disease outcome at 2 years' followup: self-limiting arthritis in 313 patients (60%), persistent nonerosive arthritis in 84 (16%), and persistent erosive arthritis in 127 (24%). Table 2 shows the disease classification in the different outcome categories.
|Median age (range), years||49 (8–90)|
|No. (%) female||277 (53)|
|Median symptom duration at first visit (range), months||2.7 (0–24)|
|No. (%) IgM rheumatoid factor positive*||114 (23)|
|Median no. of swollen joints (range)||2 (0–14)|
|No. (%) with erosions in hands and/or feet||76 (15)|
|Disease classification||Self-limiting (n = 313)||Persistent nonerosive (n = 84)||Persistent erosive (n = 127)|
|Connective tissue disorders||0.3||6.0||0.8|
Diagnostic models. Model 1. Model 1 was developed by entering all diagnostic variables, except the genetic typing, into a logistic continuation ratio model. The model was developed in 2 steps, in accordance with the phased diagnostic evaluation of clinical practice. As shown in Table 3, the model consisted of 7 criteria. Each criterion had 2 ORs, one for its association with persistent arthritis and one for its association with erosive arthritis given persistence. The strongest association with persistent arthritis was found for the criteria symptom duration and anti-CCP positivity. The strongest association with erosive arthritis given persistence was found for the criterion anti-CCP positivity. The symptom duration criterion thus predicted persistent arthritis but, given that the arthritis was persistent, did not predict erosive arthritis. The criterion bilateral compression pain in the MTP joints was more strongly associated with erosive arthritis given persistence than with persistent arthritis. When erosions were present at baseline, the OR for erosive disease at 2 years given persistence was infinite, i.e., any persistent arthritis in such a patient was classified as erosive.
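|Criterion||Persistent versus self-limiting arthritis: odds ratio||Persistent versus self-limiting arthritis: score||Erosive versus nonerosive arthritis given persistence: odds ratio||Erosive versus nonerosive arthritis given persistence: score|
|Symptom duration at first visit ≥6 weeks but <6 months||2.49||2||0.96||0|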
|Morning stiffness ≥1 hour||1.96||1||1.96||1|
|Arthritis in ≥3 joint groups||1.73||1||1.73||1|
|Bilateral compression pain in MTPs||1.65||1||3.78||2|
|IgM-RF ≥5 IU||2.99||2||2.99||2|
|Anti-CCP ≥92 IU||4.58||3||4.58||3|
|Erosions on hand or foot radiographs||2.75||2||Infinite||Infinite|
The model was simplified for clinical use by substituting the ORs with weighted scores, as described in Patients and Methods and shown in Table 3. The simplified model can be easily applied to an individual patient. Adding up the scores of the criteria present in a particular patient resulted in 2 total scores: one for the prediction of persistent arthritis and one for the prediction of erosive arthritis given persistence. The total score ranged from 0 to 13 for the prediction of persistent arthritis, and from 0 to 9 for the prediction of erosive arthritis. The total scores corresponded to predictive values, as shown in Table 4.
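Applying the simplified model to one patient amounts to two sums over the criteria that are present. The sketch below uses the scores listed in Table 3; the dictionary keys and the function name are hypothetical labels, and only the symptom duration category that appears in the table as reproduced here is included. Converting the totals to probabilities requires Table 4 and is not attempted.

```python
import math

# (score for persistence, score for erosions given persistence), from Table 3.
# Keys are hypothetical labels. Only the ">=6 weeks but <6 months" symptom
# duration category is listed, because no other category appears in the table.
TABLE3_SCORES = {
    "symptom_duration_6wk_to_6mo": (2, 0),
    "morning_stiffness_ge_1h": (1, 1),
    "arthritis_ge_3_joint_groups": (1, 1),
    "bilateral_mtp_compression_pain": (1, 2),
    "igm_rf_positive": (2, 2),
    "anti_ccp_positive": (3, 3),
    "erosions_hand_or_foot": (2, math.inf),
}


def total_scores(criteria_present):
    """Return (persistence total, erosive-given-persistence total)."""
    persistence = sum(TABLE3_SCORES[c][0] for c in criteria_present)
    erosive = sum(TABLE3_SCORES[c][1] for c in criteria_present)
    return persistence, erosive


# Hypothetical patient: morning stiffness, IgM-RF positive, anti-CCP positive.
example = ["morning_stiffness_ge_1h", "igm_rf_positive", "anti_ccp_positive"]
print(total_scores(example))
```

A patient with erosions at baseline receives an infinite erosive total in this sketch, mirroring the "Infinite" entry in Table 3.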
|Persistent arthritis versus self-limiting arthritis||Erosive arthritis versus nonerosive arthritis given persistence|
|Total score||Probability of persistence||Total score||Probability of erosions given persistence|
Exclusion of the recently developed anti-CCP test from the logistic analysis resulted in the variable “arthritis in the hand joints” taking its place in the model. The OR of this variable, however, was low and its simplified score was 0. Therefore, except for the anti-CCP positivity criterion, the scores in the simplified model 1 developed without the anti-CCP test were similar to the scores in the model developed with the anti-CCP test. The total score in the model developed without anti-CCP ranged from 0 to 10 for the prediction of persistent arthritis, and from 0 to 6 for the prediction of erosive arthritis.
Model 2. When, in the analysis, the results of genetic typing, i.e., the presence of the shared epitope and the presence of DQRA, were added to model 1, only the presence of DQRA homozygosity independently contributed to an increase in its predictive value. Thus, model 2 consisted of 8 criteria: the 7 criteria of model 1 and the DQRA homozygosity criterion. The OR of the criterion DQRA homozygosity was 2.49 (P = 0.01) both for the association with persistent arthritis and for the association with erosive arthritis given persistence.
Model 3. Model 3, obtained by entering only variables with no or minimal interobserver variability into a logistic continuation ratio model, consisted of the criteria sex (OR 1.63), IgM-RF positivity (OR 2.91), anti-CCP positivity (OR 5.0), erosions on radiographs of the hands and/or feet (OR 3.93), and DQRA homozygosity (OR 2.14). All criteria, except the criterion erosions, had similar ORs for the association with persistent arthritis and the association with erosive arthritis given persistence. When erosions were present at baseline, the OR for erosive disease at 2 years given persistence was infinite.
Discriminative ability. The ROC curves of the different diagnostic models for discrimination between self-limiting and persistent arthritis are shown in Figure 2. Model 1 had excellent discriminative ability, with a ROC AUC of 0.84 (SE 0.02). The discriminative ability of the simplified model 1 was equal to that of the original model 1. Model 2, consisting of the criteria of model 1 extended with the criterion DQRA, did not perform better than model 1 (ROC AUC 0.84 [SE 0.02]). The discriminative ability of both model 3 and the model formed by the ACR criteria was significantly lower (ROC AUC 0.78, SE 0.02) than that of model 1.
In Figure 3, the curves of the same diagnostic models are shown but have been used to discriminate between erosive and nonerosive arthritis, given that the arthritis is persistent. Overall, the results were similar to those shown in Figure 2. The 2 versions of model 1 and model 2 had excellent and comparable discriminative values, with ROC AUC values of 0.91 (SE 0.02) and 0.92 (SE 0.02), respectively. Model 3 and especially the model formed by the ACR criteria had lower discriminative values, with ROC AUC values of 0.86 (SE 0.03) and 0.79 (SE 0.03), respectively.
The overall discriminative ability of model 1 without the anti-CCP test was significantly lower than that of the model with the anti-CCP test: for persistent versus self-limiting arthritis, the ROC AUC was 0.82 (SE 0.02), and for erosive versus nonerosive arthritis, the ROC AUC was 0.90 (SE 0.02).
We have developed a diagnostic criteria set for early arthritis that is characterized by an excellent ability to discriminate, at the first visit, between self-limiting, persistent nonerosive, and persistent erosive arthritis. The set consists of 7 criteria: symptom duration at first visit, morning stiffness of at least 1 hour, arthritis in ≥3 joints, bilateral compression pain in the MTP joints, IgM-RF positivity, anti-CCP positivity, and erosions on radiographs of the hands or feet. The discriminative ability of this criteria set is higher than that of the ACR 1987 classification criteria for RA. The set can easily be introduced in everyday clinical practice. When it is applied to an individual patient, 3 clinically relevant predictive values are obtained: one for self-limiting arthritis, one for persistent nonerosive arthritis, and one for persistent erosive arthritis (Figure 4). These predictive values form a valuable basis from which therapeutic decisions can be made in an early phase of the arthritis. The early recognition of persistent (erosive) arthritis allows early intervention with DMARDs, which will lead to earlier disease control and improvement of disease outcome (1–6). Conversely, early recognition of self-limiting arthritis will prevent the unnecessary treatment of these cases with potentially toxic DMARDs.
When the results of genetic typing, i.e., the presence of the shared epitope and the presence of DQRA, were added to the criteria set in the analysis, only the presence of DQRA homozygosity independently contributed to an increase in its predictive value. However, adding the presence of DQRA homozygosity as the eighth criterion to the diagnostic criteria set did not significantly improve the overall discriminative value of the criteria set. The discriminative ability of a diagnostic criteria set developed from only the objective variables was lower than that of the set developed from all diagnostic variables. These results indicate that the information from the patient history and physical examination is indispensable, and for the time being, the use of genetic typing is redundant in the diagnostic evaluation of early inflammatory arthritis.
At the moment, diagnostic criteria for RA do not exist. The growing evidence that early therapeutic intervention improves disease outcome (1–6) and the rapid development of powerful and expensive therapeutic agents make these criteria urgently needed (24). In clinical practice, the ACR 1987 classification criteria have often been used as a diagnostic tool for RA. However, these criteria were developed in a population of patients selected on the basis of their disease status as a means of classifying their RA, not as a way to diagnose RA (8). This probably explains the poor diagnostic performance of these criteria in early arthritis.
The diagnostic performance of the ACR criteria in consecutive, unselected, newly referred early arthritis patients was assessed previously in 3 studies (25–27). The studies used different gold standards for RA. The diagnostic ability of the ACR criteria was found to be reasonable in the 2 studies in which the clinical diagnosis was used as the gold standard (25, 27), whereas it was low in the study in which arthritis outcome was used as the gold standard (26). This discrepancy is explained by the inevitable occurrence of circularity, resulting in overestimation of the diagnostic properties of the criteria when the clinical diagnosis is used as the gold standard (28). Defining the gold standard in terms of arthritis outcome prevents the occurrence of circularity (10). Moreover, the outcome categories used in this study represent clearly defined, objective, biologic diagnoses: arthritis and erosions. This will minimize the variation in classification, and therefore, the model is more likely to be robust when applied in other populations.
The search for predictive factors of arthritis outcome has been the subject of many studies. Most of them were performed in selected patients with established RA (24, 29). Relatively few studies were performed in unselected patients with early inflammatory arthritis (18, 26, 30–37). Yet, it is only this type of study that may contribute to the development of diagnostic criteria. The studies that were performed in unselected early arthritis patients differed in the nature of the predictors and outcome variables assessed, the heterogeneity of the study population, and the duration of followup. None of the studies resulted in a set of predictive variables having a discriminative value sufficiently high to be useful as diagnostic criteria in clinical practice.
It may be confusing that the presence of erosions at the first visit is used to predict the presence of erosions at 2 years' followup. This is a consequence of our choice to use a logistic continuation ratio model to develop the prediction models. With this type of logistic regression modeling, 1 set of criteria is obtained to discriminate both between persistent and self-limiting arthritis and between erosive and nonerosive arthritis given persistence. One set of criteria is clearly more usable in clinical practice than 2 different sets. The presence of erosions at the first visit is an important predictor of persistent arthritis at 2 years and therefore has been incorporated in the model. When erosions are present at the first visit, the probability of persistent erosive arthritis at 2 years' followup is, according to our model, equal to the probability of persistent arthritis.
A drawback of the present study is that some of the patients had been treated with DMARDs. Treatment with DMARDs is, of course, inevitable in prospective arthritis studies. It may, however, have influenced the outcome category of some of the patients. Moreover, the decision to start DMARDs is influenced by diagnostic variables obtained at the first visit. This implies that the gold standard, i.e., arthritis outcome, may have been influenced indirectly by the diagnostic variables that were evaluated, which leads to circularity. Alteration of arthritis outcome caused by the influence of DMARDs on the disease course occurs when persistent arthritis is classified as self-limiting arthritis or persistent erosive arthritis is classified as persistent nonerosive arthritis.
Yet, we do not believe that the DMARD treatment importantly influenced the results of this study, for several reasons. First, the use of DMARDs was incorporated in the definition of natural remission. This means that patients who had no signs of arthritis at followup but who were taking DMARDs were classified as having persistent arthritis. Second, we assume that the treatment with DMARDs did not importantly influence the presence or absence of erosions at 2 years' followup. This is because the patients were treated according to the “pyramid strategy,” which resulted in a long lag time of 4 months between the first visit and the start of DMARD therapy. Moreover, of the patients classified as having persistent nonerosive arthritis at 2 years' followup, half had not been treated with a DMARD in the first 2 years and a quarter had been treated only with chloroquine, which is not effective in preventing erosions. The others predominantly received salazopyrine monotherapy.
The principles applied in the present study with regard to the selection of the study population, the definition of the gold standard, and the use of multivariate techniques narrow the gap that often exists between the results of diagnostic studies and the diagnostic process in everyday clinical practice. The patients were selected based on the presence of a commonly encountered problem in clinical practice that poses a diagnostic, prognostic, and therapeutic challenge to rheumatologists, i.e., arthritis of recent onset. The gold standard was defined in terms of arthritis outcome, which also contributes to the clinical relevance and prevents the occurrence of circularity. Multivariate regression modeling was used, which allows the evaluation of the added diagnostic value of the variables studied, thereby adjusting for mutual dependencies with other variables and taking into consideration the phased diagnostic evaluation of clinical practice. Finally, probabilities of outcome were obtained, which are of far more value for practicing clinicians than are sensitivities and specificities. In our opinion, these principles of diagnostic research should also be applied in future diagnostic studies to improve their clinical relevance.
Implementation of a diagnostic model into practice requires adequate validation. A diagnostic model can discriminate well between the presence and absence of disease in the study population but may be less reliable elsewhere (38, 39). Therefore, the model developed in the present study will be validated in other early arthritis populations. In order to facilitate timely general acceptance, collaboration between researchers investigating early arthritis is of paramount importance. An internationally accepted diagnostic model would allow cost-effectiveness studies to find out the levels of probability of persistent (erosive) arthritis above which treatment with the various DMARDs should be started (40, 41).
In conclusion, a diagnostic criteria set for RA was developed, characterized by an excellent ability to discriminate, at the first visit, between self-limiting, persistent nonerosive, and persistent erosive arthritis. The set is easy to use and generates clinically relevant predictive values.