Rheumatoid arthritis (RA) is known to have a variable course resulting in a wide range of outcomes, varying from a mild disease causing hardly any impairment to severe disease leading to extensive disability and untimely death. Outcome in RA can best be described using a multidimensional approach (1, 2). A unified multidimensional classification for outcome of disease was recently proposed: the International Classification of Functioning and Disability (3). This classification recognizes 3 dimensions of outcome: body functions and structure, activities at the individual level, and participation in society. In addition to prediction of outcome at one particular time point, it is also relevant to be informed about the course of a disease such as RA. An estimation of this “burden of disease” can also be seen as a form of outcome.
In RA, the dimension of outcome “body function and structure” is best characterized by joint damage as measured by radiography. Baseline parameters found to be associated with radiographic damage at followup are disease activity parameters, HLA type, rheumatoid factor (RF), and radiographic damage at baseline (4–11). Predicting radiographic outcome and correctly classifying patients was done in 6 short-term outcome studies, showing correct classification to be possible in 70–83% of patients using varying combinations of parameters (11, 12). Correctly classifying patients has not been done yet in prospective cohort studies with a followup of more than 10 years.
The dimension of outcome “activities at the individual level,” formerly described as disability, is most often characterized by the Health Assessment Questionnaire (HAQ). Baseline parameters associated with severe disability at followup are disease activity, RF, erosions, disease duration, HAQ score, sex, and socioeconomic status (9, 13, 14). In short-term prospective followup studies, disability as measured by the HAQ could be predicted in up to 80% of the patients. In long-term prospective followup studies of patients with recent RA, the HAQ has not been used extensively. Two long-term prospective followup studies have investigated the associations of baseline and process parameters with severe functional outcome as measured using the Steinbrocker classification (15, 16). Parameters of the early stages of the disease that were found to be associated with long-term disability were radiographic damage, RF, and functional class after 3 years. Severe outcome after 15 years could be correctly predicted in 67% of the patients using a combination of baseline and process variables (16).
The dimension of outcome “participation in society,” formerly described as handicap, cannot be easily characterized and no standardized instruments are available. Substitutes are proposed, such as work and housekeeping disability. These were found to be associated with socioeconomic variables, HAQ scores, and radiographic damage at baseline (17–20).
In most studies describing the prediction of outcome, the patients are divided into 2 groups: those with a good outcome and those with a bad outcome. The cutoff between the 2 groups is arbitrary, and patients just below or above the cutoff score may not differ in a clinically relevant way. It would be more useful to be able to correctly predict whether the patient will end up at one of the extremes of the spectrum of outcome.
The present study addresses the following questions: 1) What combination of baseline variables best predicts mild and severe outcome in a prospective inception cohort of RA patients after 12 years? 2) Can an individual predictive model be constructed identifying patients at the extremes of the outcome spectrum of RA?
PATIENTS AND METHODS
The present patient cohort has been described extensively in previous reports (8, 21, 22). In short, all consecutive female RA patients visiting the outpatient rheumatology clinic of the Leiden University Medical Center between 1982 and 1986 with symptoms of <5 years' duration (median 1 year) and aged between 20 and 50 years at first visit entered the study. Of the 138 patients who were invited to participate, 3 refused and 3 were lost to followup in the first 6 years due to moving out of the area. One hundred thirty-two women were prospectively evaluated for an average of 12 years. Complete followup data were obtained from 112 (85%) of the 132 women. Three patients died: 1 of kidney failure, 1 of pancreatic carcinoma, and 1 of lung carcinoma. Three patients refused to participate, and 14 were lost to followup.
At study start, the examination consisted of a detailed interview, physical examination, administration of the HAQ, radiographs of hands and feet, and collection of blood.
The joints were examined using the Ritchie score (23) and a swollen joint count (SJC) was recorded of shoulders, elbows, wrists, knees, ankles, and small joints of hands and feet. The 5 metacarpophalangeal joints, the 5 proximal interphalangeal joints, and the metatarsophalangeal joints were each counted as 1 joint per side. The IgM RF titer was tested by an enzyme-linked immunosorbent assay. The IgM RF was considered positive at a level of >3 U. The percentage incidence of N-linked oligosaccharides that contained no terminal galactose residues (%G(0)) was determined by immunoassay, as described elsewhere (24). A %G(0) value higher than 2 standard deviations above the mean value for healthy donors of the same age was considered elevated.
All patients were typed for HLA–DQ and HLA–DR. The HLA–DQ, DR haplotypes and consequent phenotypes were analyzed. The patients were sorted according either to the rheumatoid arthritis protective (RAP) hypothesis or the shared epitope (SE) model (25, 26). According to the RAP hypothesis, several DQ subtypes are related to the susceptibility and course of RA, whereas HLA-DRB1 alleles encoding the DERAA motif provide protection against severe RA (25). In this model, individuals with RA-associated DQ3 and DQ5 haplotypes without protective DERAA-positive DRB1 alleles are referred to as nonprotected (RAP−), whereas individuals with RA-associated DQ3 and DQ5 haplotypes and protective DERAA-positive DRB1 alleles and individuals with other DQ haplotypes are referred to as protected (RAP+). The SE model recognizes the presence or absence of certain specific DRB1 alleles. The presence of the SE is associated with susceptibility to RA and with a more severe disease course (27).
The baseline parameters were subdivided into the following categories: disease activity, laboratory parameters, damage, disability, and socioeconomic status.
At 12 years of followup, the examination consisted again of a detailed interview, physical examination, and a HAQ questionnaire. The number of months in which disease-modifying antirheumatic drug (DMARD) therapy was given was noted. The joints were examined using the Ritchie score and SJC. Blood samples were obtained for erythrocyte sedimentation rate (ESR). Radiographs were taken of the hands, feet, shoulders, elbows, wrists, hips, knees, and ankles.
The outcome variables were categorized into 4 dimensions: body functions and structure (impairment), activities at the individual level (disability), participation in society (handicap), and disease course.
Body functions and structure. The radiographs of hands and feet were obtained at study start and after 3, 6, and 12 years of followup. The radiographs of hands and feet were assessed according to the modified Sharp/van der Heijde method (SHS) (28). The radiographs at all time points were simultaneously scored by one observer within 5 months after the final followup assessment. Hand and foot radiographs from each visit were scored simultaneously as sets. Scoring of the sets was done in total random order for time and individuals. The radiographs of the large joints that were taken at 12 years of followup were assessed according to the Larsen large joint score (0–60) (29).
Activities at the individual level. The HAQ score was recorded at entry and at 3, 6, and 12 years of followup (1). The HAQ used has been adapted for the Dutch population (30); at the 12-year followup, the RAND-36 questionnaire was added to the assessment of functional ability. The RAND-36 scale is a generic instrument and has been modified for the Dutch population (31). It consists of 36 items divided into 8 scales and 4 dimensions: functional status, well being, general health perception, and health change. The dimension “functional status” was also used to describe disability.
Participation in society. The dimension “well being” of the RAND-36 was used to describe psychosocial factors. Limitations in housekeeping or in occupation (changes in career, working hours, type of work) due to RA were recorded. Working fewer hours, adjustments in career planning, or use of an aid at work were recorded as limitation in work situation. If 2 or more answers indicated a limitation, the patient was classified as being limited due to RA.
Disease course. A panel of 5 experienced rheumatologists jointly defined criteria and classified the patients into one of two groups: mild or severe RA course. The patients were identified according to cumulative disease activity over the years and radiographic damage. The disease activity score (DAS), a pooled index of the ESR, SJC, and the Ritchie articular index (32), was used to measure disease activity. The score was calculated at entry, 3, 6, and 12 years of followup. The cumulated disease activity was measured by calculating the area under the curve of all the DAS assessments during the 12 years of followup. The patients in the lowest tertile of radiographic damage and the lowest tertile of cumulated disease activity were considered to have a mild disease course. Patients in the highest tertile of radiographic disease course or the highest tertile of cumulated disease activity were classified as having a severe disease course.
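The cumulated disease activity and the course classification described above can be sketched in a few lines. This is a minimal illustration, assuming the area under the curve is computed with the trapezoidal rule (the text does not name the integration method) and that tertiles are coded 1 to 3. The function names, the "intermediate" label, and the DAS values are hypothetical.

```python
def cumulative_das(times, scores):
    """Area under the DAS curve by the trapezoidal rule (score x years).

    Assumption: trapezoidal integration between consecutive visits.
    """
    if len(times) != len(scores) or len(times) < 2:
        raise ValueError("need at least two matching time points")
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += width * (scores[i] + scores[i - 1]) / 2.0
    return area


def classify_course(damage_tertile, activity_tertile):
    """Classify the disease course from tertiles coded 1 (lowest) to 3 (highest)."""
    # Severe: highest tertile of radiographic damage OR of cumulated activity.
    if damage_tertile == 3 or activity_tertile == 3:
        return "severe"
    # Mild: lowest tertile of radiographic damage AND of cumulated activity.
    if damage_tertile == 1 and activity_tertile == 1:
        return "mild"
    # Hypothetical label for everyone else.
    return "intermediate"


visit_years = [0, 3, 6, 12]          # assessment times used in the study
example_das = [3.4, 2.8, 2.6, 2.0]   # made-up DAS values for one patient
auc = cumulative_das(visit_years, example_das)
print(round(auc, 1), classify_course(1, 1))
```

Note the asymmetry in the definitions: a mild course requires both criteria, whereas a severe course requires only one, which is consistent with the severe group being larger than the mild group.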
Outcome. The median, tertiles, and range were calculated for the outcome variables categorized into the 4 dimensions after 12 years of followup: body functions and structure, activities at the individual level, participation in society, and disease course.
Prediction. Using the outcome variables, 2 groups of patients were identified for each of the dimensions of outcome: the lowest tertile (mild outcome) and the highest tertile (severe outcome). The values of the baseline parameters for both groups were calculated and significance of the difference between both groups was tested using a Student's t-test, a Mann–Whitney U test, or a chi-square test where appropriate.
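The tertile split described above might look as follows. This is only a sketch: the cut points come from Python's `statistics.quantiles` with its default interpolation, and the handling of values exactly on a cut point is an assumption, since the text specifies neither. The patient identifiers and scores are invented.

```python
import statistics


def tertile_groups(scores):
    """Return (mild, severe) patient ids: lowest and highest tertile of a score.

    Assumption: values equal to the lower cut point count as mild;
    only values strictly above the upper cut point count as severe.
    """
    low_cut, high_cut = statistics.quantiles(list(scores.values()), n=3)
    mild = sorted(pid for pid, value in scores.items() if value <= low_cut)
    severe = sorted(pid for pid, value in scores.items() if value > high_cut)
    return mild, severe


# Invented outcome scores for nine hypothetical patients.
example_scores = {f"p{i}": i for i in range(1, 10)}
mild_ids, severe_ids = tertile_groups(example_scores)
print(mild_ids, severe_ids)
```

Patients in the middle tertile fall into neither group, which matches the study's choice to compare only the two extremes.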
To calculate which baseline parameters predict outcome, several logistic regression models were constructed in which mild and severe outcome in the 4 dimensions of outcome were the dependent variables. The number of baseline and outcome variables was reduced to create comprehensive models. Only the baseline parameters that significantly differed between patients in the lowest and highest tertiles of outcome (in each dimension) were entered into the model. If outcome measures of one dimension of outcome were very alike, a model was constructed for only one of the outcome measures in that dimension. For radiographic joint damage, the SJC, Ritchie score, DAS, RF, SE, RAP, %G(0), HAQ, SHS, and erosive disease at study start were entered in the analysis. For the HAQ, the SJC, Ritchie, DAS, RF, RAP, HAQ, SHS, and erosive disease at study start were used. In the model for the course of the disease, the same parameters as for radiographic joint damage were used, except that erosive disease at study start was omitted. Using logistic regression, a model was developed to evaluate the contribution of each baseline parameter to the probability of mild or severe outcome in each of the dimensions. The differences between either the subgroup with mild outcome or with severe outcome compared with the other patients of the cohort were studied. First, all baseline parameters that were significantly associated with the outcome variable were entered into a stepwise forward logistic regression model. Second, the procedure was repeated using only easily obtainable and objective baseline parameters: the Ritchie index, SJC, ESR, RF, HAQ, the presence of erosions, and SHS.
The overall correctness (accuracy), the positive predictive value (PPV), and the negative predictive value (NPV) of each model were calculated. The PPV represents the percentage of patients who are correctly predicted to be in the mild outcome group or the severe outcome group. The NPV represents the percentage of patients who are correctly predicted not to be in the mild outcome group or the severe outcome group. When it is important, for instance, to not falsely identify a patient as severe when in fact she has a mild course, a high PPV is warranted. On the other hand, when it is important to identify all severe patients and the inclusion of a mild patient is not problematic, then a high NPV is necessary. The accuracy, NPV, and PPV levels were estimated using the leave-one-out cross validation method (33, 34).
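The following sketch illustrates how accuracy, PPV, and NPV can be estimated with leave-one-out cross validation. It is not the study's analysis: the classifier is a deliberately simple stand-in (a threshold halfway between the two class means of a single variable) rather than a logistic regression, and all names and data are hypothetical.

```python
def predictive_values(actual, predicted):
    """Accuracy, PPV, and NPV from boolean outcome and prediction lists."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fp = sum(1 for a, p in pairs if not a and p)
    tn = sum(1 for a, p in pairs if not a and not p)
    fn = sum(1 for a, p in pairs if a and not p)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "ppv": tp / (tp + fp) if tp + fp else None,  # correct among predicted positives
        "npv": tn / (tn + fn) if tn + fn else None,  # correct among predicted negatives
    }


def fit_threshold(values, labels):
    """Stand-in model: threshold midway between the two class means."""
    pos = [v for v, y in zip(values, labels) if y]
    neg = [v for v, y in zip(values, labels) if not y]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


def leave_one_out(values, labels):
    """Predict each patient from a model fitted on all other patients."""
    predictions = []
    for i in range(len(values)):
        train_values = values[:i] + values[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        threshold = fit_threshold(train_values, train_labels)
        predictions.append(values[i] > threshold)
    return predictions


# Invented baseline scores and severe-outcome flags for eight patients.
baseline = [1, 2, 3, 6, 5, 7, 8, 9]
severe = [False, False, False, False, True, True, True, True]
results = predictive_values(severe, leave_one_out(baseline, severe))
print(results)
```

Because each patient is predicted by a model that never saw her data, the resulting figures are less optimistic than values computed on the data used for fitting.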
Using the odds ratios of the logistic regression models, the chance of mild or severe outcome in the several dimensions of outcome was calculated for each individual patient. Last, a simple decision tree was constructed to quickly categorize the patients and give an indication of the probability of severe radiographic damage. The decision tree was constructed by averaging the individual probabilities, and by selecting the value maximizing the differences between the strata.
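As an illustration of how odds ratios translate into an individual probability, the sketch below applies the standard logistic formula, in which each odds ratio contributes its natural logarithm to the log odds. The intercept is not reported in the text, so the value used here is purely hypothetical, and the two odds ratios are borrowed from one of the severe radiographic damage columns of Table 3 only as example inputs. The function and variable names are assumptions as well.

```python
import math


def predicted_probability(intercept, odds_ratios, values):
    """Probability from a logistic model expressed through odds ratios.

    log odds = intercept + sum(ln(OR_k) * x_k); probability = 1 / (1 + exp(-log odds))
    """
    log_odds = intercept + sum(
        math.log(odds_ratios[name]) * values[name] for name in odds_ratios
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


# Example inputs: odds ratios as listed in Table 3 for severe radiographic damage.
example_odds_ratios = {"RF": 3.31, "ERO": 4.33}
hypothetical_intercept = -2.0  # NOT from the paper; chosen only for illustration

p_none = predicted_probability(hypothetical_intercept, example_odds_ratios, {"RF": 0, "ERO": 0})
p_both = predicted_probability(hypothetical_intercept, example_odds_ratios, {"RF": 1, "ERO": 1})
print(round(p_none, 3), round(p_both, 3))
```

With a different intercept the absolute probabilities change, but the ratio of the odds between the two example patients stays equal to the product of the odds ratios.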
RESULTS
There were no statistically significant differences in the baseline characteristics of the 112 assessed and 26 missing patients, except for the median ESR, which was 20 mm/hour and 35 mm/hour, respectively (P = 0.03 Mann–Whitney). The mean age at study start was 37 years (SD 8.4 years) in the assessed patients and 38 years (SD 9.6 years) in the missing patients. The median duration of symptoms at study start was 1 year (range 0–5 years) in both groups. At the start of the study, the IgM RF was positive in 71% of the assessed patients and in 67% of the missing patients; 25% and 32% of the patients, respectively, were erosive at entry. The median HAQ score was 0.75 (range 0–2.88) in the assessed patients and 0.79 (range 0–2.9) in the missing patients.
The medians, tertiles, and ranges of the outcome measures are presented in Table 1. Nearly half of the patients were limited in housework or occupation, and 45% of the patients could be considered to have a severe course of RA. In the group with the lowest tertile of radiographic damage, the median duration of DMARD therapy over 12 years was 10 months (range 0–144). In the group with the highest tertile of radiographic damage, the median duration of DMARD therapy was 83 months (range 7–191). Similar differences in duration of DMARD therapy were found in the subgroups of patients in the tertiles with respect to HAQ and in mild and severe disease course. The median values of the baseline parameters in patient subgroups according to mild and severe outcome defined by the lowest and highest tertiles of SHS, HAQ, and disease course and their associations with the baseline variables are shown in Table 2. The radiographic damage as measured with the SHS of hands and feet and the Larsen large joint score proved to correlate equally with the baseline parameters; therefore, only the SHS is shown and further used for the analyses. The same applies for the HAQ and the RAND-36 function scores; because the HAQ is more widely used in rheumatology, it was chosen for the analyses. The measures describing “participation in society” or handicap (well-being and limitations in housekeeping or work due to RA) did not correlate significantly with any baseline parameter. In the mild and severe subgroups of these outcome measures, similar values for baseline variables were found (not shown). The 3 dimensions of outcome were not associated with baseline parameters for socioeconomic status.
Table 1. Outcome categorized for body function and structure, activities at individual level and participation in society*
| |33rd percentile|Median|66th percentile|Range|
|---|---|---|---|---|
|Body functions and structure| | | | |
| SHS of hands and feet|42|145|189|0–428|
| Larsen large joint score|1|3|10|0–55|
|Activities at the individual level| | | | |
| HAQ score|0.37|0.87|1.25|0–3|
| RAND-36 functional status score|14|21|30|0–92|
|Participation in society| | | | |
| RAND-36 well being score|75|70|62|15–100|
| Limited in occupation or housework| |46%| | |
| Severe disease course| |45%| | |
| Mild disease course| |16%| | |
Table 2. Median values or percentage of baseline parameters in the patient subgroups according to mild and severe outcome*
| |All patients|SHS lowest tertile (n = 37)|SHS highest tertile (n = 38)|HAQ lowest tertile (n = 38)|HAQ highest tertile (n = 37)|Mild course (n = 16)|Severe course (n = 45)|
|---|---|---|---|---|---|---|---|
|Age, year (median)|37|35|37|37|37|34|38|
|Disease duration, year (median)|1|1|1|1|1|0|1|
|Disease activity parameters| | | | | | | |
| ESR (mm/hour) (median)|27|26|32|24|28|27|30|
| SJC (median)|3.5|1‡|6‡|2.1‡|6‡|0‡|6‡|
| Ritchie score (median)|9|3‡|14‡|3.6‡|15‡|1‡|15‡|
| Number of swollen large joints (median)|0|0|0|0|0|0|0|
| DAS (median)|2.8|2.6†|3.4†|2.4†|3.4†|2.8†|3.4†|
|Laboratory parameters| | | | | | | |
| RF+ (%)|55|11‡|65‡|22‡|53‡|12‡|65‡|
| SE+ (%)|71|51†|82|56|67|25‡|81‡|
| RAP− (%)|68|54†|82†|50†|80|25‡|81‡|
| Elevated G(0) (%)|25|12†|57‡|21|37|29|48‡|
|Disability parameters| | | | | | | |
| HAQ (median)|0.75|0.44‡|1.25‡|0.13‡|1.45‡|0‡|1.13‡|
|Damage parameters| | | | | | | |
| SHS score (median)|12|0‡|25‡|0‡|30.5‡|0‡|20‡|
| Presence of erosions (%)|22|3‡|41†|15‡|40|0‡|34|
|Socioeconomic status| | | | | | | |
| Educational level high (%)|14|19|7|20|10|37|7|
| Educational level medium (%)|26|27|32|17|31|50|34|
| Educational level low (%)|60|54|60|63|59|27|59|
| Married (%)|83|75|82|85|79|62|83|
| Paid occupation (%)|52|43|64|50|65|25|53|
Predictive models were constructed for mild and severe outcome with respect to SHS, HAQ, and severe disease course. First, all correlated baseline parameters of each outcome variable as found in the univariate analysis (Table 2) were entered stepwise into a logistic regression model with mild or severe outcome with respect to SHS, HAQ, or disease course as dependent variables.
In Table 3, the cross-validated logistic regression models of the mild and severe group in the 3 studied outcome measures are shown, with the exception of mild disease course. In that subgroup, the number of patients was too small to construct a dependable model.
Table 3. Results of a cross-validated stepwise logistic regression analyzing which baseline parameters are most predictive (odds ratio) for the outcome in the 3 dimensions studied*
| |Mild radiographic damage| |Severe radiographic damage| |Mild HAQ| |Severe HAQ| |Severe disease course| |
|---|---|---|---|---|---|---|---|---|---|---|
| |RF|RF|RF|RF|ERO|ERO| | |HAQ|RS|
| |(0.11)|(0.002)|(3.58)|(3.31)|(0.06)|(0.06)| | |(1.75)|(1.15)|
| |ERO|ERO|ERO|ERO| | | | |SE| |
| |(0.02)|(0.006)|(5.60)|(4.33)| | | | |(0.1)| |
| |SE| |RAP| | | | | | | |
| |(0.16)| |(0.15)| | | | | | | |
|Overall correct (%)|88|87|85|84|88|88|84|84|80|83|
The models of each of the 3 outcome measures (radiographic damage, HAQ, and severe disease course) selected combinations of similar baseline parameters: the SJC, Ritchie score, SE, RAP, RF, HAQ, and presence of erosions.
An overall correct prediction of outcome for an individual patient proved possible in 80–88% of patients in the mild and severe groups. The PPV ranged from 87% to 91% for radiographic damage and disability, indicating false positives to be scarce. The NPV of the disease course was somewhat lower (80%).
Second, a model was constructed for the 3 previously mentioned outcome measures using only easily obtainable and objective baseline parameters: the Ritchie index, SJC, ESR, RF, HAQ, and SHS. When only these variables were put into the model, the overall correctness of the prediction decreased only marginally. Addition of HLA typing to the model improved the correct prediction of radiographic damage by only 3%.
To translate the results of the model for clinical practice, an example of a decision tree for severe radiographic damage was constructed using the results of the logistic regression model and is shown in Figure 1.
Figure 1. Prediction of severe radiographic damage at 12-year followup in 112 patients, using easily obtainable and objective baseline parameters. Severe damage was defined as a score in the highest tertile of the modified Sharp/van der Heijde score (SHS). RF = rheumatoid factor; SJC = swollen joint count; chance = probability of severe disability.
DISCUSSION
The present study shows that outcome in RA with respect to radiographic damage, disability, and disease course can be predicted using only widely obtainable baseline parameters. Using the SJC, RF, HAQ, and radiographic damage, outcome can be predicted with an accuracy of 81–90%. Additional knowledge of HLA typing did not significantly improve the accuracy of prediction.
In the literature, several combinations of baseline parameters have been found to predict outcome. The fact that the predictive baseline parameters were found to differ among studies can be due partly to the fact that these studies used different baseline parameters in the construction of predictive models. In the present study, we tried to use all known predictive parameters, including the most recently available, HLA typing. The baseline parameters we found to be associated with the respective dimensions of outcome in the univariate analyses were similar to those found in earlier long-term followup studies. As seen in earlier studies, in the multivariate analysis only a selection of the baseline parameters was needed to reach an optimal prediction (11, 12).
In the present study, univariate analysis showed HLA subtypes to be strongly associated with outcome. This is in accordance with several earlier studies where radiographic damage in RA was found to be associated with HLA subtype (7, 10). However, in the multivariate analysis, the additive value of the HLA subtype proved to be significant only in the prediction model of radiographic damage at 12 years of followup, improving the accuracy of the prediction by 3%. This is in accordance with earlier studies where HLA typing was found to contribute to the prediction of radiographic damage, whereas it was also shown that there is no significant additive value of HLA type to the explanation of the HAQ (35).
We were able to predict neither limitations in housekeeping or work due to RA nor the measures of well being. This could be explained by the fact that these variables have been shown to be predominantly affected by such non–disease-related factors as demographic variables, family and personal circumstances, and work characteristics (17–20).
Most studies on prediction of outcome in RA use dichotomized outcome variables. This dichotomization is not practical from the clinician's point of view because the patients around the cutoff score are very similar with respect to outcome. Of greater clinical relevance would be the identification of patients at the extremes of the spectrum of outcome, in order to target therapy more adequately. Therefore, in this study the patients were divided into 2 subgroups, mild and severe, leaving out the group of patients between these two extremes. One other study used a similar division into subgroups; the accuracy of prediction after 6 years of followup was lower than in the present study: 67–80% and 79–90%, respectively (13).
In prediction of mild or severe outcome, not only the accuracy (or percentage of overall correctness) of the prediction is important, but the probability of that outcome in a certain patient is even more important. This probability is best assessed by the positive and negative predictive values (PPV and NPV). Only a limited number of studies investigated the probability that an individual patient presenting with those particular clinical features will or will not develop the outcome in question (6, 7). These studies showed that in short-term followup, 2 groups of radiographic damage could be predicted with a PPV and an NPV of 83–89% and 51–68%, respectively. In the present long-term study, mild and severe radiographic damage could be predicted more sensitively and specifically; the PPV and NPV were 87–91% and 80–86%, respectively. As far as we know, the PPV and NPV in the prediction of disability as measured with the HAQ have not been studied. We found that the HAQ score can be predicted with the same accuracy as radiographic damage. Mild radiographic damage and mild disability could be predicted slightly more accurately than severe damage and severe disability.
To be of clinical use, prediction of outcome should preferably reach an accuracy of 90%. Depending on the issue studied, either a high NPV or a high PPV is most favorable. When the aim of the prediction is to identify patients who have a very severe outcome so experimental and risk-bearing therapies may be used, a high PPV is needed. In this way, there would be only a small probability of treating a patient who would not have needed the therapy, and thus exposing this patient to unnecessary risk. In contrast, if the aim of the prediction is to be sure not to miss a patient using the prognostic variables, a high NPV is most important. Over the last decade, the treatment of RA has changed considerably. Early and aggressive treatment of RA is now advocated instead of the earlier “wait and see” policy. Irreversible damage is shown to occur early in the disease course. Because new and effective drugs with relatively few side effects have been added to the armamentarium of the rheumatologist over the last decade, it is important that all patients with any chance of radiographic damage or disability receive early effective treatment. The individual prediction model found in the present study has a high PPV and therefore is best used in studies needing a high specificity in identifying patients.
This study shows that prediction of outcome in long-term RA is possible and can be done using only widely available baseline parameters. Within groups, prediction is possible with a clinically relevant high accuracy; in the individual patient, the prediction proved to be very specific in identifying patients with respect to radiographic damage, disability, and disease course.