To describe the development and validation of a disease activity index in early arthritis that can be easily applied in daily practice and clinical research.
The Hospital Universitario La Princesa Index (HUPI) was developed after analysis of data from an early arthritis cohort (202 patients with 756 visits). It is the sum of 4 variables (graded 0–3): tender joint count, swollen joint count, patient global assessment, and acute-phase reactants (erythrocyte sedimentation rate [ESR] and/or C-reactive protein [CRP] level, depending on availability at the moment of evaluation). The score for each variable was based on its quartile distribution in the cohort. The HUPI was validated using the following properties: feasibility, internal consistency (Cronbach's alpha), convergent validity (Pearson's r coefficients with other activity measures), criterion validity (area under the receiver operating characteristic curve [AUC ROC] to detect minimal disease activity [MDA]), and sensitivity to change (AUC ROC) to detect change with the physician's and patient's assessment of disease activity.
Internal consistency is reasonable (α = 0.63). The HUPI correlates well with activity measures such as the Disease Activity Score in 28 joints (DAS28; r = 0.89) and the Simplified Disease Activity Index (SDAI; r = 0.70), and correlates slightly worse with the functional index of the Health Assessment Questionnaire (r = 0.69). It discriminates MDA correctly (AUC 0.95), and its sensitivity to change is slightly superior (AUC 0.902) to that of the DAS28-ESR (AUC 0.864), the DAS28-CRP (AUC 0.889), and the SDAI (AUC 0.791).
The HUPI has face validity, is easy to calculate, is sensitive, and is a valid composite index for the assessment of disease activity in patients with early arthritis, both in clinical research and in routine care.
Intensive management strategies based on enhanced assessment and tight control of disease activity can improve the outcome of patients with rheumatoid arthritis (RA) (1, 2). Furthermore, early diagnosis is increasingly important, as recent-onset disease is more sensitive to treatment than later-stage disease. Consequently, early initiation of treatment can slow or prevent disease progression.
Early arthritis clinics have been established in many countries and enable faster referral of patients with arthritis and earlier implementation of strategies to improve clinical and radiologic outcomes (3–6). The use of composite indices in these clinics has proven extremely useful in the followup and assessment of patients with early arthritis. Appropriately validated instruments are necessary in daily practice. A task force composed of rheumatologists and a patient recently developed a set of recommendations for using a treatment-to-target strategy in patients with RA. One of the recommendations was to apply this strategy to routine care (7). This specific recommendation achieved a high level of agreement among a large number of physicians representing 34 different countries (8). To implement this treatment-to-target strategy, physicians need a feasible and valid measure of disease activity with which to set a specific target.
The Disease Activity Score (DAS) was developed based on expert opinion as a measure of RA activity (9). Modifications of this score, i.e., the DAS in 28 joints (DAS28; simplified joint count) (10) and the DAS28 based on C-reactive protein (CRP) level instead of erythrocyte sedimentation rate (ESR) (11), were developed to provide a more accurate and feasible measure of disease activity. Although these versions have advantages over the DAS, they remain subject to limitations. Application of the DAS and its versions is complex, as it is necessary to use a DAS calculator. In addition, ESR thresholds differ for men and women, and women have higher tender joint counts (associated with a higher density of C-fiber nociceptors); therefore, the DAS28 may overestimate disease activity in women (12–15).
Considering CRP level as a more reliable acute-phase reactant (APR) than ESR (16), the DAS28-CRP could reduce sex bias. However, we previously showed that DAS28 results vary according to whether ESR or CRP level is used in the calculation (17). This finding could affect the results of multicenter observational studies in which using one marker or the other depends on local availability.
In the 2003 study by Smolen et al, the Simplified Disease Activity Index (SDAI), an unweighted and untransformed index (18), was developed; it is calculated as the sum of 5 variables, namely those included in the DAS28-CRP plus the physician's global disease assessment (GDA-Ph). These same authors also developed the Clinical Disease Activity Index (CDAI), a composite index that does not incorporate an acute-phase response (19). The SDAI and CDAI results correlate closely with those of the DAS28, although they do not follow a normal distribution (considerable left shift compared with the DAS28). The absence of a normal distribution reduces the usefulness and interpretation of results in some statistical models when the SDAI score is the dependent variable. In addition, there is considerable disagreement in the classification into low and moderate disease activity between the DAS28 and the SDAI (20).
Consequently, we aimed to develop an index that needed no special calculator, had a normal distribution, accounted for sex differences, and resolved the issue of choosing between CRP level and ESR in multicenter observational studies and clinical practice. Therefore, considering that early initiation of disease-modifying antirheumatic drugs adjusted to a tight control strategy is critical for the achievement of remission, we created an index in a mixed population of patients with early RA and patients with undifferentiated arthritis (UA); both of these patient groups are regularly seen at early arthritis clinics.
Because no single marker reflects all the characteristics of the disease, evaluation of disease activity in early rheumatoid arthritis (RA) and undifferentiated inflammatory arthritis is a challenge for rheumatologists.
Although disease activity indices have significantly improved assessment of RA, they remain limited by their complexity and sex bias.
We describe the validation of a new composite disease activity index based on the sum of 4 variables graded from 0 to 3: tender joint count, swollen joint count, patient global disease assessment, and erythrocyte sedimentation rate/C-reactive protein level. The index is feasible and sensitive to change, and could prove superior to previous indices in that it prevents bias arising from sex and missing data.
This new index can be used in patients with early arthritis and will be validated in patients with late arthritis.
We developed the Hospital Universitario de la Princesa Index (HUPI) using data from the Early Arthritis Registry of Hospital Universitario La Princesa (Madrid, Spain), which exhibits characteristics similar to those of the Leiden Early Arthritis Clinic (Supplementary Table 1, available in the online version of this article at http://onlinelibrary.wiley.com/doi/10.1002/acr.21854/abstract), providing support for the generalizability of this population. Validation involved the use of followup data to analyze feasibility, validity, and sensitivity to change.
The Early Arthritis Registry of Hospital Universitario La Princesa includes all patients with ≥2 swollen joints for >4 weeks and <1 year and a diagnosis of RA (1987 revised criteria of the American College of Rheumatology [ACR]) (21) or undifferentiated inflammatory arthritis. The clinical protocol of the registry comprises 4 structured visits during a 2-year followup period and was reviewed and approved by the Ethics Committee for Clinical Research of Hospital Universitario La Princesa. Prior to inclusion in the register, all patients signed a written informed consent form.
Treatment was individualized and adapted as necessary based upon the decision of the treating rheumatologist. Routine clinical and laboratory data were collected as follows: rheumatoid factor, anti–cyclic citrullinated peptide antibody, ESR, CRP level, duration of symptoms, swollen joint count (SJC) and tender joint count (TJC) out of a total of 28 joints, the GDA-Ph and the patient's global disease assessment (GDA-P) on a 100-mm visual analog scale, and the Spanish version of the Health Assessment Questionnaire (HAQ) (22). The DAS28 with ESR or CRP level, the SDAI, and the CDAI were calculated automatically. Joint count assessments were performed by 2 experienced physicians (AMO and IG-A) to reduce interrater variability.
Following the recommendations of Outcome Measures in Rheumatology for core set outcome measures (23, 24) and the indices DAS (9), DAS28 (25), and SDAI (18), we based our index on the TJC, SJC, GDA-P, and the APR.
Each variable was divided into quartiles, each of which was assigned an ordinal value from 0 to 3 (see Table 1). Additionally, and based on evidence suggesting sex bias for the TJC and ESR (13, 14), we defined different cutoff points for these variables stratified by sex (Table 1). As elderly patients exhibit higher GDA-P, we defined 2 strategies to assign the cutoffs for this variable. The first strategy was to divide the population into 2 groups depending on age (≤40 years or >40 years); the resulting variable was called global disease assessment by patient by age (GDA-P1). The second strategy was not to divide the population, and the resulting variable was called global disease assessment by patient irrespective of age (GDA-P2).
|Variable||Score 0||Score 1||Score 2||Score 3|
|Swollen joint count||0||1–2||3–4||>4|
|Tender joint count|
|Ages ≤40 years||0–10||11–20||21–40||>40|
|Ages >40 years||0–20||21–40||41–50||>50|
|CRP1 level, mg/dl†||≤0.30||0.31–0.50||0.51–1||>1|
|CRP2 level, mg/dl||≤0.10||0.11–0.80||0.81–1.50||>1.50|
The CRP level was scored using 2 strategies: one according to quartile distribution (CRP1) and the other according to theoretical thresholds based on local reference ranges (CRP2).
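The grading step can be illustrated with a minimal Python sketch, assuming only the swollen joint count and CRP cutoffs that appear in Table 1. The function name `grade` and the constant names are hypothetical and chosen for illustration; the sex-stratified and age-stratified cutoffs are not reproduced here.

```python
from bisect import bisect_left

# Upper bounds of score levels 0, 1 and 2 as listed in Table 1;
# any value above the last bound receives a score of 3.
SJC_UPPER_BOUNDS = (0, 2, 4)            # 0 | 1-2 | 3-4 | >4
CRP1_UPPER_BOUNDS = (0.30, 0.50, 1.0)   # mg/dl, quartile-based cutoffs
CRP2_UPPER_BOUNDS = (0.10, 0.80, 1.50)  # mg/dl, reference-range cutoffs


def grade(value, upper_bounds):
    """Return the 0-3 score: the number of upper bounds strictly below value."""
    return bisect_left(upper_bounds, value)


print(grade(3, SJC_UPPER_BOUNDS))      # 3 swollen joints fall in the 3-4 band
print(grade(0.45, CRP1_UPPER_BOUNDS))  # 0.45 mg/dl falls in the 0.31-0.50 band
```

Storing only the upper bound of each band keeps the lookup table short and makes a value on a boundary fall into the lower band, which matches the "≤" convention used for the CRP rows.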
In addition to the development of the HUPI versions that included only ESR, CRP1 level, or CRP2 level, we described 4 different possibilities to input the APR. Versions with APR1 and APR2 were calculated only if the CRP level and ESR were both available. In this case, the APR1 approach was to use the average of the scores of the ESR and CRP1 level, and the APR2 approach was to use the average of the scores of the ESR and CRP2 level. Versions including APR3 and APR4 were calculated with the scores of ESR or CRP level (CRP1 level with APR3, CRP2 level with APR4) when only one of them was available or with the average of their scores when both were available.
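The two ways of handling the APR described above can be sketched as follows. This is an illustrative Python sketch, not the code used in the study; the function names are hypothetical, and a missing marker is represented by `None` as an assumption.

```python
from typing import Optional


def apr_both_required(esr_score: Optional[int],
                      crp_score: Optional[int]) -> Optional[float]:
    """APR1/APR2-style input: average of both scores, undefined if either is missing."""
    if esr_score is None or crp_score is None:
        return None
    return (esr_score + crp_score) / 2


def apr_any_available(esr_score: Optional[int],
                      crp_score: Optional[int]) -> Optional[float]:
    """APR3/APR4-style input: average when both exist, else whichever is available."""
    available = [s for s in (esr_score, crp_score) if s is not None]
    if not available:
        return None
    return sum(available) / len(available)


print(apr_both_required(2, None))  # None: the visit cannot be scored
print(apr_any_available(2, None))  # 2.0: the ESR score alone is used
print(apr_any_available(2, 3))     # 2.5: half-point values arise from averaging
```

The second function is what allows an index to be computed at visits where only one marker was measured, at the cost of introducing half-point steps when both are present.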
We developed 10 alternative HUPI versions, each of which was the sum of the scores assigned to each of these 4 stratified variables (Supplementary Table 2, available in the online version of this article at http://onlinelibrary.wiley.com/doi/10.1002/acr.21854/abstract). Each variable was discrete, and the total ranged from 0 to 12. Most variables had increments of 1 point between levels, although some had increments of 0.5 point (i.e., those including APR1–4; see Supplementary Table 2, available in the online version of this article at http://onlinelibrary.wiley.com/doi/10.1002/acr.21854/abstract).
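As a small illustration of the summation, the following Python sketch adds 4 component scores and checks that each lies in the 0 to 3 range. The function name `hupi_total` and the input validation are assumptions made for this example.

```python
def hupi_total(tjc_score, sjc_score, gda_p_score, apr_score):
    """Sum 4 component scores (each 0-3); the total therefore lies in 0-12."""
    scores = (tjc_score, sjc_score, gda_p_score, apr_score)
    for score in scores:
        if not 0 <= score <= 3:
            raise ValueError(f"component score out of range 0-3: {score}")
    return sum(scores)


# Hypothetical visit: TJC score 2, SJC score 1, GDA-P score 3, averaged APR score 1.5.
print(hupi_total(2, 1, 3, 1.5))
```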
To determine which of these versions was the most reliable and useful, we analyzed different aspects of the validation process, namely feasibility, reliability, construct validity, and responsiveness.
Feasibility includes domains such as completion time, difficulty, clarity, and acceptance by both patients and clinicians. We quantified feasibility by creating an ad hoc measure ranging from 0 (unfeasible) to 3 (completely feasible) to evaluate 3 domains: completion time (according to the number of variables included); clarity of the calculation (depending on the simplicity of the variables); and acceptance (low probability of missing data). Each author independently rated each index in the 3 domains. The final rating of the versions was the mean of the 3 values.
Reliability embraces the concepts of internal consistency and reproducibility. The internal consistency or “good construction” of each HUPI version was tested using Cronbach's alpha (where α < 0.70 indicates that individual items provide an inadequate contribution to the overall scale, and values of α > 0.90 suggest redundancy).
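For readers unfamiliar with the statistic, Cronbach's alpha for k items is k/(k − 1) multiplied by 1 minus the ratio of the summed item variances to the variance of the total score. A minimal Python sketch of this standard formula is shown below; the function name and the toy data are hypothetical, and sample variances are assumed throughout.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: list of k lists, each holding one component score per visit."""
    k = len(items)
    totals = [sum(visit) for visit in zip(*items)]      # total score per visit
    item_variance_sum = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Toy data: 4 component scores recorded at 5 hypothetical visits.
components = [
    [0, 1, 2, 3, 3],
    [0, 1, 1, 2, 3],
    [1, 1, 2, 2, 3],
    [0, 2, 1, 3, 2],
]
print(round(cronbach_alpha(components), 2))
```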
Construct validity refers to the proximity of our measure to similar measures (convergent validity) and distance from dissimilar measures (divergent validity). When comparing the HUPI with similar construct measures (disease activity measures), a high correlation (Pearson's r) would be expected; when comparing it with less closely related constructs, such as function, a lower correlation would be expected. We tested the HUPI against the DAS28 and the SDAI, and then against the HAQ.
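The correlation used for convergent and divergent validity can be written out directly. The sketch below implements the textbook Pearson coefficient in plain Python; the function name and the example values are hypothetical and are not data from the cohort.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired index values at 5 visits.
index_a = [2.0, 4.5, 6.0, 8.5, 10.0]
index_b = [2.1, 3.4, 4.0, 5.6, 6.3]
print(round(pearson_r(index_a, index_b), 3))
```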
Criterion validity was evaluated using receiver operating characteristic (ROC) curves with minimal disease activity (MDA) (26) as the external criterion. MDA was developed in 2005 by Wells et al (26) as a satisfactory state of disease activity to compare different treatment strategies, bearing in mind that true remission is difficult to achieve in patients with RA. Two equivalent definitions were formulated, one based on the DAS28 (European League Against Rheumatism [EULAR] response criteria) and the other based on meeting cutoffs in 5 of the 7 World Health Organization/International League of Associations for Rheumatology core set outcome measures, which is the set used in our analysis. The statistic applied was the area under the curve (AUC) (27), and the ROC curve of the HUPI was compared with that of the DAS28-ESR, the DAS28-CRP, and the SDAI using the roccomp command of Stata.
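The AUC itself has a simple interpretation: it is the probability that a randomly chosen case is ranked above a randomly chosen non-case, with ties counted as one half. The following Python sketch computes it by direct pairwise comparison. It does not reproduce the between-index comparison performed by the roccomp command of Stata, and the function name and data are hypothetical.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve as P(case > control) + 0.5 * P(tie)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical index values. Visits in MDA are expected to score low,
# so the sign is flipped to make "higher" mean "more likely MDA".
mda_visits = [1, 2, 2.5]
non_mda_visits = [4, 6.5, 9]
print(auc([-v for v in mda_visits], [-v for v in non_mda_visits]))
```

The pairwise form is quadratic in the number of visits, which is acceptable for a data set of a few hundred visits; a rank-based formulation would be preferable for larger samples.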
Responsiveness, also called sensitivity to change, is defined as “the ability of an instrument to accurately detect change when it has occurred” (28), implying that the intervention administered to the study patients involved an effect with a known direction. In our cohort, the intervention was the treatment initiated by the physician, which in most instances was methotrexate. We analyzed the AUC of the change in the HUPI for identifying patients who improved after 6 months of treatment (29). Responsiveness was tested against 3 definitions of improvement as follows: 1) a change in the GDA-Ph >10 between baseline and 6 months of followup; 2) the same definition but for GDA-P; and 3) change in the DAS28 compared with the change in the HUPI. Responsiveness by the first 2 definitions was tested with the AUC of the ROC curves using the roccomp command of Stata to determine statistically significant differences between the different indices. Responsiveness by the third definition was tested with the beta coefficient from the linear regression analysis.
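Two pieces of this procedure can be sketched in Python: labeling a patient as improved, and estimating the slope of a simple linear regression. The sketch assumes that "change >10" means a decrease of more than 10 mm on the visual analog scale and that the regression has a single predictor; both function names are hypothetical.

```python
def improved(baseline, month6, threshold=10):
    """True when the global assessment fell by more than `threshold` (assumed direction)."""
    return (baseline - month6) > threshold


def ols_slope(x, y):
    """Slope (beta) of the least-squares line y = a + beta * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Hypothetical 6-month changes: x = change in the new index, y = change in DAS28.
delta_index = [-4.0, -2.5, -1.0, 0.0, 1.5]
delta_das28 = [-3.1, -2.0, -1.2, 0.1, 1.0]
print(improved(62, 40))
print(round(ols_slope(delta_index, delta_das28), 2))
```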
Statistical significance was set at a P value of less than 0.05; if Bonferroni correction because of multiple comparisons was needed, then the P value was set at less than 0.0125.
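The corrected threshold follows from dividing the nominal level by the number of comparisons. The short Python sketch below assumes 4 comparisons, which is inferred from the ratio 0.05/0.0125 rather than stated explicitly; the function name is hypothetical.

```python
def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison significance level under Bonferroni correction."""
    return alpha / n_comparisons


# Assumption: 4 pairwise comparisons against the new index.
print(bonferroni_threshold(0.05, 4))
```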
We analyzed 756 visits corresponding to 202 patients (2–4 visits/patient, mean 3.6 visits/patient), of whom 77% were women. Mean ± SD age at onset was 53 ± 16 years. At the end of followup, 70% fulfilled ACR 1987 revised criteria (21) for RA, and 30% were classed as having UA. A more detailed description of this population has been published previously (17).
None of the 10 versions of the HUPI exhibited a perfect Gaussian distribution, although the values obtained in the Shapiro-Wilk test were similar to those obtained for the 2 versions of the DAS28 and were slightly higher than those obtained for the SDAI (Figure 1 and Table 2). In addition, all 10 versions exhibited comparable validity (Table 2), although HUPI version 10 fared better in most of the validity aspects and was thus selected. All further references to the HUPI concern this version of the index.
|Feasibility, T+E+A||Construct validity||Criterion validity, MDA||Normality, W score†||Responsiveness|
The mean ± SD HUPI score was 6.51 ± 3.18 at baseline and 4.35 ± 2.59 at 6 months. The HUPI was calculated at 722 of 756 visits because of missing data, whereas the DAS28-ESR was only calculated at 684 visits and the DAS28-CRP and SDAI at 664 visits. As expected, the HUPI score correlated with the variables that measure disease activity (ρ = 0.89 and n = 684 for DAS28-ESR; ρ = 0.91 and n = 664 for DAS28-CRP; ρ = 0.71 and n = 664 for SDAI; and ρ = 0.82 and n = 664 for CDAI) and the correlation between the HUPI and the HAQ was also high (ρ = 0.69). The same was true for the other disease activity indices (DAIs; data not shown).
Although internal consistency was modest for the HUPI, with a Cronbach's alpha of 0.63, it was still better than for the DAS28-ESR (α = 0.52), the DAS28-CRP (α = 0.47), the SDAI (α = 0.48), and the CDAI (α = 0.46). We did not specifically test reproducibility, since the reproducibility of joint counts and global disease assessment is not very good (30).
To compare how the 5 DAIs discriminate MDA, we used only the 664 visits in which all 5 indices were estimated. The AUC for the HUPI was very high (0.956), slightly larger than that of the DAS28-ESR (0.930; P = 0.001) or the DAS28-CRP (0.945; P = 0.077), and similar to that of the SDAI (0.957; P = 0.971) (Figure 2) and the CDAI (0.964; P = 0.343) (data not shown).
Using the physician assessment and the patient assessment as external criteria of change, we compared the responsiveness of the HUPI after 6 months of treatment with that of the DAS28-ESR, the DAS28-CRP, the SDAI, and the CDAI based on data from the 94 patients for whom the information of all DAIs was available at baseline and at the second visit. With GDA-Ph, the AUC for the HUPI (0.902) was slightly larger than that of the DAS28-ESR (0.864; P = 0.229), DAS28-CRP (0.889; P = 0.625), SDAI (0.792; P = 0.01), and CDAI (0.791; P = 0.002). With GDA-P, the AUC for HUPI (0.841) was similar to that of the DAS28-ESR (0.814; P = 0.218), DAS28-CRP (0.833; P = 0.739), and SDAI (0.786; P = 0.208) and was slightly lower than the one for CDAI (0.987; P < 0.001).
Responsiveness was also tested using linear regression analysis. The difference in the DAS28-ESR from baseline to 6 months was the dependent variable. The β coefficient for the difference in HUPI was 0.85.
Measurement of disease activity is considered a standard approach in clinical practice that facilitates management and followup of patients. Reliable evaluation has been possible using the RA core set variables (31, 32) and composite indices described in the literature, the most well-known being the DAS28 (10), the SDAI (18), and the CDAI (33). These indices are difficult to apply because of the complexity of calculation, sex and age bias (DAS28) (12–14, 34, 35), and a nonweighted design shifting the distribution to the left (SDAI and CDAI) (Figure 1; data not shown for CDAI). In order to overcome these drawbacks, we developed and validated a new index, the HUPI, which includes the same variables used in the DAS28 and the SDAI. The HUPI is simple to calculate and seems at least as accurate and sensitive to change as previously validated indices.
The HUPI was developed by analyzing the balance between simplicity, reliability, accuracy, and sensitivity. In the resulting versions, the variables were weighted according to their quartile distribution in the study population. The 10 versions of the index varied with the weighting applied to the different variables (e.g., sex, age, APR cutoffs, and application of CRP level or ESR). The best index, the HUPI10, used a common score for SJC and GDA-P and a sex-adjusted score for TJC and ESR. Following a common approach in large multipractice registries, we applied ESR, CRP level, or both, depending on availability.
Since the HUPI can be completed within a few minutes, it is more suited to daily clinical practice than the DAS28. It is not as easy to calculate as the SDAI, since the calculation requires a table with the different cutoffs; however, given the left shift in the SDAI distribution, the sensitivity to change of the HUPI is significantly better than that of the SDAI and slightly higher than that of both the DAS28-ESR and the DAS28-CRP.
The key advantage of the HUPI is the possibility of using ESR, CRP level, or both as the APR. Missing data on ESR or CRP level is a frequent problem in observational studies. DAS28 values estimated with ESR or CRP level are not equivalent (17, 36), as occurs with the SDAI and CDAI (19). Furthermore, CRP level is more effective and informative than ESR in some patients and vice versa. Therefore, the possibility of using both APRs, and the fact that HUPI values cover the complete range of the index (whereas values of the other DAIs only span part of their ranges [Figure 1]), may account for its sensitivity to change. Consequently, application of the HUPI in clinical trials might help to reduce the number of patients needed to test differences between comparators.
In contrast to our work, most studies on validation have been performed in patients with established RA. Our cohort includes patients with early RA and UA. This approach is consistent with the current trend of early management and diagnosis of RA that led to the development of the new ACR/EULAR 2010 classification criteria (37). To our knowledge, only one other publication has tried to address the validation of DAS in UA (38). Our group is performing an additional validation in longstanding disease.
Our study is limited by the fact that validation of an instrument is an ongoing process. In addition, we tested validity in a single population using a single data set; therefore, our results cannot be extrapolated to other populations. However, we reproduced very similar results when testing the index in another early arthritis data set (data not shown). The thresholds we describe for the variables included in the index may require fine-tuning once the HUPI is validated in different cohorts, especially in the case of long-term RA. It would also be interesting to evaluate whether sensitivity to change is influenced by genetic or sociocultural backgrounds, or by differences in measurement of ESR and CRP level, which also affect currently used indices. Future objectives will include the development of thresholds for HUPI to distinguish remission based on recently published criteria (39) and thresholds for low, moderate, and high activity.
In summary, we provide evidence that the HUPI is feasible and sensitive to change in disease activity. In addition, it is accurate and makes it possible to avoid bias arising from sex and missing data.
All authors were involved in drafting the article or revising it critically for important intellectual content, and all authors approved the final version to be submitted for publication. Dr. González-Álvaro had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study conception and design. Castrejón, Carmona, Belmonte, González-Álvaro.
Acquisition of data. Castrejón, Ortiz, González-Álvaro.
Analysis and interpretation of data. Castrejón, Carmona, Martínez-López, González-Álvaro.