Development and Validation of a New Disease Activity Score in 28 Joints–Based Treatment Response Criterion for Rheumatoid Arthritis

Authors

  • Frank Behrens,

    1. Johann Wolfgang Goethe University, Frankfurt am Main, Germany
    • Dr. Behrens has received consultant fees and speaking fees (less than $10,000) from Abbott/AbbVie.

  • Hans-Peter Tony,

    1. University of Würzburg, Würzburg, Germany
    • Dr. Tony has received consultant fees, speaking fees, and/or honoraria (less than $10,000 each) from Abbott and Roche.

  • Rieke Alten,

    1. Schlosspark-Klinik and Teaching Hospital Charite, University Medicine, Berlin, Germany
    • Dr. Alten has received consultant fees, speaking fees, and/or honoraria (less than $10,000) from Abbott.

  • Stefan Kleinert,

    1. University of Würzburg, Würzburg, Germany
    • Dr. Kleinert has received consultant fees, speaking fees, and/or honoraria (less than $10,000) from Abbott.

  • Eva C. Scharbatke,

    1. University of Würzburg, Würzburg, Germany
    • Dr. Scharbatke has received consultant fees, speaking fees, and/or honoraria (less than $10,000) from Abbott.

  • Michaela Köhm,

    1. Johann Wolfgang Goethe University, Frankfurt am Main, Germany
  • Holger Gnann,

    1. GKM Gesellschaft für Therapieforschung, Munich, Germany
    • Dr. Gnann has received consultant fees (less than $10,000) from AbbVie.

  • Johanna Tams,

    1. ICRC-WEYER, Berlin, Germany
    • Dr. Tams has received consultant fees (less than $10,000) from AbbVie.

  • Gerd Greger,

    1. AbbVie Deutschland, Wiesbaden, Germany
    • Drs. Behrens and Tony and Drs. Greger and Burkhardt contributed equally to this work.

  • Harald Burkhardt

    Corresponding author
    • Johann Wolfgang Goethe University, Frankfurt am Main, Germany
    • Dr. Burkhardt has received consultant fees, speaking fees, and/or honoraria (less than $10,000 each) from Abbott Germany and AbbVie Germany and (more than $10,000) from Abbott Germany.


Division of Rheumatology/CIRI, Johann Wolfgang Goethe University, Theodor-Stern-Kai 7, D 60590 Frankfurt/Main, Germany. E-mail: Harald.Burkhardt@kgu.de

Abstract

Objective

To define a valid criterion for treatment response in patients with rheumatoid arthritis (RA), based on changes in the Disease Activity Score in 28 joints (DAS28) that exceed random variations in disease activity.

Methods

We utilized anonymized data sets of RA patients from multiple rheumatology centers in Germany to identify patients with stable responses to conventional or biologic disease-modifying antirheumatic drug (DMARD) therapy (discovery cohort). To evaluate fluctuations in DAS28 scores, we subjected patients' DAS28 scores at months 12, 18, and 24 to an analysis of variance model to establish a 1-sided 95% confidence interval for normal fluctuations; this value was used to define the critical difference (DAS28-dcrit) for individual changes from baseline. The DAS28-dcrit value was then applied to analyses of therapeutic response in an adalimumab noninterventional study cohort.

Results

The discovery cohort included 415 patients receiving stable treatment. Values for DAS28-dcrit were comparable regardless of age, sex, disease activity, and class of therapy (DMARDs or biologic agents) and fell below 1.8 in all subgroups. We therefore conclude that DAS28 improvements of 1.8 or higher are outside the normal variation and represent a therapeutic response. When applied to data from the adalimumab noninterventional study (n = 1,874), a DAS28-dcrit response was more robust over time than a European League Against Rheumatism response and was more closely correlated with improved functional capacity.

Conclusion

Based on our data, a DAS28-dcrit value of 1.8 signifies a positive individual therapeutic response that exceeds the threshold of random fluctuation. The DAS28-dcrit criterion may be useful in steering individual therapy and stratifying clinical trials.

INTRODUCTION

The treatment of rheumatoid arthritis (RA) has changed considerably during the past decade due both to the availability of more efficacious therapies and to improved strategies for steering therapy. Treatment regimens in which medications are adjusted to achieve a specified clinical outcome have demonstrated superiority to more rigid therapeutic protocols in randomized trials ([1]). These findings have had an important impact on new RA treatment recommendations by the European League Against Rheumatism (EULAR) ([2]) and the treat-to-target initiative ([3]), both of which define remission as the ultimate goal of RA treatment. However, these guidelines also acknowledge that low disease activity (LDA) is an acceptable alternative goal for patients with longstanding RA ([2, 3]), many of whom are refractory to treatment or unable to achieve response criteria due to longstanding structural damage. Recently published registry data suggest that despite improvements in therapeutic options, remission remains an unrealistic goal for the majority of current RA patients, and even LDA is infrequently attained ([4, 5]). Guidelines recommend a continuous adjustment in therapy as long as composite disease activity assessments indicate a level above the threshold defining LDA. However, no additional guidance is provided to assist clinicians in evaluating whether the patient has had a significant response to therapy.

Statistically valid response criteria for the chosen composite measure would be instrumental in facilitating decision making to achieve treatment goals. In this respect, the Disease Activity Score in 28 joints (DAS28)–based EULAR response is a potentially useful measure that has shown utility in assessing treatment responses in clinical trials ([6, 7]). However, the suitability of these criteria for individual treatment decisions is called into question by the high EULAR responder rates (up to 40% for good plus moderate responses) reported in placebo cohorts of randomized clinical trials ([8-10]).

In order to fill the void for reliable treatment response criteria applicable to individual patients, we have developed a new DAS28-based response measure, the DAS28 critical difference (DAS28-dcrit), by statistical analysis of a large multicenter sample of RA patients with longstanding disease. The development of this measure is based on the simple assumption that a therapeutic effect should, at minimum, result in a DAS28 reduction that significantly exceeds the measurement-associated variability. A comparable approach has been previously used to assess statistically significant changes in the functional capacity of RA patients ([11]).

The DAS28 is a frequently used measure of disease activity in patients with RA. Our goal was to evaluate the long-term reliability of the DAS28 rather than its short-term measurement error. Short-term reliability is usually assessed to estimate the pure instrument (method) error and situational effects of the DAS28 determination, while long-term reliability includes all effects of short-term repetition plus effects of nonsystematic changes in disease activity under stable therapeutic conditions. In a study by Uhlig and colleagues, the short-term variability of DAS28 scores was assessed in 28 patients who were evaluated by the same trained study nurse at 2 time points 5–7 days apart ([12]). The smallest detectable difference for the DAS28 in this study (i.e., “the cut-off values that must be exceeded for a clinician to be 95% confident that a change reflects a true improvement or deterioration”) was 1.32, a rather high value when considered in light of EULAR response criteria, which require changes of 1.2 or 0.6 DAS28 points depending on the patient's baseline disease activity.

In the current study, we focused on long-term reliability because statistical evaluations of an instrument such as the DAS28 should reflect the situation of its proposed practical application. In daily patient care, the decision to change therapy is not made on the basis of repetitive short-term DAS28 assessments, but on the basis of data obtained at 3- to 6-month intervals, usually a single baseline measurement and a single end point measurement. Accordingly, our statistical analyses were planned so that the resulting DAS28-dcrit would discriminate a true response to treatment from random variations in DAS28 values over intervals of several months. This approach has allowed us to establish a statistically valid criterion for assessing therapeutic response in an individual patient based on single-case statistics.

Box 1. Significance & Innovations

  • Statistical analyses were used to establish a new criterion for therapeutic response in rheumatoid arthritis: the Disease Activity Score in 28 joints critical difference (DAS28-dcrit).
  • A DAS28-dcrit of 1.8 signifies a positive individual response exceeding the threshold of random fluctuation.
  • The DAS28-dcrit criterion may be useful in guiding individual therapy and stratifying clinical trials.

MATERIALS AND METHODS

Evaluations of disease activity and functional capacity

Disease activity was assessed by the DAS28, a validated instrument in which higher scores indicate greater disease activity ([13, 14]). Patient response comparisons utilized the EULAR response criteria ([6]). The self-administered Funktionsfragebogen Hannover (FFbH) patient questionnaire was used to assess patient function on a scale of 0–100 units, where 0 = total loss of functional capacity and 100 = maximal functional capacity ([11]); the given FFbH score indicates the remaining percentage of function. The FFbH is validated in RA and is comparable to the Health Assessment Questionnaire disability index ([15]).

Patients

This study utilized 2 patient cohorts: a discovery cohort including only patients receiving stable therapy, and the full patient cohort from a multicenter, noninterventional study of patients with active RA initiating treatment with adalimumab, a tumor necrosis factor α (TNFα) inhibitor.

Patients in the discovery cohort were contributed by 2 academic rheumatology departments and 105 centers throughout Germany, including hospital rheumatology units and private rheumatology practices. Data sets were derived from retrospective chart surveys of patients seen between May 2003 and August 2010. Patient consent for the use of anonymous patient data was not required by German law. All patients in the discovery cohort had received a minimum of 24 months of therapy. Because the goal of this study was to determine a critical difference in disease response, only patients with stable therapy (no changes in therapeutic agents or corticosteroid dose) between 12 and 24 months were included. Patients were also required to have documented DAS28 scores at the beginning and end of 2 consecutive 6-month intervals during the period of stable RA therapy (month 12, month 18, and month 24).

Design of the adalimumab noninterventional study has been previously described ([16]). Briefly, patients were required to have a diagnosis of RA, active disease, a clinical indication for treatment with a TNFα inhibitor and no contraindications, and no prior adalimumab therapy (prior therapy with other biologic agents was allowed). All patients were informed about the objectives of the study and gave written consent for the anonymous use of their personal data in statistical analyses. Because of the noninterventional nature of this study, ethics approval was not required by German law. Patients included in this report enrolled in the study between May 2003 and June 2007. Patients included in the analyses reported here were required to have a baseline DAS28 ≥3.2 and documented DAS28 scores at months 0, 6, and 24.

Calculation of the DAS28-dcrit

RA patients receiving stable therapy commonly show intraindividual variations in repeated assessments of disease activity conducted at different visits ([12, 17]). There are multiple potential sources of variation for the DAS28, including inaccurate joint counts and imprecise measurement of C-reactive protein level, situational effects on the patient global assessment, and nonsystematic fluctuations in disease activity. To determine the long-term reliability of the DAS28, we evaluated intraindividual variation in DAS28 scores of patients undergoing stable therapy using DAS28 scores at month 12, month 18, and month 24. We adapted the method of Lienert and Raatz ([18]) to determine a critical difference based on the 1-sided 5% Z value of the normal distribution.

Each assessment of disease activity ($x$) can be conceptualized as being composed of 2 components, the true magnitude ($t$) and an error component ($e$). The error is independent of range, and there is no correlation between the true magnitude and the error term. Under these conditions, the observed total variance ($s_x^2$) is the sum of the true variance ($s_t^2$) and the error variance ($s_e^2$), yielding the equation $s_x^2 = s_t^2 + s_e^2$. Values for the true variance and the error variance cannot be directly measured, but $s_e$ (the standard error of the single measurement [SEM]) can be estimated by determining the retest reliability of the measurement instrument.
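The variance decomposition described in this paragraph can be illustrated with a short simulation. This is a sketch under stated assumptions: the true-score mean (3.2) and the standard deviations of the true score (1.0) and of the error (0.7) are illustrative values chosen for the example, not estimates from this study, and the error is drawn independently of the true score, as the model requires.

```python
import random
import statistics

def simulate_decomposition(n=20_000, mean_true=3.2, sd_true=1.0, sd_error=0.7, seed=1):
    """Simulate x = t + e with t and e independent; return (var_x, var_t, var_e).

    All parameter values are illustrative assumptions, not study estimates.
    """
    rng = random.Random(seed)
    t = [rng.gauss(mean_true, sd_true) for _ in range(n)]  # hypothetical true DAS28 levels
    e = [rng.gauss(0.0, sd_error) for _ in range(n)]       # error, uncorrelated with t
    x = [ti + ei for ti, ei in zip(t, e)]                  # observed scores
    return statistics.pvariance(x), statistics.pvariance(t), statistics.pvariance(e)

var_x, var_t, var_e = simulate_decomposition()
print(f"observed variance {var_x:.3f} vs true + error {var_t + var_e:.3f}")
```

Because the simulated error is independent of the true score, the observed variance matches the sum of the two components up to sampling noise (about 1.0 + 0.49 = 1.49 with these settings).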

To determine the retest reliability of the DAS28, we evaluated intraindividual variation in DAS28 scores of patients who had received at least 2 years of therapy and whose therapy was stable from month 12 to month 24, using DAS28 scores at month 12, month 18, and month 24. The reliability coefficient ($r_{tt}$) and the standard deviation ($s_x$) were calculated by 1-factor analysis of variance for repeated measures and used to determine the SEM ($s_e$) by the equation $s_e = s_x\sqrt{1 - r_{tt}}$. This method was originally described by Lienert and Raatz ([18]) and is similar to the method proposed by Winer et al ([19]).
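The retest-reliability step can be sketched from a 1-factor repeated-measures analysis of variance (patients × visits). The text does not specify which ANOVA quantities enter the equation, so this sketch assumes a reliability coefficient of the form r_tt = (MS_patients − MS_error)/MS_patients, with s_x taken as the square root of MS_patients; under that assumption, s_x·√(1 − r_tt) simplifies to √MS_error. The function name and the toy data are hypothetical.

```python
from math import sqrt
from statistics import fmean

def anova_retest_reliability(scores):
    """Return (r_tt, s_x, s_e) from a patients-by-visits table of DAS28 scores.

    scores: one row per patient, e.g. [month 12, month 18, month 24].
    Assumes r_tt = (MS_patients - MS_error) / MS_patients and s_x = sqrt(MS_patients);
    requires at least 2 patients, at least 2 visits, and MS_patients > 0.
    """
    n, k = len(scores), len(scores[0])
    grand = fmean(v for row in scores for v in row)
    patient_means = [fmean(row) for row in scores]
    visit_means = [fmean(row[j] for row in scores) for j in range(k)]

    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_patients = k * sum((m - grand) ** 2 for m in patient_means)
    ss_visits = n * sum((m - grand) ** 2 for m in visit_means)
    ss_error = max(ss_total - ss_patients - ss_visits, 0.0)  # patient x visit residual

    ms_patients = ss_patients / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    r_tt = (ms_patients - ms_error) / ms_patients
    s_x = sqrt(ms_patients)
    s_e = s_x * sqrt(max(1.0 - r_tt, 0.0))  # SEM; algebraically equals sqrt(ms_error) here
    return r_tt, s_x, s_e

toy = [[3.0, 3.4, 3.1], [5.2, 4.8, 5.0], [2.1, 2.5, 2.3], [4.0, 4.1, 3.7]]  # hypothetical
r_tt, s_x, s_e = anova_retest_reliability(toy)
print(f"r_tt={r_tt:.3f}  s_x={s_x:.3f}  SEM={s_e:.3f}")
```

Removing the visit effect before estimating the error keeps a uniform shift of the whole cohort between visits out of the SEM, so only individual fluctuations count as measurement error.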

We determined a critical difference based on the 1-sided 5% Z value of the normal distribution by means of the following formula ([18]):

$$d_{\mathrm{crit}} = z_{(\alpha)}\, s_e \sqrt{2} = z_{(\alpha)}\, s_x \sqrt{2\,(1 - r_{tt})}$$

where $z_{(\alpha)}$ = the upper 5% Z value of the normal distribution.
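The critical difference can be computed in a few lines. This is a hedged sketch: the function is hypothetical, and it assumes the Lienert and Raatz form d_crit = z(α)·s_e·√2, in which both the baseline and the follow-up measurement carry error s_e. Using the SEM of 0.720 from the Total row of Table 2, the exact 1-sided 5% quantile (about 1.645) gives 1.67, whereas a z value rounded to 1.65 reproduces the tabulated 1.68; whether the authors rounded z in this way is an assumption.

```python
from math import sqrt
from statistics import NormalDist

def critical_difference(sem, alpha=0.05, z=None):
    """1-sided critical difference between two measurements with the same SEM.

    z defaults to the exact upper-alpha normal quantile (about 1.645 for alpha = 0.05).
    """
    if z is None:
        z = NormalDist().inv_cdf(1.0 - alpha)
    return z * sem * sqrt(2.0)  # sqrt(2): baseline and follow-up each carry error s_e

print(round(critical_difference(0.720), 2))          # exact z: 1.67
print(round(critical_difference(0.720, z=1.65), 2))  # z rounded to 1.65: 1.68
```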

The 1-sided critical difference (i.e., DAS28-dcrit value) was used to define therapeutic response ([20]), since only reductions in disease activity were relevant to defining a response; increased disease activity, by definition, does not constitute a therapeutic response. Patients who showed an improvement that equaled or exceeded the 1-sided DAS28-dcrit were considered to have experienced a statistically significant reduction in disease activity that could not be explained by intraindividual variation and were classified as responders. Conversely, changes less than the DAS28-dcrit can be explained by chance alone, so patients with DAS28 reductions below the DAS28-dcrit were classified as nonresponders. A critical difference value for FFbH was determined by applying the same statistical methodology.
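The responder rule above reduces to a single comparison. A minimal sketch, using the 1.8 cutoff reported in the abstract; the function name is hypothetical, and it assumes DAS28 values recorded to 2 decimal places, rounding the difference so that, for example, 6.0 − 4.2 counts as exactly 1.8 rather than a floating-point value just below it.

```python
DAS28_DCRIT = 1.8  # cutoff reported in the abstract (1.75 rounded up)

def is_dcrit_responder(das28_baseline, das28_followup, dcrit=DAS28_DCRIT):
    """True if the DAS28 reduction from baseline equals or exceeds dcrit.

    Increases in DAS28 never count as a response, because only reductions are tested.
    """
    reduction = round(das28_baseline - das28_followup, 2)  # 2-decimal rounding avoids float artifacts
    return reduction >= dcrit

print(is_dcrit_responder(6.0, 4.2))  # reduction 1.80 -> True
print(is_dcrit_responder(6.0, 4.3))  # reduction 1.70 -> False
```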

Validity analyses

Consistency of response over time was evaluated by examining the percentage of patients with at least one response who experienced additional responses during the 24-month observation period (4 visits at 6-month intervals). Tetrachoric correlations were performed to determine the association between parameters of response and significant improvement in functional capacity; significance was expressed as P values by chi-square test.
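A 2 × 2 analysis of this kind can be sketched as follows. The paper does not state how the tetrachoric coefficient was estimated; statistical packages typically use maximum likelihood, whereas this sketch substitutes the classical cosine-π approximation, so its values can differ slightly from published ones. The chi-square test shown is the Pearson test without continuity correction, which is also an assumption. Cell labels a to d and both function names are hypothetical.

```python
from math import cos, erfc, pi, sqrt

def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation of the tetrachoric correlation.

    Cells: a = response and FFbH improvement, b = response only,
           c = FFbH improvement only, d = neither.
    """
    if b * c == 0:
        return 1.0   # empty discordant cell: approximation saturates at +1
    if a * d == 0:
        return -1.0  # empty concordant cell: approximation saturates at -1
    return cos(pi / (1.0 + sqrt((a * d) / (b * c))))

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) and its 1-df P value.

    Requires all row and column totals to be > 0.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = erfc(sqrt(chi2 / 2.0))  # P(chi2 with 1 df > x) = P(|Z| > sqrt(x))
    return chi2, p

r = tetrachoric_cos_pi(40, 10, 10, 40)  # hypothetical counts
chi2, p = chi_square_2x2(40, 10, 10, 40)
print(f"r_tet ~ {r:.3f}, chi2 = {chi2:.1f}, P = {p:.2e}")
```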

Other statistical analyses

The impact of baseline DAS28 on EULAR response was examined by multiple logistic regression analysis using SAS statistical software, version 9.2.

RESULTS

Determination of the critical difference in disease activity

To establish a response criterion that allows treatment-related DAS28 disease activity changes to be reliably discriminated from random variations, we determined the retest reliability of DAS28 in a large sample of RA patients who had received stable RA therapy (no changes in therapeutic agents or corticosteroid dose) between months 12 and 24 of treatment. The requirement for stable therapy allowed intraindividual fluctuations in disease activity to be distinguished from responses to alterations in therapy.

Of the 415 patients in the discovery cohort (Table 1), 97 (23.4%) were contributed by 2 academic rheumatology departments, while the majority (n = 318 [76.6%]) were derived from multiple centers throughout Germany, which supports the representativeness of the cohort for patients and patient care in Germany.

Table 1. Demographic data and disease characteristics of the discovery cohort of rheumatoid arthritis patients and the noninterventional study cohort at study entry*
Characteristic | Discovery cohort (n = 415) | Noninterventional study cohort (n = 1,874)
Age, years | 56.1 ± 11.7 | 54.8 ± 12.4
Women, % | 78.1 | 77.0
BMI, kg/m² | 26.4 ± 5.0 | 25.9 ± 4.6
Disease duration, years | 11.5 ± 9.7 | 11.5 ± 8.9
Tender joint count | 3.2 ± 4.6 | 12.6 ± 7.2
Swollen joint count | 2.6 ± 3.7 | 9.8 ± 6.2
CRP level, mg/liter | 4.9 ± 11.1ᵃ | 32.8 ± 63.2
ESR, mm/hour | 16.7 ± 15.1 | 33.0 ± 21.8
Patient global assessment (VAS) | 2.9 ± 1.8 | 6.5 ± 1.9
DAS28 | 3.2 ± 1.3 | 5.9 ± 1.1
FFbH score | 74.5 ± 20.4ᵇ | 60.2 ± 22.8
* Values are the mean ± SD unless otherwise indicated. BMI = body mass index; CRP = C-reactive protein; ESR = erythrocyte sedimentation rate; VAS = visual analog scale; DAS28 = Disease Activity Score in 28 joints; FFbH = Funktionsfragebogen Hannover.
ᵃ Reduced sample (n = 261).
ᵇ Reduced sample (n = 308).

Mean ± SD DAS28 values for the full discovery cohort remained constant throughout the observation period (3.22 ± 1.28, 3.20 ± 1.26, and 3.13 ± 1.26 for months 12, 18, and 24, respectively), despite considerable fluctuation in individual DAS28 scores over time. The cohort therefore fulfilled the most crucial methodologic requirement for calculating a therapy-related critical difference in DAS28 scores: constant disease activity under fixed therapeutic regimens.

The retest reliability, SEM, and 1-sided DAS28-dcrit were calculated from discovery cohort data (Table 2). The 1-sided DAS28-dcrit represents the critical threshold that must be reached or exceeded to fulfill the criterion of nonrandom reduction in disease activity. For the entire discovery cohort, the DAS28-dcrit value was 1.68, indicating that a reduction in DAS28 of at least 1.68 points from baseline may qualify as a response to therapy. Subgroup analyses in which patients were stratified according to clinically relevant characteristics were performed to uncover any potential subgroup bias (Table 2). Only minor variations were noted between the subgroups and the entire cohort, suggesting that retest reliability and the DAS28-dcrit are robust to influences such as age, sex, body mass index, disease duration, or prior treatment modalities. However, a somewhat lower value of 1.45 was calculated in the subgroup of patients with disease activity at or below the group mean (DAS28 ≤3.2), due in part to a smaller SEM at a comparable reliability (Table 2). The highest DAS28-dcrit value determined in any subgroup was 1.75 (observed in patients aged >60 years, in patients with DAS28 >3.2 at month 12, and in patients with RA duration ≥10 years). Since the primary goal of our investigation was to define a simple and reliable treatment response criterion that could be applied to all patients seen in clinical practice, we chose 1.75 as the cutoff value that would minimize the risk of a Type I error (false-positive result) and rounded it up to 1.8 for consistency with the one-decimal-place convention used by other DAS28-based response criteria. In the patient population evaluated in this study, a change in DAS28 of at least 1.8 points provides a conservative value for a statistically significant therapeutic response in all patients, independent of their individual disease characteristics.
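The cutoff rule described here (take the largest subgroup dcrit, then round up to one decimal place) can be written as a small function. A sketch: the dictionary simply transcribes the 1-sided dcrit column of Table 2, and the function name is hypothetical.

```python
import math

# 1-sided DAS28-dcrit values transcribed from Table 2
TABLE2_DCRIT = {
    "Total": 1.68,
    "Age >60 years": 1.75,
    "Age ≤60 years": 1.64,
    "Men": 1.56,
    "Women": 1.71,
    "BMI ≤25 kg/m²": 1.73,
    "BMI >25 kg/m²": 1.62,
    "Prior DMARD therapy": 1.56,
    "Prior biologic agent therapy": 1.67,
    "DAS28 >3.2 at month 12": 1.75,
    "DAS28 ≤3.2 at month 12": 1.45,
    "RA duration <2 years": 1.67,
    "RA duration ≥2 to <10 years": 1.58,
    "RA duration ≥10 years": 1.75,
}

def conservative_cutoff(dcrit_values, decimals=1):
    """Largest subgroup dcrit, rounded UP to `decimals` places (never down)."""
    worst = max(dcrit_values)
    factor = 10 ** decimals
    # round() first so that, e.g., 1.1 * 10 = 11.000000000000002 does not ceil to 12
    return math.ceil(round(worst * factor, 9)) / factor

print(conservative_cutoff(TABLE2_DCRIT.values()))  # 1.8
```

Rounding up rather than to the nearest value keeps the cutoff at or above every subgroup estimate, which is what makes it conservative with respect to false-positive responses.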

Table 2. Analysis of the DAS28 retest reliability in a large discovery cohort of RA patients (n = 415)*
Population | N | SEM | 1-sided dcrit | Reliability
Total | 415 | 0.720 | 1.68 | 0.862
Age >60 years | 156 | 0.749 | 1.75 | 0.853
Age ≤60 years | 259 | 0.703 | 1.64 | 0.867
Men | 91 | 0.670 | 1.56 | 0.851
Women | 324 | 0.733 | 1.71 | 0.861
BMI ≤25 kg/m² | 223 | 0.743 | 1.73 | 0.860
BMI >25 kg/m² | 192 | 0.693 | 1.62 | 0.861
Prior DMARD therapy | 225 | 0.669 | 1.56 | 0.890
Prior biologic agent therapy | 47 | 0.714 | 1.67 | 0.862
DAS28 >3.2 at month 12ᵃ | 194 | 0.749 | 1.75 | 0.775
DAS28 ≤3.2 at month 12ᵃ | 221 | 0.622 | 1.45 | 0.773
RA duration <2 years | 41 | 0.717 | 1.67 | 0.767
RA duration ≥2 to <10 years | 177 | 0.678 | 1.58 | 0.896
RA duration ≥10 years | 181 | 0.750 | 1.75 | 0.834
* DAS28 = Disease Activity Score in 28 joints; RA = rheumatoid arthritis; SEM = standard error of measurement; dcrit = critical difference; BMI = body mass index; DMARD = disease-modifying antirheumatic drug.
ᵃ 3.2 represents the mean DAS28 value at month 12.

Validity results: evaluation of the DAS28-dcrit as a response criterion

The utility of the DAS28-dcrit value of ≥1.8 as a treatment response criterion was evaluated in a retrospective analysis of data from a noninterventional study of RA patients initiating treatment with adalimumab. Unlike the discovery cohort, these patients were not required to have stable therapy, since our goal was to evaluate treatment responses that exceeded normal fluctuation in disease activity during “real-world” treatment. The demographic characteristics of the noninterventional study cohort were similar to those of the discovery cohort (Table 1), but the noninterventional study population had more severe disease.

Most clinicians who treat RA patients are familiar with EULAR response criteria. We therefore compared therapeutic response rates determined by the newly defined DAS28-dcrit value with EULAR response rates over the observation period of 2 years (Table 3). EULAR response criteria are based on both current DAS28 state and change in DAS28 ([6]). A good EULAR response was defined as a DAS28 reduction >1.2 and a current state ≤3.2, and a moderate EULAR response was defined as a DAS28 reduction >0.6 and a current state ≤5.1.
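The EULAR rule in this paragraph can be contrasted with the single-threshold DAS28-dcrit rule in code. This is a sketch, not a validated scoring tool: by default it implements the definitions as stated here, and the optional `full_table` flag adds the cell of the original EULAR response table in which an improvement >1.2 with a current DAS28 >5.1 is also classified as a moderate response. The function name and the example patient are hypothetical.

```python
def eular_response(das28_baseline, das28_current, full_table=False):
    """EULAR response category: 'good', 'moderate', or 'none'."""
    improvement = round(das28_baseline - das28_current, 2)  # positive = better
    if improvement > 1.2 and das28_current <= 3.2:
        return "good"
    if improvement > 0.6 and das28_current <= 5.1:
        return "moderate"
    if full_table and improvement > 1.2:
        return "moderate"  # original EULAR table: improvement >1.2 with current DAS28 >5.1
    return "none"

# Hypothetical patient: EULAR moderate responder, but a 1.0-point drop is below 1.8
baseline, current = 5.0, 4.0
print(eular_response(baseline, current))    # moderate
print(round(baseline - current, 2) >= 1.8)  # False: not a DAS28-dcrit responder
```

The example shows the discrepancy discussed in the Results: a moderate EULAR response can be reached with a change that the DAS28-dcrit analysis treats as compatible with random fluctuation.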

Table 3. DAS28 response to adalimumab in a noninterventional study*
Parameter | Month 6 (n = 1,874) | Month 12 (n = 1,738) | Month 18 (n = 1,728) | Month 24 (n = 1,874)
DAS28-dcrit response rates (ΔDAS28 ≥1.8) | 1,052 (56.1) | 1,018 (58.6) | 1,056 (61.1) | 1,127 (60.1)
Good EULAR response rates (ΔDAS28 >1.2 and current state ≤3.2) | 611 (32.6) | 583 (33.5) | 652 (37.7) | 687 (36.7)
EULAR response rates (good plus moderate; ΔDAS28 >1.2 and current state ≤3.2 or ΔDAS28 >0.6 and current state ≤5.1) | 1,545 (82.4) | 1,455 (83.7) | 1,455 (84.2) | 1,562 (83.4)
* Values are the number (percentage). Complete documentation was not available for all patients at each interim visit. DAS28 = Disease Activity Score in 28 joints; dcrit = critical difference; EULAR = European League Against Rheumatism.

The application of the EULAR response criteria resulted in overall (good plus moderate) responder rates of >80% during the observation period, which was much higher than responder rates calculated on the basis of the DAS28-dcrit criterion (range 56.1–61.1%). In contrast, a EULAR good response was achieved by a modest proportion (range 32.6–37.7%) of the cohort. The high EULAR responder rates were therefore mainly due to the number of patients achieving a moderate response, which requires a level of change (0.6 points) that, according to our investigations, is indistinguishable from random DAS28 variations.
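The share of moderate-only responders can be derived directly from Table 3 by subtracting the good responders from the good-plus-moderate responders at each visit. This small sketch only repeats that arithmetic on the published counts; it assumes, as the table definitions imply, that good responders are a subset of good-plus-moderate responders. The function name is hypothetical.

```python
# Counts transcribed from Table 3: (patients assessed, good EULAR, good plus moderate EULAR)
TABLE3 = {
    "Month 6": (1874, 611, 1545),
    "Month 12": (1738, 583, 1455),
    "Month 18": (1728, 652, 1455),
    "Month 24": (1874, 687, 1562),
}

def moderate_only(table):
    """Patients with a moderate (not good) EULAR response: count and percent per visit."""
    result = {}
    for visit, (n, good, good_or_moderate) in table.items():
        moderate = good_or_moderate - good
        result[visit] = (moderate, round(100 * moderate / n, 1))
    return result

for visit, (count, pct) in moderate_only(TABLE3).items():
    print(f"{visit}: {count} moderate-only responders ({pct}%)")
```

At every visit, the moderate-only group (46.5–50.2% of assessed patients) is larger than the good-response group (32.6–37.7%).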

Pattern of therapeutic response in DAS28-dcrit and EULAR responders

To further test the utility of the DAS28-dcrit criterion, we investigated the subsequent DAS28 response of patients who met the DAS28-dcrit or EULAR good response criteria at 6 months. DAS28-dcrit and EULAR good responders showed parallel changes in DAS28 reductions (Figure 1A) and functional improvements (Figure 1B) during 24 months of adalimumab therapy, indicating that treatment-related improvements are comparable in both responder groups.

Figure 1.

Development of A, mean Disease Activity Score in 28 joints (DAS28) and B, mean Funktionsfragebogen Hannover (FFbH) score over time in cohorts of patients fulfilling the response criteria for either European League Against Rheumatism (EULAR) good response (n = 611) or DAS28 critical difference (decrease ≥1.8; n = 1,052) at 6 months. Higher FFbH scores indicate greater functional capacity.

The most noticeable difference in treatment response between the 2 groups was consistently lower mean DAS28 scores in the EULAR good response group. This finding, which reflects the requirement for a DAS28 ≤3.2 for a EULAR good response, indicates that attainment of a EULAR good response is favored by a low baseline DAS28. Patients who achieved a EULAR good response at 6 months had lower mean DAS28 scores at baseline than those attaining a DAS28-dcrit response (5.44 versus 6.15) (Figure 1A). This observation was supported by logistic regression analysis: high baseline DAS28 scores were associated with a significantly reduced likelihood of achieving a EULAR good response at 24 months (odds ratio 0.778, 95% confidence interval 0.706–0.858; P < 0.001).

The consistency of response in an individual patient receiving effective therapy is an indicator of the clinical validity of a response criterion. We hypothesized that EULAR good responses would show greater instability over time due to the requirement for a DAS28 reduction that falls in the range of random variation. To address this point, we evaluated the number of visits out of 4 (conducted at 6-month intervals) at which individual patients achieved a response during the 24-month observation period (Table 4). In the group of patients who achieved a DAS28-dcrit response at 1 or more visits (n = 1,300), 43.7% fulfilled the response criterion at all 4 visits, and 66.8% achieved a response at 3 or more visits. In contrast, only 23.9% of patients with at least a single EULAR good response (n = 959) retained their responder status for the entire 24-month period, and approximately one-third (31.5%) fulfilled the response criteria only once (versus 16.5% for DAS28-dcrit responders). These data demonstrate that a treatment response based on the DAS28-dcrit definition is more robust to variation over time than a EULAR good response.
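The visit-count analysis behind Table 4 can be sketched as follows, assuming a list of per-patient response flags for the 4 visits (months 6, 12, 18, and 24). Only patients with at least one response form the denominator, and responses need not be consecutive, matching the Table 4 footnotes. The function name and the toy data are hypothetical.

```python
def consistency_table(visit_responses, n_visits=4):
    """Count patients responding at >= k visits, among patients with >= 1 response.

    visit_responses: one list of booleans per patient (month 6, 12, 18, 24).
    Returns {k: (count, percent)}; consecutive responses are not required.
    """
    counts = [sum(flags) for flags in visit_responses]
    responders = [c for c in counts if c >= 1]
    total = len(responders)
    table = {}
    for k in range(1, n_visits + 1):
        n_k = sum(1 for c in responders if c >= k)
        table[k] = (n_k, round(100 * n_k / total, 1) if total else 0.0)
    return table

toy = [  # hypothetical patients
    [True, True, True, True],
    [True, False, True, False],
    [False, False, False, True],
    [False, False, False, False],  # never responds: excluded from the denominator
]
print(consistency_table(toy))
```

The "only once" percentages quoted in the text follow as the difference between the first two rows of Table 4: (1,300 − 1,086)/1,300 ≈ 16.5% for DAS28-dcrit responders and (959 − 657)/959 ≈ 31.5% for EULAR good responders.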

Table 4. Consistency of fulfillment of response criteria by individual patients (n = 1,874) during a 24-month observation period (4 visits)*
No. of visits with response | DAS28 decrease ≥1.8 (n = 1,300)ᵃ | Good EULAR response (n = 959)ᵃ
At least 1 | 1,300 (100) | 959 (100)
At least 2 | 1,086 (83.5) | 657 (68.5)
At least 3 | 869 (66.8) | 433 (45.1)
All 4 | 568 (43.7) | 229 (23.9)
* Values are the number (percentage). Visits were conducted at month 6, month 12, month 18, and month 24. DAS28 = Disease Activity Score in 28 joints; EULAR = European League Against Rheumatism.
ᵃ Percentages are calculated on the basis of total responders, as defined by the achievement of ≥1 response according to the definition for the indicated criteria. Responses at consecutive visits were not required for multiple responses.

Clinical relevance of a DAS28-dcrit response

To evaluate the clinical relevance of a DAS28-dcrit response, we examined the association between a DAS28-dcrit response and patient-reported functional capacity. Applying the same methodology as for the DAS28-dcrit calculation (n = 308; not all patients in the discovery cohort had FFbH data), we found that a nonrandom FFbH response required an improvement of ≥18 units in FFbH scores. This FFbH-dcrit was used in tetrachoric correlation analyses of FFbH improvement in various treatment response groups (Table 5). The DAS28-dcrit criterion exhibited the best correlation with FFbH improvement at both month 6 and month 24 and was the only measure with a moderate predictive value for functional improvement at month 24.

Table 5. Tetrachoric correlation between DAS28 response criteria and patient-reported improvement in functional capacity*
Description of 2 × 2 tables | Tetrachoric correlation | χ² P | N
Correlation between therapeutic response at month 6 and functional improvement at month 6
DAS28-dcrit → FFbH-dcrit | 0.4201 | <0.0001 | 1,870
Good EULAR response → FFbH-dcrit | 0.1743 | <0.0001 | 1,870
Good or moderate EULAR response → FFbH-dcrit | 0.3056 | <0.0001 | 1,870
Correlation between therapeutic response at month 6 and functional improvement at month 24
DAS28-dcrit → FFbH-dcrit | 0.3062 | <0.0001 | 1,861
Good EULAR response → FFbH-dcrit | 0.0802 | 0.0426 | 1,861
Good or moderate EULAR response → FFbH-dcrit | 0.1286 | 0.0060 | 1,861
* Disease Activity Score in 28 joints critical difference (DAS28-dcrit) defined as ≥1.8; Funktionsfragebogen Hannover critical difference (FFbH-dcrit) defined as ≥18 FFbH units. EULAR = European League Against Rheumatism.

DISCUSSION

In this study, we report the development of a response criterion that allows clinicians to conduct reliable assessments of significant therapeutic responses in individual patients. Current RA guidelines recommend the continuous adjustment of medications to achieve LDA or remission and improve outcomes in patients ([2, 3]). However, these promising changes in RA treatment paradigms also imply continuous decisions on how and when therapy should be modified in the individual patient. To make these decisions, clinicians must be able to accurately judge response to current therapy, an especially difficult task in a patient with longstanding or refractory disease. A reliable response criterion based on a validated composite disease activity measure could be instrumental in facilitating adherence to treatment guidelines.

We utilized the DAS28, which has a Gaussian distribution over a continuous scale and is sensitive to change in disease activity ([21]), to establish such a criterion. Our study evaluated the long-term reliability of the DAS28 rather than its short-term measurement error. Patients in clinical care are usually seen at 3- to 6-month intervals, so long-term variability is more relevant to the measured outcome than short-term variability. Systematic changes in disease activity in our discovery cohort were excluded by the study design, which only included data from patients receiving stable therapy (from month 12 to month 24). Statistical analyses were used to determine a DAS28-dcrit value that discriminates a true response to treatment from random variations in DAS28 values over intervals of several months. Based on the most conservative estimate, a DAS28 reduction of 1.8 was determined as the critical threshold that must be reached to achieve an unequivocal treatment response in an individual patient. The results of our statistical analysis further imply that improvements below this threshold cannot be differentiated from random variations in DAS28 measurements.

Although our definition of treatment response might appear rather minimalistic, the threshold of arthritis improvement required by the DAS28-dcrit is clearly higher than the disease activity change required for EULAR responses. In the original study leading to the establishment of EULAR response criteria ([7]), the relevant change in disease activity (DAS 0.6) was also based on long-term reliability assessments, similar to the methods used in our investigation. There are, however, important methodologic differences between our studies. The EULAR response criteria were developed in a single-center cohort of 78 early RA patients with at least 3 years of followup. This cohort was studied in the pre–biologic agent era (1985–1994), and the stability of therapy was not reported. In contrast, our calculations are derived from a large multicenter investigation of RA patients (n = 415), most of whom had longstanding, extensively pretreated disease and all of whom were receiving documented stable therapy. Our data are therefore more likely to reflect the variability of the DAS28 in current daily practice.

Application of the newly defined DAS28-dcrit criterion to data from a 2-year noninterventional RA study supported its usefulness in evaluating treatment response. Patients who met the DAS28-dcrit criterion were more likely to show a consistent response over time than patients who achieved a EULAR good response. The DAS28-dcrit criterion was also more closely associated with improvements in functional capacity and was superior to the EULAR criteria in predicting a significant improvement in functional capacity at 24 months.

Our study does, however, have limitations. Although the DAS28-dcrit criterion was developed in patients receiving a wide array of treatments, including nonbiologic and biologic agents, the validity analysis only included patients who had initiated adalimumab therapy in Germany. It is therefore important that future studies using the DAS28-dcrit criterion include patients treated with other therapeutic modalities and in other countries in order to confirm the utility of this tool in additional patient populations. In addition, although our threshold definition of 1.8 minimizes false-positive results, it inevitably implies a certain risk for Type II error (false-negatives). The probability of incorrectly judging that no treatment response has occurred based on a DAS28 change of less than 1.8 seems to be slightly increased in the subgroup of patients with low baseline disease activity (DAS28 ≤3.2), for which a lower critical difference value of 1.45 was determined in the discovery cohort. However, a retrospective analysis of patients in the confirmation cohort with a baseline DAS28 ≤3.2 revealed that more than 90% of patients who achieved a reduction of 1.45 DAS28 units at month 12 (n = 563) also fulfilled the more stringent DAS28-dcrit criterion of 1.8 (n = 522 [92.7%]). Accordingly, a maximum of 7.3% of patients with low baseline disease activity would potentially be misclassified as nonresponders by the use of the more stringent DAS28-dcrit criterion of 1.8, a percentage that may be acceptable in routine care. However, tolerance for Type II error is clearly reduced in clinical trials. Therefore, the use of an adjusted DAS28-dcrit criterion according to the values in Table 2 might be advisable in randomized clinical trials of RA populations with LDA.
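The misclassification estimate above is simple arithmetic on the reported counts, and the suggested baseline-adjusted criterion amounts to choosing the threshold from baseline disease activity. The sketch below reproduces the 92.7% and 7.3% figures from n = 563 and n = 522. The helper `dcrit_threshold` is hypothetical and encodes only the two values stated in this paragraph (1.45 for baseline DAS28 ≤3.2, otherwise 1.8); any further strata in Table 2 are not represented.

```python
def percent_also_meeting_stricter(n_lenient, n_strict):
    """Share (%) of patients meeting the lenient cut who also meet the strict one."""
    return 100.0 * n_strict / n_lenient


def dcrit_threshold(baseline_das28, lda_cutoff=3.2, low_threshold=1.45,
                    default_threshold=1.8):
    """Hypothetical baseline-adjusted DAS28-dcrit threshold (two strata only)."""
    return low_threshold if baseline_das28 <= lda_cutoff else default_threshold


share = percent_also_meeting_stricter(563, 522)
print(round(share, 1))          # share also fulfilling the 1.8 criterion
print(round(100 - share, 1))    # maximum share potentially misclassified
```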

In contrast to the DAS28-dcrit criterion, a EULAR response is based on 2 requirements: improvement in disease activity (≥1.2 for good response or between 0.6 and 1.2 for moderate response) and disease activity below a certain threshold (≤3.2 for good response or between 3.2 and 5.1 for moderate response) ([6]). This combined measure of DAS28 change and current score has been regarded as a particular strength of the EULAR criteria, but it also constitutes a weakness, since a flaw in either component can lead to an inaccurate determination of response. The EULAR requirement for disease activity reduction is considerably below the statistically determined critical difference of 1.8, which represents the lowest level of nonrandom disease activity changes during long-term evaluation, and even falls below the level of short-term variability of 1.3 observed by Uhlig et al ([12]). The contribution of random variation to response is a likely explanation for the observed instability in EULAR good responses in individual patients over 24 months and for the high responder rates observed in the placebo arms of some clinical trials ([8-10]).
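To make the contrast concrete, the sketch below classifies a single patient under both criteria. The EULAR function follows the widely published EULAR response table, assuming strict inequalities for improvement (>1.2, >0.6) and inclusive cutoffs for the attained DAS28 (≤3.2, ≤5.1); the DAS28-dcrit function uses the 1.8 threshold from the text. Function names and example values are illustrative assumptions. A patient improving from 4.4 to 3.1 (an improvement of 1.3) is a EULAR good responder but does not exceed the DAS28-dcrit threshold.

```python
def eular_response(baseline, current):
    """EULAR response category from baseline and current DAS28 (assumed table)."""
    improvement = baseline - current
    if improvement > 1.2:
        # Large improvement: good if low activity is reached, otherwise moderate.
        return "good" if current <= 3.2 else "moderate"
    if improvement > 0.6:
        # Intermediate improvement: moderate unless activity remains high.
        return "moderate" if current <= 5.1 else "none"
    return "none"


def is_dcrit_response(baseline, current, threshold=1.8):
    """DAS28-dcrit response: reduction must strictly exceed the threshold."""
    return (baseline - current) > threshold


# Illustrative patient: EULAR good response, but no DAS28-dcrit response.
print(eular_response(4.4, 3.1), is_dcrit_response(4.4, 3.1))
```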

It should be noted that achievement of a DAS28-dcrit response does not imply that the extent of improvement in disease activity is sufficient. Therefore, a given patient might require treatment escalation despite a DAS28-dcrit response due to continuing disease activity. Even in these circumstances, however, the DAS28-dcrit allows clinicians to assess the ability of a treatment or drug class to ameliorate disease activity, which may help guide therapeutic choices.
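Because response and attained disease activity are separate questions, a patient can be assessed on both axes at once. The sketch below is a hypothetical illustration, not a clinical algorithm: it combines the DAS28-dcrit response (reduction >1.8) with whether the current DAS28 is at or below 3.2, the low disease activity cutoff used in this paper. The status labels are our own assumptions.

```python
def treatment_status(baseline, current, dcrit=1.8, lda_cutoff=3.2):
    """Hypothetical two-axis summary: DAS28-dcrit response and LDA status."""
    responded = (baseline - current) > dcrit
    at_target = current <= lda_cutoff
    if responded and at_target:
        return "response, LDA reached"
    if responded:
        # Treatment works, but disease activity persists (escalation may be needed).
        return "response, disease activity persists"
    if at_target:
        return "LDA reached without dcrit response"
    return "no dcrit response, LDA not reached"


print(treatment_status(6.5, 4.2))
```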

We envision several possible uses for the DAS28-dcrit criterion. A major goal of our investigation was to establish a DAS28-based criterion to guide decisions for therapeutic steering in daily clinical practice. In patients who do not achieve remission, guidelines recommend continuous adjustment of therapy until LDA is achieved. However, the recommendations provide no criteria-based definition for deciding whether an individual patient has had a significant response to therapy. The clinician is then left with the difficult decision of whether to continue a therapy that might not be working or to switch, perhaps unnecessarily, to an alternative therapy that may introduce new tolerability or safety issues. The DAS28-dcrit criterion provides a statistically valid approach to assessing treatment responses and guiding therapeutic changes in individual patients. The DAS28-dcrit might also be useful as a measure of response in clinical trials, particularly in strategic step-up or step-down trials in which evaluation of an initial treatment response is the basis for determining further patient stratification to additional or alternate therapies. In this context, the DAS28-dcrit might provide some advantages over other response criteria as a robust discriminator of responders from nonresponders. Future studies are needed to investigate the suitability of the DAS28-dcrit criterion in clinical trials, including its validation with respect to radiographic changes.

In summary, our results demonstrate that the DAS28-dcrit response criterion represents a significant change in disease activity that is clinically relevant and generally sustained. The DAS28-dcrit criterion exhibits several advantages compared to the established EULAR response criteria: 1) it is clearly distinguishable from random variations in disease activity, 2) it is robust over time, and 3) its clinical relevance is validated by patient-reported improvements in functional capacity. Our data suggest that the DAS28-dcrit may provide a convenient tool for guiding therapeutic decisions during treatment of individual patients.

AUTHOR CONTRIBUTIONS

All authors were involved in drafting the article or revising it critically for important intellectual content, and all authors approved the final version to be published. Dr. Burkhardt had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study conception and design. Behrens, Tony, Greger, Burkhardt.

Acquisition of data. Behrens, Tony, Alten, Kleinert, Scharbatke, Köhm, Burkhardt.

Analysis and interpretation of data. Behrens, Tony, Alten, Gnann, Tams, Greger, Burkhardt.

ROLE OF THE STUDY SPONSOR

AbbVie Deutschland GmbH & Co. KG was involved in the study design, data collection, data analysis, and writing of the manuscript, as well as approval of the content of the submitted manuscript. Publication of this article was not contingent on the approval of AbbVie. AbbVie Deutschland GmbH & Co. KG provided medical writing services (performed by Sharon L. Cross, PhD).

ADDITIONAL DISCLOSURES

Author Gnann is an employee of GKM Gesellschaft für Therapieforschung mbH. Author Tams is an employee of ICRC-WEYER GmbH.
