Abstract

  1. Top of page
  2. Abstract
  3. INTRODUCTION
  4. PATIENTS AND METHODS
  5. RESULTS
  6. DISCUSSION
  7. AUTHOR CONTRIBUTIONS
  8. Acknowledgements
  9. REFERENCES

Objective

To develop a provisional definition for the evaluation of response to therapy in juvenile dermatomyositis (DM) based on the Paediatric Rheumatology International Trials Organisation juvenile DM core set of variables.

Methods

Thirty-seven experienced pediatric rheumatologists from 27 countries achieved consensus on 128 difficult patient profiles as clinically improved or not improved using a stepwise approach (patient's rating, statistical analysis, definition selection). Using the physicians' consensus ratings as the “gold standard measure,” chi-square, sensitivity, specificity, false-positive and false-negative rates, area under the receiver operating characteristic curve, and kappa agreement for candidate definitions of improvement were calculated. For definitions with kappa values >0.8, the kappa value was multiplied by the face validity score to select the top definitions.

Results

The top definition of improvement was at least 20% improvement from baseline in 3 of 6 core set variables with no more than 1 of the remaining worsening by more than 30%, which cannot be muscle strength. The second-highest scoring definition was at least 20% improvement from baseline in 3 of 6 core set variables with no more than 2 of the remaining worsening by more than 25%, which cannot be muscle strength (definition P1 selected by the International Myositis Assessment and Clinical Studies group). The third is similar to the second with the maximum amount of worsening set to 30%. This indicates convergent validity of the process.

Conclusion

We propose a provisional data-driven definition of improvement that reflects well the consensus rating of experienced clinicians, which incorporates clinically meaningful change in core set variables in a composite end point for the evaluation of global response to therapy in juvenile DM.


INTRODUCTION

The standardization of the criteria to evaluate improvement in rheumatic diseases has been a goal of numerous research groups. This work led to the establishment of a definition of response in rheumatoid arthritis (1), juvenile arthritis (2–4), and systemic lupus erythematosus (SLE) both in adults (5–7) and children (8–10).

The International Myositis Assessment and Clinical Studies (IMACS) group proposed a core set of outcome variables for inclusion in clinical trials in adult and juvenile inflammatory myopathies and defined the degree of change in each core set variable that is clinically meaningful, as well as guidelines for performing clinical trials (11–14). However, until now, these proposals have not yet been formally validated in the context of external prospective pediatric studies or clinical trials. Although children/adolescents and adults with dermatomyositis (DM) share many signs and symptoms of disease, they differ in the clinical features and outcome (15–17), and treatment approaches should consider the peculiarities of juvenile patients as well as their longer life expectancy. Therefore, all of the outcome measures developed for adults need to be subjected to a critical evidence-based evaluation of their measurement properties in children and adolescents.

To help standardize the conduct and reporting of juvenile DM clinical trials and enhance identification of new therapeutic agents, the Paediatric Rheumatology International Trials Organisation (PRINTO) (18), in collaboration with the Pediatric Rheumatology Collaborative Study Group and with the support of the European Union and the US National Institutes of Health, undertook in the year 2000 a multinational effort to develop and promulgate a core set of outcome variables and a definition of clinical improvement to evaluate response to therapy in patients with juvenile DM and juvenile SLE. The first 2 previously published phases of the project (8, 19) led to the development of an evidence-based prospectively validated core set of 6 variables for the evaluation of response to therapy that is now known as the provisional PRINTO/American College of Rheumatology (ACR)/European League Against Rheumatism (EULAR) disease activity core set for the evaluation of response to therapy in juvenile DM (Table 1).

Table 1. Final domains and suggested variables included in the final PRINTO/ACR/EULAR core set for the evaluation of response to therapy in juvenile DM (adapted, with permission, from ref. 19)*

Domains | Suggested variable(s)
Physician's global assessment of the patient's overall disease activity | 10-cm VAS
Muscle strength | CMAS (or MMT)
Global juvenile DM disease activity tool | DAS (or MYOACT or MITAX)
Parent's global assessment of the child's overall well-being | 10-cm VAS
Functional ability assessment | C-HAQ
Health-related quality of life assessment | CHQ PhS

* PRINTO = Paediatric Rheumatology International Trials Organisation; ACR = American College of Rheumatology; EULAR = European League Against Rheumatism; DM = dermatomyositis; VAS = visual analog scale; CMAS = Childhood Myositis Assessment Scale; MMT = manual muscle strength testing; DAS = Disease Activity Score; MYOACT = Myositis Disease Activity Assessment Visual Analog Scale; MITAX = Myositis Intention-to-Treat Activity Index, A–E version; C-HAQ = Childhood Health Assessment Questionnaire; CHQ PhS = Child Health Questionnaire physical summary score.

In this study, we report the results of the third phase of the project, which was aimed at developing a provisional validated definition of improvement to aid in the classification of individual patients in future therapeutic trials and in current clinical practice as either improved or not improved.

PATIENTS AND METHODS

The overall methodology of this phase of the project was based on a methodologic framework used successfully in previous work in rheumatoid arthritis (1), juvenile arthritis (2–4), juvenile SLE (8–10), and inflammatory myopathies (13).

Table 1 shows the 6 core variables validated previously and the respective tools for their assessment. The PRINTO juvenile DM core set includes the following 6 variables: 1) physician's global assessment of the patient's overall disease activity, measured with a 10-cm visual analog scale (VAS; where 0 = no activity and 10 = maximum activity) (20); 2) muscle strength, as assessed by the Childhood Myositis Assessment Scale (CMAS; where 0 = worst and 52 = best) (21–23); 3) global disease activity assessment through the Disease Activity Score (DAS) (24) or, alternatively, the Myositis Disease Activity Assessment (this instrument [25] combines 2 partially overlapping tools: the Myositis Disease Activity Assessment Visual Analog Scale and the Myositis Intention-to-Treat Activity Index, A–E version [25]); 4) parent's global assessment of the child's overall well-being on a 10-cm VAS (where 0 = very well and 10 = very poor) (20, 26, 27); 5) functional ability, as measured by the Childhood Health Assessment Questionnaire (C-HAQ; where 0 = best and 3 = worst) (26, 27); and 6) the health-related quality of life assessment using the physical summary score (PhS) of the Child Health Questionnaire (CHQ) parent version (27, 28). The methods for calculating the scores of the PRINTO juvenile DM core set variables are reported by Ruperto et al (19).

The variables underwent extensive evidence-based evaluation, the process of which has been described previously (19). In particular, all of the variables were found to be feasible and have good construct validity, discriminant ability, and internal consistency. Furthermore, they were not redundant, proved responsive to clinically important change in disease activity, and were strongly associated with treatment outcome and therefore were included in the final core set.

Following this selection of variables for the evaluation of response to therapy, a second consensus conference was held. It was attended by 37 experienced pediatric rheumatologists from 27 different countries, to ensure wide international acceptance of the results, and was facilitated by 4 of the authors (NR, EHG, BAG, AP) with expertise in nominal group process (29, 30). The overall goal of the meeting was to reach consensus on a provisional validated definition of improvement incorporating the PRINTO core set of variables, using a combination of statistical criteria and consensus formation techniques. To achieve this objective, 4 steps (process and analysis) were pursued; they are briefly described in order below, and full details can be found elsewhere (2, 19).

Step 1.

We rated each of 128 paper patient profiles as “clinically importantly improved” or “not improved” using nominal group technique. Data from the 294 juvenile DM patients analyzed for the PRINTO/ACR/EULAR juvenile DM core set (19) were used to select a subgroup of 128 difficult/atypical patient profiles presented to conference attendees for evaluation of therapeutic response. The profiles selected (see examples in Table 2) were those that were judged by the conference organizers to be near a putative threshold level of improvement. For example, patients who showed 100% improvement in all of the outcome variables were not good candidates for inclusion because all would agree that the patient had improved, and all of the definitions of improvement would categorize the patient as improved. Each profile contained only information related to the 6 validated juvenile DM core set variables with absolute values at baseline and at 6 months, as well as absolute and percent changes from baseline (Tables 1 and 2). Participants were randomized into 3 “nominal groups” of equal size, and were asked to rate independently all 128 difficult patient profiles as either clinically importantly improved or not improved. If an 80% consensus was not achieved, the case was discussed in a round-robin fashion at each table and if necessary, also in a plenary session. We expected to reach consensus for at least 80% of the patients discussed.
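The 80% consensus rule in step 1 can be sketched in a few lines of Python. This is only an illustration: the function name `consensus` and the choice to compute the percentage over one list of independent ratings are assumptions, since the text does not specify whether agreement was computed per nominal group or across all attendees.

```python
from typing import List, Optional


def consensus(votes: List[bool], threshold_pct: int = 80) -> Optional[str]:
    """Return the consensus rating for one patient profile, or None.

    votes: one entry per rater, True = "clinically importantly improved",
           False = "not improved".
    threshold_pct: minimum share of raters (in percent) who must agree.
    Assumption: the share is computed over all votes passed in.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    n = len(votes)
    yes = sum(votes)
    # Integer arithmetic avoids floating-point surprises at exactly 80%.
    if yes * 100 >= threshold_pct * n:
        return "improved"
    if (n - yes) * 100 >= threshold_pct * n:
        return "not improved"
    return None  # no consensus: the profile goes to round-robin discussion


# Hypothetical votes from a 12-member group.
print(consensus([True] * 10 + [False] * 2))   # 10 of 12 agree
print(consensus([True] * 7 + [False] * 5))    # split vote
```

A `None` result corresponds to the profiles that, in the procedure described above, were discussed at each table and, if necessary, in plenary session.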

Table 2. Example of 2 patients evaluated at the consensus*

Variable | Month 0 (a) | Month 6 (b) | Absolute difference (c = b − a) | Difference, % (d = [c/a] × 100) | Outcome
Patient 1 (example of a patient who improved) | | | | |
Physician's global assessment of the patient's overall disease activity (0–10-cm VAS) ↑ | 6.8 | 0.3 | −6.5 | −96 | Improved
Parent's global assessment of the child's overall well-being (0–10-cm VAS) ↑ | 5.2 | 0 | −5.2 | −100 | Improved
CMAS (0–52 score) ↓ | 16 | 42 | 26 | 163 | Improved
DAS (0–20 score) ↑ | 12 | 4 | −8 | −67 | Improved
C-HAQ (0–3 score) ↑ | 2.3 | 0.5 | −1.8 | −78 | Improved
CHQ physical summary score (40–60 score) ↓ | 29.1 | 53.4 | 24.3 | 84 | Improved
Patient 2 (example of a patient who did not improve) | | | | |
Physician's global assessment of the patient's overall disease activity (0–10-cm VAS) ↑ | 5.6 | 9.8 | 4.2 | 75 | Not improved
Parent's global assessment of the child's overall well-being (0–10-cm VAS) ↑ | 1.5 | 5.6 | 4.1 | 273 | Not improved
CMAS (0–52 score) ↓ | 28 | 16 | −12 | −43 | Not improved
DAS (0–20 score) ↑ | 8 | 12 | 4 | 50 | Not improved
C-HAQ (0–3 score) ↑ | 1 | 1.5 | 0.5 | 50 | Not improved
CHQ physical summary score (40–60 score) ↓ | 23.9 | 18.6 | −5.3 | −22 | Not improved

* Readers, by using the related formulas, can calculate improvement/worsening of each variable and apply the juvenile dermatomyositis definition of improvement: at least 20% improvement from baseline in 3 of any 6 core set variables with no more than 1 of the remaining worsening by more than 30%, which cannot be muscle strength. VAS = visual analog scale; ↑ = higher tool score of that variable denotes worse activity (e.g., physician's global assessment of the patient's overall disease activity); CMAS = Childhood Myositis Assessment Scale; ↓ = lower tool score denotes worse values (e.g., CHQ physical summary score); DAS = Disease Activity Score; C-HAQ = Childhood Health Assessment Questionnaire; CHQ = Child Health Questionnaire.
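The formulas in Table 2 and the definition of improvement quoted in its footnote can be combined into a short Python sketch. The names `Measure`, `improvement_pct`, and `is_improved` are hypothetical, and one reading of the rule is assumed: a variable counts as improved when its direction-adjusted percent change is at least 20%, and as worsened when it deteriorates by more than 30%. It is an illustration, not the software used by the authors.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Measure:
    name: str
    baseline: float        # month 0 value (a)
    month6: float          # month 6 value (b)
    higher_is_worse: bool  # True for the "↑" variables in Table 2


def improvement_pct(m: Measure) -> float:
    """Percent change d = (b - a) / a * 100, signed so that positive = better.

    A baseline of 0 would raise ZeroDivisionError; the source does not say
    how such cases were handled.
    """
    d = (m.month6 - m.baseline) / m.baseline * 100.0
    return -d if m.higher_is_worse else d


def is_improved(measures: List[Measure], muscle: str = "CMAS",
                min_improved: int = 3, improve_pct: float = 20.0,
                max_worsened: int = 1, worsen_pct: float = 30.0) -> bool:
    gains = {m.name: improvement_pct(m) for m in measures}
    improved = [n for n, g in gains.items() if g >= improve_pct]
    worsened = [n for n, g in gains.items() if g < -worsen_pct]
    return (len(improved) >= min_improved
            and len(worsened) <= max_worsened
            and muscle not in worsened)  # muscle strength may not worsen >30%


# Values copied from Table 2.
patient1 = [
    Measure("Physician global VAS", 6.8, 0.3, True),
    Measure("Parent global VAS", 5.2, 0.0, True),
    Measure("CMAS", 16, 42, False),
    Measure("DAS", 12, 4, True),
    Measure("C-HAQ", 2.3, 0.5, True),
    Measure("CHQ PhS", 29.1, 53.4, False),
]
patient2 = [
    Measure("Physician global VAS", 5.6, 9.8, True),
    Measure("Parent global VAS", 1.5, 5.6, True),
    Measure("CMAS", 28, 16, False),
    Measure("DAS", 8, 12, True),
    Measure("C-HAQ", 1.0, 1.5, True),
    Measure("CHQ PhS", 23.9, 18.6, False),
]

print(is_improved(patient1), is_improved(patient2))
```

The thresholds are keyword arguments so that the other candidate definitions (for example, 2 variables improved by at least 40%) can be expressed by changing parameters rather than code.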

Step 2 (statistical analysis).

Using the physicians' consensus judgments as the “gold standard,” we performed several statistical evaluations (see below) to identify the definition of improvement with the best performance characteristics. We were unable to find in the literature any definitions of improvement that used combinations of the core set variables. Therefore, we tested 999 different definitions of improvement that were deemed clinically reasonable by the Steering Committee of the project (NR, AP, AR, DJL, EHG, AM). Some of the definitions of improvement tested were provided by the IMACS group (13).

Each definition of improvement was classified as either “generic” or “specific” (9). An example of a “generic definition” is as follows: at least 20% improvement from baseline in any 2 of the 6 core set variables with no more than 1 of the remaining worsening by more than 30%. An example of a “specific definition” is as follows: physician's global assessment of the patient's overall disease activity and muscle strength improved by at least 30%, 2 of any remaining 3 improved by at least 20%, and none worsening by more than 30%.

We evaluated the ability of the 999 candidate definitions of improvement to classify individual patients as improved or not improved, and then assessed the agreement between the definitions and consensus of the physicians. We used only patient profiles for which physician consensus was achieved. For each definition, we calculated the chi-square test (1 df) and the corresponding P value, sensitivity, specificity, percent of false-positives, percent of false-negatives, and area under the receiver operating characteristic (ROC) curve (31). The kappa statistic (32) was used to measure the strength of concordance between the definitions and consensus of the physicians. The kappa statistic was converted to a Likert-like scale using the conversion proposed by Landis and Koch (33), where 0.01–0.2 = slight, 0.21–0.4 = fair, 0.41–0.6 = moderate, 0.61–0.8 = substantial, and 0.81–1 = almost perfect agreement. Although the statistical properties of all 999 definitions were presented to the consensus attendees, only definitions with a kappa value >0.7 (substantial agreement), sensitivity and specificity >80%, and percent false-positive and false-negative <20% were retained in the further analysis. Results of the statistical analyses were then presented to the conference attendees.
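The agreement statistics named above follow from a 2 × 2 cross-classification of a candidate definition against the physicians' consensus. The sketch below uses illustrative cell counts that are not reported in the paper; they were chosen only so that the totals match the 98 improved and 23 not improved consensus profiles. The function names are hypothetical, and the label function covers only the Landis and Koch bands quoted in the text.

```python
from typing import Tuple


def agreement_stats(tp: int, fn: int, fp: int, tn: int) -> Tuple[float, float, float]:
    """Sensitivity, specificity, and Cohen's kappa for a 2 x 2 table.

    tp/fn: consensus "improved" profiles classified as improved / not improved.
    fp/tn: consensus "not improved" profiles classified as improved / not improved.
    """
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    observed = (tp + tn) / n
    # Chance agreement from the marginal totals.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, specificity, kappa


def landis_koch(kappa: float) -> str:
    """Map kappa to the agreement bands quoted in the text."""
    if kappa > 0.8:
        return "almost perfect"
    if kappa > 0.6:
        return "substantial"
    if kappa > 0.4:
        return "moderate"
    if kappa > 0.2:
        return "fair"
    if kappa > 0.0:
        return "slight"
    return "not covered by the quoted scale"


# Illustrative counts only (assumption, not data from the paper).
sens, spec, kappa = agreement_stats(tp=96, fn=2, fp=3, tn=20)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} "
      f"kappa={kappa:.2f} ({landis_koch(kappa)})")
```

Because kappa corrects for chance agreement, it penalizes a definition that simply labels nearly every profile as improved, which matters here because most consensus profiles were rated improved.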

Step 3.

We then used nominal group technique to decide which of the definitions of improvement with the highest statistical performance was easiest to use and most credible (highest face validity). The attendees were again randomly split into 3 groups and, using nominal group technique, were asked to decide which definitions of improvement (selected among the 999 definitions tested) performed best (in the analysis described above) and were easiest to use and most credible (content validity), ranking the 5 best from 1 (lowest) to 5 (highest content validity).

Step 4.

We multiplied the content validity score by the kappa values to obtain the “best” definitions. For each definition, the 3 content validity rankings obtained by the 3 nominal groups were summed and the resulting sum was multiplied by the corresponding value of the kappa statistic to obtain the “final score” that incorporated both the statistical evaluations and the experts' judgments.
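The scoring in step 4 is a single multiplication, sketched here in Python. The per-group rankings are not given in the text, so the example starts from the summed ranks and kappa values reported in Table 4 for the 3 top definitions; the function name `final_score` and the short labels are hypothetical.

```python
def final_score(rank_sum: float, kappa: float) -> float:
    """Final score = summed content validity rankings x kappa statistic."""
    return rank_sum * kappa


# (summed rank, kappa) for the 3 highest-scoring definitions in Table 4.
top_definitions = {
    "3 of 6 by >=20%, <=1 worsened >30%, not muscle strength": (131, 0.86),
    "3 of 6 by >=20%, <=2 worsened >=25%, not muscle strength (IMACS P1)": (104, 0.86),
    "3 of 6 by >=20%, <=2 worsened >30%, not muscle strength": (81, 0.86),
}

for label, (rank_sum, kappa) in top_definitions.items():
    print(f"{round(final_score(rank_sum, kappa)):>4}  {label}")
```

With equal kappa values, as for these 3 definitions, the ordering is determined entirely by the experts' rankings.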

Association between changes in each of the 6 core variables and the overall outcome.

The association between the change in each of the core set variables and the evaluation of response to therapy was analyzed by multiple logistic regression, which used as explanatory variables the baseline to 6-month change in each core set variable and as the dependent outcome the physician's consensus evaluation of the patient's improvement. Odds ratios (ORs) with 95% confidence intervals were reported. Continuous variables were dichotomized according to the best cutoffs provided by the ROC curve analysis (31). The purposes of this postconsensus analysis were to evaluate which were the core set variables that most influenced the consensus decision and to establish the best cutoffs for absolute change for the variables included in the model. The best cutoffs for each core set variable should help physicians decide if a patient is improved based on the absolute change of that particular measure.

Data were entered into an Access XP (Microsoft) database and analyzed with Excel XP (Microsoft), XLSTAT, version 6.1.9 (Addinsoft), Statistica, version 6.0 (StatSoft), and Stata, version 7.0 (StataCorp).

RESULTS

Table 3 shows the comparison of demographic features and baseline and 6-month values of the core set variables between the subgroup of 128 difficult patients used to create the patient profiles used in this exercise and the remaining 166-patient cohort; the entire cohort of 294 patients was analyzed for the PRINTO/ACR/EULAR juvenile DM core set (19). In general, the features were comparable between cohorts, although the former had longer disease duration. Similarly, the 2 cohorts were comparable at baseline for 5 of the core set variables; the exception being the parent's global assessment of the child's overall well-being. The differences observed at 6 months between the 128-patient cohort and the remaining sample were expected because this 128-patient subgroup was composed of the difficult/atypical patients selected for the consensus exercise that overall responded less to the 6-month treatment given by the treating physicians (see the Patients and Methods section). The remaining 166-patient cohort consisted of patients who achieved the most pronounced levels of improvement after the sixth month of treatment and who were not useful for the purposes of the consensus exercise.

Table 3. Comparison between the difficult patients evaluated at the consensus conference (n = 128) and the remaining patients of the sample collected (n = 166)*

Variable | Month 0: validation patients (n = 166) | Month 0: consensus patients (n = 128) | Month 0: P | Month 6: validation patients (n = 166) | Month 6: consensus patients (n = 128) | Month 6: P
Age at onset, years | 7.6 ± 4.1 | 7.5 ± 3.5 | 0.80 | | |
Age at first observation at the center, years | 8.2 ± 4.1 | 8.3 ± 3.3 | 0.79 | | |
Age at study visit, years | 8.7 ± 4.1 | 9.6 ± 3.8 | 0.07 | 9.3 ± 4.1 | 10.1 ± 3.8 | 0.09
Disease duration, years | 1.1 ± 1.8 | 2.1 ± 2.4 | <0.0001 | 1.7 ± 1.9 | 2.6 ± 2.4 | <0.0001
Women, no. (%) | 96 (57.8) | 81 (63.3) | 0.34§ | | |
PRINTO/ACR/EULAR juvenile DM core set | | | | | |
Physician's global assessment of the patient's overall disease activity (0–10-cm VAS) ↑ | 5.5 ± 2.5 | 5.2 ± 2.3 | 0.26 | 1.4 ± 1.8 | 2.4 ± 2.5 | <0.0001
Parent's global assessment of the child's overall well-being (0–10-cm VAS) ↑ | 5.6 ± 2.8 | 4.8 ± 2.9 | 0.01 | 1 ± 1.5 | 2.4 ± 2.3 | <0.0001
CMAS (0–52 score) ↓ | 24.1 ± 14.7 | 26.4 ± 14.0 | 0.16 | 43.6 ± 10.4 | 38.9 ± 11.9 | 0.0001
DAS (0–20 score) ↑ | 12.2 ± 3.7 | 11.7 ± 3.7 | 0.24 | 4.3 ± 3.2 | 6.9 ± 4.1 | <0.0001
C-HAQ (0–3 score) ↑ | 1.7 ± 1.0 | 1.6 ± 0.9 | 0.20 | 0.4 ± 0.6 | 0.8 ± 0.8 | <0.0001
CHQ physical summary score (40–60 score) ↓ | 32.6 ± 11.9 | 33.7 ± 11.7 | 0.47 | 48.9 ± 8.8 | 44.8 ± 9.9 | 0.0005

* Values are the mean ± SD unless otherwise indicated. The total sample of 294 patients was used for the analysis of the final PRINTO/ACR/EULAR juvenile DM core set of variables for the evaluation of response to therapy (19). PRINTO = Paediatric Rheumatology International Trials Organisation; ACR = American College of Rheumatology; EULAR = European League Against Rheumatism; DM = dermatomyositis; VAS = visual analog scale; ↑ = higher tool score of that variable denotes worse activity (e.g., physician's global assessment of the patient's overall disease activity); CMAS = Childhood Myositis Assessment Scale; ↓ = lower tool score denotes worse values (e.g., CHQ physical summary score); DAS = Disease Activity Score; C-HAQ = Childhood Health Assessment Questionnaire; CHQ = Child Health Questionnaire.
P values: by Student's t-test for independent samples or by Mann-Whitney U test for independent samples, except where marked.
§ By Pearson's chi-square test.

Results of scoring the patient profiles.

Consensus of ≥80% was achieved for 121 (95%) of the 128 difficult patients, with 98 (81%) of the 121 patients being judged as clinically importantly improved and 23 (19%) of the 121 patients being judged as not improved. All 3 nominal groups reached the same consensus opinion as to patient status on all of the profiles.

Identification of the top definitions of improvement as the best performers.

Thirteen of the 999 definitions of improvement reached a kappa value ≥0.8 (almost perfect agreement); their corresponding chi-square values, P values, sensitivity, specificity, percent false-positive and false-negative rates, area under the ROC curve, and kappa statistics are reported in Table 4.

Table 4. Final results for the best definitions of improvement, all with kappa values >0.8*

Definitions | Chi-square | Sensitivity, % | Specificity, % | False-negative, % | False-positive, % | AUC | Kappa | Rank | Final score
3 of any 6 improved by at least 20%, no more than 1 worsened by more than 30%, which cannot be muscle strength | 90.3 | 98 | 87 | 9 | 3 | 92 | 0.86 | 131 | 113
3 of any 6 improved by at least 20%, no more than 2 worsened by ≥25%, which cannot be muscle strength (IMACS definition P1) (13) | 90.3 | 98 | 87 | 9 | 3 | 92 | 0.86 | 104 | 89
3 of any 6 improved by at least 20%, no more than 2 worsened by more than 30%, which cannot be muscle strength | 90.3 | 98 | 87 | 9 | 3 | 92 | 0.86 | 81 | 70
2 of any 6 improved by at least 40%, no more than 1 worsened by more than 30%, which cannot be muscle strength | 85.2 | 97 | 87 | 13 | 3 | 92 | 0.84 | 61 | 51
2 of any 6 improved by at least 30%, no more than 1 worsened by more than 30%, which cannot be muscle strength | 90.1 | 100 | 78 | 0 | 5 | 89 | 0.85 | 46 | 39
3 of any 6 improved by at least 20%, no more than 1 worsened by more than 30% | 84.3 | 98 | 83 | 10 | 4 | 90 | 0.83 | 36 | 30
3 of any 6 improved by at least 20%, no more than 2 worsened by more than 30% | 84.3 | 98 | 83 | 10 | 4 | 90 | 0.83 | 17 | 14
3 of any 6 improved by at least 20%, no more than 2 worsened by ≥25% (IMACS definition P2) (13) | 84.3 | 98 | 83 | 10 | 4 | 90 | 0.83 | 13 | 11
2 of any 6 improved by at least 40%, no more than 1 worsened by more than 30% | 79.2 | 97 | 83 | 14 | 4 | 90 | 0.81 | 13 | 11
2 of any 6 improved by at least 40%, no more than 2 worsened by more than 30%, which cannot be muscle strength | 79.2 | 97 | 83 | 14 | 4 | 90 | 0.81 | 13 | 11
2 of any 6 improved by at least 30%, no more than 2 worsened by more than 30%, which cannot be muscle strength | 84.3 | 100 | 74 | 0 | 6 | 87 | 0.82 | 6 | 5
3 of any 6 improved by at least 20% (IMACS definition P3) (13) | 84.3 | 98 | 83 | 10 | 4 | 90 | 0.83 | 3 | 3
2 of any 6 improved by at least 30%, no more than 1 worsened by more than 30% | 84.3 | 100 | 74 | 0 | 6 | 87 | 0.82 | 1 | 1

* Definitions are ordered according to the final score. AUC = area under the receiver operating characteristic curve; IMACS = International Myositis Assessment and Clinical Studies group.
Chi-square: all chi-square values correspond to P values less than 0.0001.
Rank: the ranks were obtained by asking the attendees of the consensus meeting to decide on which of the definitions of improvement that performed best were easiest to use and most credible (content validity). Then for each definition, the content validity rankings obtained were summed and the resulting sum was multiplied by the corresponding value of the kappa statistic to obtain the “final score” that incorporated both statistical criteria and experts' judgments.

Face validity of the top definitions of improvement and final resolution.

After presentation of the above data, the attendees used nominal group technique to rate content validity (step 3) using a 1–5 scale, with 5 being the highest. The sums of the combined ranks from the 3 nominal groups are shown in Table 4 (range 1–131). Next, the sum of the ranking was multiplied by its respective kappa statistic to obtain the final score (range 1–113), thereby allowing identification of the definitions of improvement with the highest final score. The definition of improvement that scored highest was the following: at least 20% improvement from baseline in 3 of any 6 variables with no more than 1 of the remaining worsening by more than 30%, which cannot be muscle strength (as measured by the CMAS).

As can be seen in Table 4, the definitions that scored second (IMACS P1) and third highest are similar to the first: all require an improvement of ≥20% in at least 3 core set variables, but they allow a different number of the remaining variables to worsen (2 instead of 1) or set a different degree of worsening (25% instead of 30%) (13). The similarity of the top-ranking definitions indicates convergent validity of the measures. Since the best definitions all had kappa values >0.8, the selection of the final definition of improvement was driven mainly by the ranking (content validity) of the top 5 definitions.

Association between changes in each of the 6 core variables and the overall outcome.

The association between the change in each core set measure and response to therapy was analyzed in a multivariate analysis, as described in the Patients and Methods section. In the final model (Table 5), the physician's global assessment of the patient's overall disease activity appeared to be the strongest predictor of response to therapy (OR 11), followed by the CMAS (OR 10.2) and the parent's global assessment of the child's overall well-being (OR 5.5). The remaining 3 core set variables, the DAS, the C-HAQ, and the CHQ physical summary score, did not reach statistical significance. The best cutoffs for absolute change for the variables included in the model consist of physician's global assessment of the patient's overall disease activity (absolute change): less than or equal to −1.3 (sensitivity 84.7%, specificity 82.6%); CMAS (absolute change): >4 (sensitivity 85.7%, specificity 87.0%); parent's global assessment of the child's overall well-being (absolute change): less than or equal to −1.4 (sensitivity 72.4%, specificity 78.3%); DAS (absolute change): less than or equal to −4 (sensitivity 78.6%, specificity 73.9%); C-HAQ (absolute change): less than or equal to −0.375 (sensitivity 77.6%, specificity 73.9%); and CHQ PhS (absolute change): >10.75 (sensitivity 49.4%, specificity 73.7%) (Table 5).

Table 5. Logistic regression model to predict improvement according to the evaluation of the participants at the consensus conference (n = 102)*

Variable | OR | 95% CI | Likelihood ratio test, P
Physician's global assessment of the patient's overall disease activity (0–10-cm VAS) ↑ | 11.0 | 2.1–56.7 | 0.003
CMAS (0–52 score) ↓ | 10.2 | 1.6–65.4 | 0.009
Parent's global assessment of the child's overall well-being (0–10-cm VAS) ↑ | 5.5 | 1.1–26.7 | 0.029
DAS (0–20 score) ↑ | 1.2 | 0.2–6.5 | 0.81
C-HAQ (0–3 score) ↑ | 0.9 | 0.1–5.5 | 0.88
CHQ physical summary score (40–60 score) ↓ | 1.2 | 0.2–7.3 | 0.85

* Prediction was based on absolute change of the variables included in the final core set. Variables have been dichotomized according to the best cutoffs obtained from the receiver operating characteristic (ROC) curve analysis. Area under ROC curve of the model = 0.9. OR = odds ratio; 95% CI = 95% confidence interval; VAS = visual analog scale; ↑ = higher tool score of that variable denotes worse activity (e.g., physician's global assessment of the patient's overall disease activity); CMAS = Childhood Myositis Assessment Scale; ↓ = lower tool score denotes worse values (e.g., CHQ physical summary score); DAS = Disease Activity Score; C-HAQ = Childhood Health Assessment Questionnaire; CHQ = Child Health Questionnaire.
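The absolute-change cutoffs reported in the Results text can be applied one variable at a time, as in the following Python sketch. The dictionary keys and the function name `meets_cutoff` are hypothetical; the thresholds and their directions are the ones stated in the text, and the example changes are the absolute differences of the 2 patients in Table 2.

```python
import operator

# Best cutoffs for absolute change (month 6 minus month 0), as reported.
# "<=" variables improve when the score falls; ">" variables when it rises.
CUTOFFS = {
    "physician_global_vas": (operator.le, -1.3),
    "cmas": (operator.gt, 4.0),
    "parent_global_vas": (operator.le, -1.4),
    "das": (operator.le, -4.0),
    "c_haq": (operator.le, -0.375),
    "chq_phs": (operator.gt, 10.75),
}


def meets_cutoff(variable: str, absolute_change: float) -> bool:
    """True if the absolute change reaches the reported cutoff for improvement."""
    compare, threshold = CUTOFFS[variable]
    return compare(absolute_change, threshold)


# Absolute differences (column c) from Table 2.
patient1_changes = {"physician_global_vas": -6.5, "parent_global_vas": -5.2,
                    "cmas": 26, "das": -8, "c_haq": -1.8, "chq_phs": 24.3}
patient2_changes = {"physician_global_vas": 4.2, "parent_global_vas": 4.1,
                    "cmas": -12, "das": 4, "c_haq": 0.5, "chq_phs": -5.3}

for label, changes in (("Patient 1", patient1_changes),
                       ("Patient 2", patient2_changes)):
    flags = {name: meets_cutoff(name, value) for name, value in changes.items()}
    print(label, flags)
```

These per-variable flags are a bedside aid only; they are separate from the composite definition of improvement, which is stated in terms of percent change.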

DISCUSSION

Using a combination of data-driven and consensus-formation processes, pediatric rheumatologists with specific expertise in the assessment of juvenile DM developed a provisional validated definition of improvement that PRINTO proposes for use in future juvenile DM clinical trials. Based on the best-performing definition, improvement in individual patients with juvenile DM can be defined as follows: any 3 among the 6 core set variables improved by at least 20% versus baseline, with no more than 1 of the remaining variables worsening by more than 30%, which cannot be muscle strength.

The provisional definition selected by the consensus panel performed well in the available data set, with high sensitivity and specificity and low false-positive and false-negative rates. The consensus process indicated that this definition had the best content validity as well. The main strength of the definition lies in the consensus of a large number of experienced pediatric rheumatologists from many countries that provided wide international acceptance of the project and in its strong statistical properties. Furthermore, its core set variables (19) were selected by an evidence-based process and validated through a large-scale data collection in patients who had been assessed in a prospective fashion.

During the discussion phase in the content validity session, the participants made it clear that muscle strength is one of the essential components for the evaluation of response to therapy in juvenile DM. For this reason, all of the definitions that required muscle strength not to worsen were highly ranked.

Of note, the second-highest scoring definition (at least 20% improvement from baseline in 3 of any 6 core set variables with no more than 2 of the remaining worsening by more than 25%, which cannot be muscle strength) is definition P1 selected by the IMACS group (13). This demonstrates convergent validity of the approaches used by the 2 groups and supports the validity of the 2 parallel efforts and their respective findings, which were obtained in different cohorts. The main difference between the PRINTO and the IMACS group definitions of improvement is that we focused on response criteria for use only in juvenile DM and not also in adult patients with DM and polymyositis. Other differences, fully discussed elsewhere (17, 19), relate to the core set of variables. First, serum muscle enzymes are included in the IMACS core set but were excluded from the PRINTO core set because of their poor statistical performance. Second, the PRINTO group included the health-related quality of life assessment as a distinct core set variable specific for children, whereas the IMACS investigators did not incorporate it in their core set, although they recommended including this measure in therapeutic trials of patients with idiopathic inflammatory myopathies. Future studies in an external cohort will allow the comparison and final validation of the 2 proposed core sets and definitions.

The provisional validated definition of improvement was based on a composite combination of outcome measures that were set up to detect a broad range of clinical change. The PRINTO juvenile DM core set includes both objective and subjective measures from both the physician and the patient/parent perspective. The evaluation of response to therapy from different perspectives has the advantage of covering all of the changes induced by the agent under study and of providing information related to the entire spectrum of disease manifestations and consequences. It is also expected to provide better discriminant validity than previous clinical trials that used only muscle strength as the primary outcome (12).

For the practical application of the provisional PRINTO definition of improvement, Table 1 reports the domains and suggested variables included in the final core set for the evaluation of response to therapy in juvenile DM (19). The suggested variables for each domain are the ones used for the validation of the core set and of the definition of improvement, but researchers can use other variables that might be more appropriate for their study design or that are supported by new validation data appearing in the literature. In addition, Table 2 reports 2 examples, with data from real patients used at the consensus conference, that show readers how to use the related formulas to apply the PRINTO definition of improvement for juvenile DM. Table 5 reports the best cutoffs for absolute change for the variables included in the model, which might help a physician in daily practice decide whether a variable has improved significantly.
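As an illustration only, the logic of the top definition (at least 20% improvement in 3 of 6 core set variables, with no more than 1 of the remaining variables worsening by more than 30%, which cannot be muscle strength) might be sketched in Python as follows. This is not the formula set of Table 2. The variable names, the patient values, the use of relative change from baseline, and the reading of the muscle strength clause (muscle strength may not be among the worsened variables) are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class CoreVariable:
    # Hypothetical container; names and scoring directions are illustrative.
    name: str
    baseline: float
    followup: float
    higher_is_better: bool


def percent_improvement(v: CoreVariable) -> float:
    """Relative change from baseline in percent; positive means improvement."""
    if v.baseline == 0:
        # Relative change is undefined at a zero baseline; this sketch does not handle it.
        raise ValueError(f"baseline of {v.name!r} is zero")
    change = (v.followup - v.baseline) / abs(v.baseline) * 100.0
    return change if v.higher_is_better else -change


def is_improved(variables, min_improved=3, improve_pct=20.0,
                max_worsened=1, worsen_pct=30.0, protected="muscle strength"):
    """Apply a '3 of 6' style definition; defaults follow the top definition."""
    changes = {v.name: percent_improvement(v) for v in variables}
    improved = [n for n, c in changes.items() if c >= improve_pct]
    worsened = [n for n, c in changes.items() if c < -worsen_pct]
    return (len(improved) >= min_improved
            and len(worsened) <= max_worsened
            and protected not in worsened)


# Made-up values for one patient (not taken from Table 2).
patient = [
    CoreVariable("physician global", 6.0, 3.0, higher_is_better=False),
    CoreVariable("muscle strength", 30.0, 40.0, higher_is_better=True),
    CoreVariable("disease activity tool", 12.0, 8.0, higher_is_better=False),
    CoreVariable("parent global", 5.0, 5.0, higher_is_better=False),
    CoreVariable("functional ability", 1.5, 1.25, higher_is_better=False),
    CoreVariable("quality of life", 40.0, 25.0, higher_is_better=True),
]

print(is_improved(patient))  # True: 3 variables improve, 1 (not muscle strength) worsens
```

Under the same assumptions, the second-highest scoring definition (P1) corresponds to calling `is_improved(patient, max_worsened=2, worsen_pct=25.0)`.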

A possible limitation of our study is the lack of analysis in the context of a real clinical trial and the fact that the cohort used for the definition/consensus generation is the same as that used for the provisional validation. Another potential limitation is the small sample of not improved patients, because the prevalence of the outcome could have influenced the false-positive/-negative rates. The main strength resides in the large, prospectively collected data set, which is rarely attempted in rheumatic diseases (1, 2, 13) and which enables a comprehensive evidence-based provisional validation of the juvenile DM core set (19) and the related definition of improvement.

In summary, PRINTO developed and validated a data-driven provisional definition of improvement that will help standardize the conduct of juvenile DM clinical trials and assist clinicians in daily practice when attempting to classify patients as either responders or nonresponders. The definition of improvement derived here should undergo final validation in future controlled studies in different external cohorts of patients. Such studies will allow examination of its discriminant validity in detecting a therapeutic response greater than that of placebo or an active comparator, and will establish whether refinements in currently available instruments are required.

AUTHOR CONTRIBUTIONS

All authors were involved in drafting the article or revising it critically for important intellectual content, and all authors approved the final version to be submitted for publication. Dr. Ruperto had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study conception and design. Ruperto, Pistorio, Ravelli, Wulffraat, Lahdenne, Murray, Pachman, Giannini, Gare, Martini.

Acquisition of data. Ruperto, Ravelli, Rider, Pilkington, Oliveira, Espada, Garay, Cuttica, Hofer, Quartier, Melo-Gomes, Reed, Wierzbowska, Feldman, Harjacek, Huppertz, Nielsen, Flato, Lahdenne, Michels, Murray, Punaro, Rennebohm, Russo, Balogh, Rooney, Pachman, Wallace, Hashkes, Lovell, Gare, Martini.

Analysis and interpretation of data. Ruperto, Pistorio, Ravelli, Melo-Gomes, Nielsen, Murray, Martini.

Acknowledgements

We are indebted to Drs. Anna Tortorelli, Monica Tufillo, and Elisabetta Maggi for their help in data handling, organization skills, and overall management of the project. We are also thankful to Dr. Luca Villa and Mr. Michele Pesce for their help in database development. The authors wish to acknowledge the attendees of the Camogli, Italy, “International Consensus Conference on defining improvement in JSLE and JDM” for their work during the meeting. Organizers: Alberto Martini, MD, Nicolino Ruperto, MD, MPH, Angelo Ravelli, MD, Angela Pistorio, MD, PhD (Italy); Edward H. Giannini, MSc, DrPH, Daniel J. Lovell, MD, MPH (US); Boel Andersson-Gäre, MD, PhD (Sweden). Attendees: Carmen De Cunto, MD, Ruben Cuttica, MD (Argentina); Rik Joos, MD (Belgium); Claudia Magalhaes Saad, MD, Sheila Oliveira, MD (Brazil); Dimitrina Mihaylova, MD (Bulgaria); Brian M. Feldman, MD, MSc (Canada); Miroslav Harjacek, MD (Croatia); Pavla Dolezalova, MD (Czech Republic); Susan Nielsen, MD (Denmark); Pekka Lahdenne, MD (Finland); Anne Marie Prieur, MD (France); Hans-Iko Huppertz, MD (Germany); Florence Kanakoudi Tsakalidou, MD (Greece); Philip Hashkes, Yosef Uziel, MD (Israel); Ingrida Rumba, MD (Latvia); Ruben Burgos Vargas, MD (Mexico); Nico Wulffraat, MD (The Netherlands); Berit Flato, MD (Norway); Malgorzata Wierzbowska, MD (Poland); Jose Antonio Melo-Gomes, MD (Portugal); Gordana Susic, MD (Serbia); Richard Vesely, MD (Slovakia); Tadej Avcin, MD (Slovenia); Michael Hofer, MD (Switzerland); Huri Ozdogan, MD (Turkey); Clarissa Pilkington, MD, Madeleine Rooney, MD (UK); Daniel J. Lovell, MD, MPH, Lauren M. Pachman, MD, Lisa G. Rider, MD, Ann M. Reed, MD, Robert Rennebohm, MD, Carol Wallace, MD (US). External observers: Marcia Bandeira, MD (Brazil); Jenny Pratsidou, MD (Greece); Stella Maris Garay, MD (Argentina).

REFERENCES

  • 1
    Felson DT, Anderson JJ, Boers M, Bombardier C, Furst D, Goldsmith C, et al. American College of Rheumatology preliminary definition of improvement in rheumatoid arthritis. Arthritis Rheum 1995; 38: 727–35.
  • 2
    Giannini EH, Ruperto N, Ravelli A, Lovell DJ, Felson DT, Martini A. Preliminary definition of improvement in juvenile arthritis. Arthritis Rheum 1997; 40: 1202–9.
  • 3
    Ruperto N, Ravelli A, Falcini F, Lepore L, De Sanctis R, Zulian F, et al. Performance of the preliminary definition of improvement in juvenile chronic arthritis patients treated with methotrexate. Ann Rheum Dis 1998; 57: 38–41.
  • 4
    Albornoz MA. ACR formally adopts improvement criteria for juvenile arthritis (ACR pediatric 30). ACR News 2002; 21: 3.
  • 5
    Renal Disease Subcommittee of the American College of Rheumatology Ad Hoc Committee on Systemic Lupus Erythematosus Response Criteria. The American College of Rheumatology response criteria for proliferative and membranous renal disease in systemic lupus erythematosus clinical trials. Arthritis Rheum 2006; 54: 421–32.
  • 6
    Strand V, Gladman D, Isenberg D, Petri M, Smolen J, Tugwell P. Outcome measures to be used in clinical trials in systemic lupus erythematosus. J Rheumatol 1999; 26: 490–7.
  • 7
    Smolen JS, Strand V, Cardiel M, Edworthy S, Furst D, Gladman D, et al. Randomized clinical trials and longitudinal observational studies in systemic lupus erythematosus: consensus on a preliminary core set of outcome domains. J Rheumatol 1999; 26: 504–7.
  • 8
    Ruperto N, Ravelli A, Murray KJ, Lovell DJ, Andersson-Gare B, Feldman BM, et al. Preliminary core sets of measures for disease activity and damage assessment in juvenile systemic lupus erythematosus and juvenile dermatomyositis. Rheumatology (Oxford) 2003; 42: 1452–9.
  • 9
    Ruperto N, Ravelli A, Cuttica R, Espada G, Ozen S, Porras O, et al, for the Pediatric Rheumatology International Trials Organization (PRINTO) and the Pediatric Rheumatology Collaborative Study Group (PRCSG). The Pediatric Rheumatology International Trials Organization criteria for the evaluation of response to therapy in juvenile systemic lupus erythematosus: prospective validation of the disease activity core set. Arthritis Rheum 2005; 52: 2854–64.
  • 10
    Ruperto N, Ravelli A, Oliveira S, Alessio M, Mihaylova D, Pasic S, et al, for the Pediatric Rheumatology International Trials Organization (PRINTO) and the Pediatric Rheumatology Collaborative Study Group (PRCSG). The Pediatric Rheumatology International Trials Organization/American College of Rheumatology provisional criteria for the evaluation of response to therapy in juvenile systemic lupus erythematosus: prospective validation of the definition of improvement. Arthritis Rheum 2006; 55: 355–63.
  • 11
    Rider LG, Giannini EH, Harris-Love M, Joe G, Isenberg D, Pilkington C, et al. Defining clinical improvement in adult and juvenile myositis. J Rheumatol 2003; 30: 603–17.
  • 12
    Miller FW, Rider LG, Chung YL, Cooper R, Danko K, Farewell V, et al. Proposed preliminary core set measures for disease outcome assessment in adult and juvenile idiopathic inflammatory myopathies. Rheumatology (Oxford) 2001; 40: 1262–73.
  • 13
    Rider LG, Giannini EH, Brunner HI, Ruperto N, James-Newton L, Reed AM, et al, for the International Myositis Assessment and Clinical Studies Group. International consensus on preliminary definitions of improvement in adult and juvenile myositis. Arthritis Rheum 2004; 50: 2281–90.
  • 14
    Oddis CV, Rider LG, Reed AM, Ruperto N, Brunner HI, Koneru B, et al, for the International Myositis Assessment and Clinical Studies Group. International consensus guidelines for trials of therapies in the idiopathic inflammatory myopathies. Arthritis Rheum 2005; 52: 2607–15.
  • 15
    Feldman BM, Rider LG, Reed AM, Pachman LM. Juvenile dermatomyositis and other idiopathic inflammatory myopathies of childhood. Lancet 2008; 371: 2201–12.
  • 16
    Ramanan AV, Feldman BM. Clinical features and outcomes of juvenile dermatomyositis and other childhood onset myositis syndromes. Rheum Dis Clin North Am 2002; 28: 833–57.
  • 17
    Rider LG. Outcome assessment in the adult and juvenile idiopathic inflammatory myopathies. Rheum Dis Clin North Am 2002; 28: 935–77.
  • 18
    Ruperto N, Martini A. International research networks in pediatric rheumatology: the PRINTO perspective. Curr Opin Rheumatol 2004; 16: 566–70.
  • 19
    Ruperto N, Ravelli A, Pistorio A, Ferriani V, Calvo I, Ganser G, et al, for the Paediatric Rheumatology International Trials Organisation (PRINTO) and the Pediatric Rheumatology Collaborative Study Group (PRCSG). The provisional Paediatric Rheumatology International Trial Organisation/American College of Rheumatology/European League Against Rheumatism disease activity core set for the evaluation of response to therapy in juvenile dermatomyositis: a prospective validation study. Arthritis Rheum 2008; 59: 4–13.
  • 20
    Rider LG, Feldman BM, Perez MD, Rennebohm RM, Lindsley CB, Zemel LS, et al, and the Juvenile Dermatomyositis Disease Activity Collaborative Study Group. Development of validated disease activity and damage indices for the juvenile idiopathic inflammatory myopathies. I. Physician, parent, and patients global assessments. Arthritis Rheum 1997; 40: 1976–83.
  • 21
    Lovell DJ, Lindsley CB, Rennebohm RM, Ballinger SH, Bowyer SL, Giannini EH, et al, and the Juvenile Dermatomyositis Disease Activity Collaborative Study Group. Development of validated disease activity and damage indices for the juvenile idiopathic inflammatory myopathies. II. The Childhood Myositis Assessment Scale (CMAS): a quantitative tool for the evaluation of muscle function. Arthritis Rheum 1999; 42: 2213–9.
  • 22
    Rennebohm RM, Jones K, Huber AM, Ballinger SH, Bowyer SL, Feldman BM, et al, for the Juvenile Dermatomyositis Disease Activity Collaborative Study Group. Normal scores for nine maneuvers of the Childhood Myositis Assessment Scale. Arthritis Rheum 2004; 51: 365–70.
  • 23
    Huber AM, Feldman BM, Rennebohm RM, Hicks JE, Lindsley CB, Perez MD, et al, and the Juvenile Dermatomyositis Disease Activity Collaborative Study Group. Validation and clinical significance of the Childhood Myositis Assessment Scale for assessment of muscle function in the juvenile idiopathic inflammatory myopathies. Arthritis Rheum 2004; 50: 1595–603.
  • 24
    Bode RK, Klein-Gitelman MS, Miller ML, Lechman TS, Pachman LM. Disease Activity Score for children with juvenile dermatomyositis: reliability and validity evidence. Arthritis Rheum 2003; 49: 7–15.
  • 25
    Isenberg DA, Allen E, Farewell V, Ehrenstein MR, Hanna MG, Lundberg IE, et al. International consensus outcome measures for patients with idiopathic inflammatory myopathies: development and initial validation of myositis activity and damage indices in patients with adult onset disease. Rheumatology (Oxford) 2004; 43: 49–54.
  • 26
    Singh G, Athreya BH, Fries JF, Goldsmith DP. Measurement of health status in children with juvenile rheumatoid arthritis. Arthritis Rheum 1994; 37: 1761–9.
  • 27
    Ruperto N, Ravelli A, Pistorio A, Malattia C, Cavuto S, Gado-West L, et al. Cross-cultural adaptation and psychometric evaluation of the Childhood Health Assessment Questionnaire (CHAQ) and the Child Health Questionnaire (CHQ) in 32 countries: review of the general methodology. Clin Exp Rheumatol 2001; 19: S1–9.
  • 28
    Landgraf JM, Abetz L, Ware JE. The CHQ user's manual. 1st ed. Boston: Health Institute, New England Medical Center; 1996.
  • 29
    Delbecq AL, van de Ven AH, Gustafson DH. Group techniques for program planning: a guide to nominal group and Delphi processes. 1st ed. Glenview (IL): Scott Foresman and Company; 1975.
  • 30
    Ruperto N, Meiorin S, Iusan SM, Ravelli A, Pistorio A, for the Paediatric Rheumatology International Trials Organisation (PRINTO). Consensus procedures and their role in pediatric rheumatology. Curr Rheumatol Rep 2008; 10: 142–6.
  • 31
    Metz CE. Basic principles of ROC analysis. Semin Nucl Med 1978; 8: 283–98.
  • 32
    Cohen J. Statistical power analysis for the behavioral sciences. New York: Academic Press; 1977.
  • 33
    Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977; 33: 159–74.