Prospective validation of the provisional criteria for the evaluation of response to therapy in childhood-onset systemic lupus erythematosus

Abstract

Objective

To prospectively validate the provisional criteria for the evaluation of response to therapy in children with systemic lupus erythematosus (SLE).

Methods

In this multicenter study, childhood-onset SLE patients (n = 98; 81 girls, 17 boys, 50% white, 88% non-Hispanic) were followed every 3 months for up to 7 visits (total number of visits 623). The 5 childhood-onset SLE core response variables were obtained at the time of each visit: 1) physician assessment of overall disease activity, 2) parent assessment of patient well-being, 3) Child Health Questionnaire, 4) proteinuria, and 5) global disease activity measure score, as measured by the European Consensus Lupus Activity Measure (ECLAM), the Systemic Lupus Erythematosus Disease Activity Index (SLEDAI), or the Systemic Lupus Activity Measure (SLAM). Physician-rated relevant changes in the disease course (clinically relevant improvement, no change in disease, or worsening) between visits served as the criterion standard. Mixed models were used to assess the diagnostic accuracy of the 4 highest-ranked provisional definitions of response to therapy.

Results

There were 89 episodes of clinically relevant improvement between 2 consecutive visits, and 448 episodes without improvement. Irrespective of the choice of the global disease activity measure (ECLAM, SLAM, SLEDAI), sensitivities of all 4 highest-ranked definitions were low (all ≤31%), whereas their specificities were excellent (all >88%). Using logistic models, alternative definitions can be developed with both 80% sensitivity and specificity.

Conclusion

The provisional criteria of response to therapy in childhood-onset SLE may have considerably lower sensitivity than previously reported. Additional validation in clinical trials is necessary to further evaluate the measurement properties of the provisional Paediatric Rheumatology International Trials Organisation/American College of Rheumatology criteria for response to therapy in children with SLE.

INTRODUCTION

Systemic lupus erythematosus (SLE) is a complex, chronic, multisystem autoimmune inflammatory disease, and up to 20% of SLE patients are diagnosed during childhood, i.e., prior to the age of 16 years (1, 2). Compared with adults with SLE, patients with childhood-onset SLE more often have severe disease phenotypes, including a higher prevalence of kidney involvement (3).

Highly sensitive and specific surrogate markers are needed to serve as primary outcome measures of clinical trials of childhood-onset SLE that study the efficacy of novel medications. The lack of validated surrogate markers is considered a major barrier to the testing of safer and more effective therapies for childhood-onset SLE (4).

Using consensus methodology, the Paediatric Rheumatology International Trials Organisation (PRINTO)/American College of Rheumatology (ACR) provisional criteria for the evaluation of response to therapy for children with childhood-onset SLE were developed. Initial studies suggest that these criteria can measure response to therapy (or clinically relevant improvement) of individual patients with high sensitivity and specificity using an algorithm that considers percentage changes of 5 childhood-onset SLE core set parameters (5). Parameters include the score of an index of global disease activity, physician assessment of overall disease activity, parent assessment of patient overall well-being, proteinuria, and patient health-related quality of life (HRQOL) (6).

The Classification and Response Criteria Subcommittee of the ACR Committee on Quality Measures pointed out that validation of any outcome measures or response definitions is a dynamic process. Confirmatory studies are mandated to substantiate the usefulness of response criteria in other patient cohorts and by using different raters than those involved in the criteria development (7).

Therefore, we undertook a prospective cohort study to corroborate the measurement properties of the PRINTO/ACR provisional criteria for the evaluation of response to therapy. We specifically investigated their sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) for identifying childhood-onset SLE patients who have experienced clinically relevant improvement.

MATERIALS AND METHODS

SLE patients.

Children (n = 98) fulfilling the ACR classification criteria for SLE (2) prior to the age of 16 years were consecutively recruited during routine clinic visits at 7 academic pediatric rheumatology centers in the US. Study visits occurred every 3 months for up to 18 months; each time, height, weight, findings on physical examination, and medication regimens were recorded, and disease activity and HRQOL were measured.

Provisional criteria of improvement.

Details on the development and initial validation of the childhood-onset SLE core set parameters and the provisional definition of improvement are available elsewhere (8–10).

Briefly, changes in 5 childhood-onset SLE core parameters were used to define improvement with childhood-onset SLE: 1) physician assessment of overall disease activity, as measured on a visual analog scale (VAS) ranging from 0 to 10 (physician assessment VAS; where 0 = inactive disease and 10 = very active disease); 2) parent assessment of patient overall well-being, as measured on a VAS ranging from 0 to 10 (well-being VAS; where 0 = very poor and 10 = very well); 3) global disease activity, as measured by a validated disease activity index; 4) HRQOL, as measured by the Child Health Questionnaire (CHQ) physical summary score (PHS); and 5) renal involvement, as measured by daily proteinuria (5).

Consensus methodology and data-driven validation resulted in several proposed candidate criteria of improvement; the 4 highest-ranked criteria (a–d) were tested for this study in more detail: a = improvement of 2 of any 5 core variables by ≥50% without worsening of more than 1 by ≥30% and without increase in proteinuria; b = improvement of 2 of any 5 core variables by ≥40% without worsening of more than 1 by ≥30% without increase in proteinuria; c = improvement of 3 of any 5 core variables by ≥30% without worsening of more than 1 by ≥30% without increase in proteinuria; and d = improvement of 2 of any 5 core variables by ≥50% without worsening of more than 2 by ≥30% without increase in proteinuria.
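As an illustration only, the 4 candidate definitions can be expressed as a small rule check. The sketch below is not the authors' implementation. It assumes that percentage change is computed relative to the previous visit, that "2 of any 5" means at least 2, that a variable with a previous value of 0 has no defined percentage change, and that "without increase in proteinuria" means the current value does not exceed the previous one. The variable names and the direction-of-improvement table are hypothetical labels derived from the scale descriptions above.

```python
from dataclasses import dataclass

# Assumed direction of improvement per core variable (True = lower is better),
# inferred from the scale definitions in the text; names are hypothetical.
LOWER_IS_BETTER = {
    "physician_vas": True,     # 0 = inactive disease
    "wellbeing_vas": False,    # 10 = very well
    "global_activity": True,   # 0 = inactive disease
    "chq_phs": False,          # higher score = better physical health
    "proteinuria": True,
}

@dataclass(frozen=True)
class Definition:
    n_improved: int      # minimum number of variables that must improve
    improve_pct: float   # required improvement, in percent
    max_worsened: int    # maximum number of variables allowed to worsen
    worsen_pct: float = 30.0

# Thresholds for candidate definitions a-d as listed in the text.
DEFINITIONS = {
    "a": Definition(2, 50.0, 1),
    "b": Definition(2, 40.0, 1),
    "c": Definition(3, 30.0, 1),
    "d": Definition(2, 50.0, 2),
}

def pct_improvement(name, previous, current):
    """Signed percentage change, positive = improvement; None if undefined."""
    if previous == 0:
        return None  # assumption: no percentage change from a zero baseline
    change = (current - previous) / abs(previous) * 100.0
    return -change if LOWER_IS_BETTER[name] else change

def meets_definition(key, previous, current):
    """Check one between-visit interval against candidate definition `key`."""
    d = DEFINITIONS[key]
    changes = [pct_improvement(n, previous[n], current[n]) for n in LOWER_IS_BETTER]
    improved = sum(1 for c in changes if c is not None and c >= d.improve_pct)
    worsened = sum(1 for c in changes if c is not None and c <= -d.worsen_pct)
    no_proteinuria_increase = current["proteinuria"] <= previous["proteinuria"]
    return (improved >= d.n_improved
            and worsened <= d.max_worsened
            and no_proteinuria_increase)

# Hypothetical interval: physician VAS and global activity both improve by >=50%.
previous = {"physician_vas": 4, "wellbeing_vas": 5, "global_activity": 10,
            "chq_phs": 40, "proteinuria": 0.5}
current = {"physician_vas": 2, "wellbeing_vas": 6, "global_activity": 4,
           "chq_phs": 42, "proteinuria": 0.5}
print({key: meets_definition(key, previous, current) for key in DEFINITIONS})
```

In this hypothetical interval only 2 variables improve by at least 30%, so definition c (which requires 3) is not met, whereas a, b, and d are.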

Details on the core set variables used to define improvement.

SLE disease indices to measure global disease activity.

It has been suggested that any of 3 indices of global disease activity can be used interchangeably as a core set parameter when measuring improvement: 1) the Systemic Lupus Activity Measure (SLAM; range 0–81, where 0 = inactive disease) (11), 2) the Systemic Lupus Erythematosus Disease Activity Index (SLEDAI; range 0–105, where 0 = inactive disease) (12), or 3) the European Consensus Lupus Activity Measure (ECLAM) (13). Unlike the SLAM and the SLEDAI, the global disease activity score of the ECLAM does not correspond to the sum of its item scores. Rounding procedures and special scoring rules for patients with single-organ involvement are applied, and this yields integer ECLAM summary scores between 0 and 10 (where 0 = inactive disease).

HRQOL measure.

For the childhood-onset SLE core set, HRQOL was measured by the CHQ, a generic HRQOL inventory whose parent-completed version has been translated into numerous languages and culturally cross-validated for use in childhood-onset SLE (14–16). Two summary scores can be derived to measure psychosocial health and physical health (PHS). The CHQ PHS is the proposed measure of HRQOL to be considered in the childhood-onset SLE core set.

Renal involvement.

A timed urine collection has been suggested for the childhood-onset SLE core set to assess renal involvement (5). For this study, we measured the protein to creatinine ratio in a random urine sample instead. The rationale was that, in recent years, the protein to creatinine ratio has been proven to be an accurate approximation of daily protein excretion (17). The protein to creatinine ratio is now commonly used in clinics and even in clinical trials to measure proteinuria (18).

Values of the protein to creatinine ratio at ≥0.2 were considered abnormal, values of <0.2 were considered normal (19), and all smaller values were rounded up to 0.15. Any changes within the range of normal values were not considered as improved for the purposes of the childhood-onset SLE core set.

Additional parameters measured for potential use in the childhood-onset SLE core set.

Measures of global disease activity.

The British Isles Lupus Activity Group (BILAG) Index is another validated SLE disease index, but it has not been proposed for use in the childhood-onset SLE core set. For this study, the BILAG Index was completed as a potential alternative measure of disease activity (20, 21). For each of the 8 organ systems considered (general, mucocutaneous, neurologic, musculoskeletal, cardiovascular and respiratory, vasculitic, hematologic, and renal), an alphabetical domain score is obtained that can be converted to a numerical value by the use of 1 of 3 conversion schemes, as suggested by Gladman et al (A = 4, B = 3, C = 2, D = 1, E = 0) (22), Liang et al (A = 10, B = 6.7, C = 3.3, D or E = 0) (11), and Stoll et al (A = 9, B = 3, C = 1, D or E = 0) (23), with higher BILAG Index scores signifying higher disease activity. Global disease activity as measured by the BILAG Index is the sum of the numerical domain scores.
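The conversion from alphabetical domain scores to a numerical global score can be sketched as a lookup followed by a sum. The mapping values below are taken from the 3 schemes as stated above. The dictionary keys, domain identifiers, and function name are hypothetical, and the example grades are invented for illustration rather than taken from study data.

```python
# Numerical values per alphabetical BILAG domain score, as listed in the text.
BILAG_SCHEMES = {
    "gladman": {"A": 4, "B": 3, "C": 2, "D": 1, "E": 0},
    "liang": {"A": 10, "B": 6.7, "C": 3.3, "D": 0, "E": 0},
    "stoll": {"A": 9, "B": 3, "C": 1, "D": 0, "E": 0},
}

# The 8 organ systems considered (identifiers are hypothetical labels).
BILAG_DOMAINS = (
    "general", "mucocutaneous", "neurologic", "musculoskeletal",
    "cardiorespiratory", "vasculitic", "hematologic", "renal",
)

def bilag_global_score(grades, scheme):
    """Sum the numerical domain scores for one visit under one scheme."""
    missing = [d for d in BILAG_DOMAINS if d not in grades]
    if missing:
        raise ValueError(f"missing domain grades: {missing}")
    values = BILAG_SCHEMES[scheme]
    return sum(values[grades[d]] for d in BILAG_DOMAINS)

# Invented example visit (not study data).
example = {
    "general": "C", "mucocutaneous": "B", "neurologic": "E",
    "musculoskeletal": "B", "cardiorespiratory": "D", "vasculitic": "E",
    "hematologic": "C", "renal": "A",
}
for name in BILAG_SCHEMES:
    print(name, bilag_global_score(example, name))
```

The same set of domain grades yields different totals under each scheme, which is why the 3 conversions are reported separately in the tables.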

Measure of immunologic activity.

The original childhood-onset SLE core set comprises 6 core variables for disease activity (5). In addition to the 5 childhood-onset SLE core variables considered in the childhood-onset SLE criteria for response to therapy, there is a parameter to represent immunologic disease activity. This is achieved by measuring levels of anti–double-stranded DNA (anti-dsDNA) antibodies. For this study, anti-dsDNA antibodies were measured by the investigators as part of the standard of care, using various laboratory assays. To be considered as improved in this study, anti-dsDNA antibodies had to decrease by a certain percentage, plus either be newly within the normal range (previous visit abnormal) or remain above the upper bounds for normal (stay abnormal). Further decreases of values of anti-dsDNA antibodies that were already in the range of normal were not considered as improved.

Criterion standard for determining patient improvement.

In response to the sentence stem, “Compared to the last study visit 3 months ago and the patient's overall disease, the patient experienced a …,” the managing pediatric rheumatologist rated the change in disease course on a 5-point Likert scale as follows: major flare of disease, minor flare of disease, no change in disease, minor improvement of disease, or major improvement of disease.

External standards used in exploratory analyses.

In exploratory analysis, we assessed how the provisional criteria of improvement would reflect the family's perspective. Therefore, the parent rated the change of their child's disease on a 5-point Likert scale (much worse, somewhat worse, unchanged, somewhat improved, or much improved) that was presented with the sentence stem, “Compared to the last study visit 3 months ago, and when considering medications, school, work, life at home, doctor visits, pains, and feelings, the overall well-being is …”

Statistical analysis.

Numerical variables were summarized by the mean ± SEM or SD; categorical variables were summarized by the frequency (in percentage).

Numerical core set variables were assessed for their associations with the PRINTO/ACR improvement criteria (a dichotomous variable of improved versus unimproved or unchanged) using mixed-effect models that adjusted for patient demographic and baseline clinical characteristics. A random effect was used to account for within-patient correlation caused by repeated measurements. In order to predict the likelihood of improvement as per the PRINTO/ACR criteria (a dichotomized dependent variable) using core set variables, both univariate and multivariate logistic regression models were applied. A generalized estimating equation method was used in the logistic regression models to account for within-patient correlation. In the univariate logistic regression models, each of the core set variables was considered as the only predictor, one at a time, whereas in the multivariate logistic regression models, all of the core set variables were included as the predictors of interest. In order to assess whether the predicted PRINTO/ACR improvement was sensitive to the choice of different global disease activity indices, a series of multivariate models was generated, using one of the global disease activity indices at a time. Contributions of other core set variables were also assessed in the competing multivariate models by deleting one of these core set variables at a time. The predicted log odds (or scores) of improvement from logistic regression models were further used to assess their diagnostic accuracy using the receiver operating characteristic curve and the area under the curve (AUC), sensitivity, and specificity. Sensitivity, specificity, PPV, and NPV were also used to evaluate the diagnostic accuracy of the 4 highest-ranked PRINTO/ACR definitions of childhood-onset SLE response to therapy. Confidence intervals of the AUC were estimated using a bootstrap method, with a total of 2,500 replicates for each model (24, 25).
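To make the AUC and bootstrap steps concrete, the following is a minimal sketch and not the SAS procedure used in the study. It computes the AUC as the proportion of improved/unimproved pairs in which the improved episode has the higher score (ties count one-half) and derives a percentile bootstrap confidence interval. Resampling individual episodes is an assumption; the text does not state whether replicates were drawn at the episode or the patient level. The function names and example scores are hypothetical.

```python
import random

def auc(scores, labels):
    """AUC as the fraction of (improved, unimproved) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one improved and one unimproved episode")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(scores, labels, n_boot=2500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (episode-level resampling)."""
    rng = random.Random(seed)
    n = len(scores)
    replicates = []
    while len(replicates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            replicates.append(auc([scores[i] for i in idx], ys))
    replicates.sort()
    lower = replicates[int((alpha / 2) * n_boot)]
    upper = replicates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Invented scores (predicted log odds) and physician ratings (1 = improved).
scores = [0.9, 0.4, 0.35, 0.8, 0.2, 0.6]
labels = [1, 0, 1, 1, 0, 0]
print(auc(scores, labels), bootstrap_auc_ci(scores, labels, n_boot=500))
```

Because repeated visits from the same patient are correlated, a patient-level (cluster) bootstrap would be a natural alternative to the episode-level resampling shown here.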

Statistical computations were performed using SAS software, version 9.2 (SAS, Cary, NC). P values less than 0.05 were considered statistically significant. Diagnostic accuracy was considered outstanding, excellent, good, fair, or poor if the AUC was in the range of 0.91–1.0, 0.81–0.90, 0.71–0.80, 0.61–0.70, or 0.50–0.60, respectively (26).

Ethics review.

This study was approved by the institutional review boards of the participating pediatric rheumatology centers. Informed consent was obtained from all of the parents and, as appropriate, assent was given by the participants prior to the study procedures.

RESULTS

Characteristics and disease course of patients with SLE.

The demographics and disease features of the patients with childhood-onset SLE are shown in Table 1. A total of 98 children (81 girls, 17 boys) were included in the analysis. The population consisted of 49 white, 32 African American, 3 Asian, and 3 mixed-race patients (87 non-Hispanics, 11 Hispanics). Data from 623 visits (or 526 between-visit intervals) were available for analysis. There were 39 patients with biopsy-proven lupus nephritis. The mean ± SD damage as measured by the Systemic Lupus International Collaborating Clinics/ACR Damage Index (27) was 0.42 ± 0.1. The global disease course with childhood-onset SLE on consecutive visits during the study is depicted in Figure 1. There were 35 renal flares (major or minor) and 42 episodes of renal improvement (major or minor).

Table 1. Demographics and SLE features at baseline*

| Parameter | N | Mean ± SD |
|---|---|---|
| Age, years | 98 | 15.3 ± 2.85 |
| Disease duration, years | 98 | 1.5 ± 2.0 |
| Current medications | | 15.1 ± 1.8 |
|   Prednisone, mg/day | 82 | |
|   Azathioprine, mycophenolate mofetil, methotrexate | 47 | |
|   Cyclophosphamide | 6 | |
|   Hydroxychloroquine | 73 | |
|   Nonsteroidal antiinflammatory drugs | 24 | |
|   At least 1 antihypertensive medication | 38 | |
| Biopsy-proven lupus nephritis | 39 | |
| Proteinuria | 98 | 0.44 ± 0.96 |
| Disease damage, SDI score (0 = no damage) | 98 | 0.42 ± 0.1 |
| Physician assessment of overall disease activity (VAS) | 98 | 2.5 ± 1.95 |
| Child Health Questionnaire physical function score | 98 | 42.4 ± 12.14 |
| Disease activity | 98 | |
|   SLAM | | 7.64 ± 6.01 |
|   SLEDAI | | 5.18 ± 4.35 |
|   Stoll et al BILAG | | 5.31 ± 5.44 |
|   Liang et al BILAG | | 11.8 ± 8.81 |
|   Gladman et al BILAG | | 8.7 ± 3.57 |

* SLE = systemic lupus erythematosus; SDI = Systemic Lupus International Collaborating Clinics/American College of Rheumatology Damage Index; VAS = visual analog scale; SLAM = Systemic Lupus Activity Measure; SLEDAI = Systemic Lupus Erythematosus Disease Activity Index; BILAG = British Isles Lupus Activity Group Index.

World Health Organization classification of lupus nephritis: class II (n = 3), class III (n = 8), class IV (n = 20), or class V (n = 8).

There were 21 patients without proteinuria, as defined by a protein to creatinine ratio of <0.2. All smaller values were rounded up to 0.15.

Figure 1.

Disease course: external standards. A, The course of childhood-onset systemic lupus erythematosus (SLE) between 3 monthly study visits was rated by the managing physicians. Many patients had a stable disease course at the time of the followup visit, as rated by their physician. The diagnostic accuracy of the Paediatric Rheumatology International Trials Organisation (PRINTO)/American College of Rheumatology (ACR) criteria for response to therapy changes in the core set of patients who showed major or minor clinically relevant improvement was compared with patients whose disease was rated as unchanged or who were considered to have a minor or major flare. A total of 536 between-visit intervals were available. B, The course of childhood-onset SLE between 3 monthly study visits was also rated by the parent of the index patient with childhood-onset SLE. Many patients had a stable disease course between visits, as rated by their parents. The diagnostic accuracy of the PRINTO/ACR criteria for response to therapy changes in the core set of patients who showed much or somewhat improvement was compared with patients whose disease was considered unchanged, or whose disease course suggested somewhat or much worse disease at the time of the followup visit. A total of 514 between-visit intervals were available (12 missing values).

The mean changes of the core set parameters by disease course (unimproved or flare and unchanged combined versus improved) as rated by the managing pediatric rheumatologist are shown in Table 2. Compared with patients who were not improved, the core variables of patients rated as improved significantly changed, with the exception of proteinuria and the CHQ PHS.

Table 2. Change in core set variables with physician-rated childhood-onset SLE course*

| Variable | Unimproved (n = 448 episodes) | Improved (n = 89 episodes) | P |
|---|---|---|---|
| Physician assessment VAS | 0.11 ± 0.06 | −1.01 ± 0.14 | <0.001 |
| Well-being VAS | −0.01 ± 0.09 | 0.54 ± 0.20§ | <0.05 |
| Proteinuria | 0.03 ± 0.08 | −0.08 ± 0.17 | NS |
| ECLAM | 0.22 ± 0.07§ | −1.12 ± 0.16 | <0.001 |
| SLEDAI | 0.25 ± 0.15 | −2.27 ± 0.34 | <0.001 |
| SLAM | 0.36 ± 0.34 | −1.78 ± 0.79 | <0.05 |
| Liang et al BILAG | 0.16 ± 0.32 | −4.61 ± 0.72 | <0.001 |
| Gladman et al BILAG | 0.18 ± 0.11 | −1.22 ± 0.24 | <0.001 |
| Stoll et al BILAG | 0.10 ± 0.18 | −2.79 ± 0.40 | <0.001 |
| CHQ PHS | −0.09 ± 0.51 | 1.94 ± 1.13 | NS |

* Values are the mean ± SEM. SLE = systemic lupus erythematosus; VAS = visual analog scale; NS = not significant; ECLAM = European Consensus Lupus Activity Measure; SLEDAI = Systemic Lupus Erythematosus Disease Activity Index; SLAM = Systemic Lupus Activity Measure; BILAG = British Isles Lupus Activity Group Index; CHQ PHS = Child Health Questionnaire physical summary score.

Based on mixed models.

Means of change from the previous visit to the current visit are significant at P < 0.001.

§ Means of change from the previous visit to the current visit are significant at P < 0.01.

Means of change from the previous visit to the current visit are significant at P < 0.05.

When only children with biopsy-proven lupus nephritis were considered, then the change in the mean ± SEM of proteinuria was −0.1 ± 0.4 and −0.005 ± 0.17 for improved and unimproved courses, respectively (P = 0.83).

Diagnostic accuracy of the candidate improvement definitions to capture improvement of childhood-onset SLE.

Irrespective of the index used in the core set to measure global disease activity and for all candidate criteria assessed, the sensitivity did not exceed 31%, and the PPV did not exceed 48% (Table 3). However, specificity and the NPV remained high when using this validation data set, at ≥89% and ≥84%, respectively. Of note, some candidate definitions that considered the Stoll et al or Gladman et al BILAG Index were more sensitive and at least equally specific.
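The relationship between these 4 measures can be illustrated with a short calculation. The sketch below derives the PPV and NPV from sensitivity, specificity, and the proportion of improved episodes (89 of 89 + 448, as reported in this study). The inputs of 26% sensitivity and 92% specificity are the rounded values reported in Table 3 for definition a with the ECLAM, so the results only approximate the published PPV and NPV. The function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Episode counts reported in the text: 89 improved, 448 without improvement.
prevalence = 89 / (89 + 448)

# Rounded sensitivity and specificity for definition a with the ECLAM.
ppv, npv = predictive_values(0.26, 0.92, prevalence)
print(f"PPV ~ {ppv:.2f}, NPV ~ {npv:.2f}")
```

Because only about 1 in 6 between-visit intervals was rated as improved, even a highly specific definition produces a modest PPV, while the NPV stays high largely because most episodes are unimproved.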

Table 3. Diagnostic accuracy of the provisional PRINTO/ACR criteria for response to therapy under consideration of possible measure of global childhood-onset SLE disease activity*

| Global disease activity measure | PRINTO/ACR candidate improvement definition | Sensitivity, % | Specificity, % | PPV, % | NPV, % |
|---|---|---|---|---|---|
| ECLAM | a | 26 (17–35) | 92 (89–95) | 38 (26–50) | 86 (83–89) |
| ECLAM | b | 26 (17–35) | 89 (86–92) | 32 (21–43) | 85 (82–88) |
| ECLAM | c | 13 (6–20) | 97 (95–99) | 46 (26–66) | 85 (82–88) |
| ECLAM | d | 28 (19–37) | 90 (87–93) | 37 (26–48) | 86 (83–89) |
| SLEDAI | a | 24 (15–33) | 92 (89–95) | 39 (26–52) | 86 (83–89) |
| SLEDAI | b | 25 (16–34) | 91 (88–94) | 35 (23–47) | 86 (83–89) |
| SLEDAI | c | 13 (6–20) | 97 (95–99) | 48 (28–68) | 85 (82–88) |
| SLEDAI | d | 27 (18–36) | 91 (88–94) | 39 (27–51) | 86 (83–89) |
| SLAM | a | 18 (10–26) | 92 (89–95) | 31 (18–44) | 85 (82–88) |
| SLAM | b | 20 (12–28) | 89 (86–92) | 28 (17–39) | 85 (82–88) |
| SLAM | c | 8 (2–14) | 97 (95–99) | 32 (12–52) | 84 (81–87) |
| SLAM | d | 22 (13–31) | 90 (87–93) | 32 (21–43) | 85 (82–88) |
| Gladman et al BILAG | a | 25 (16–34) | 92 (89–95) | 38 (25–51) | 86 (83–89) |
| Gladman et al BILAG | b | 28 (19–37) | 90 (87–93) | 37 (26–48) | 86 (83–89) |
| Gladman et al BILAG | c | 8 (2–14) | 97 (95–99) | 35 (14–56) | 84 (81–87) |
| Gladman et al BILAG | d | 29 (20–38) | 90 (87–93) | 37 (26–48) | 86 (83–89) |
| Liang et al BILAG | a | 11 (4–18) | 95 (93–97) | 31 (15–47) | 84 (81–87) |
| Liang et al BILAG | b | 17 (9–25) | 93 (91–95) | 33 (19–47) | 85 (82–88) |
| Liang et al BILAG | c | 6 (1–11) | 99 (98–100) | 45 (14–76) | 84 (81–87) |
| Liang et al BILAG | d | 12 (5–19) | 93 (91–95) | 28 (14–42) | 84 (81–87) |
| Stoll et al BILAG | a | 27 (18–36) | 92 (89–95) | 39 (27–51) | 86 (83–89) |
| Stoll et al BILAG | b | 28 (19–37) | 90 (87–93) | 36 (25–47) | 86 (83–89) |
| Stoll et al BILAG | c | 9 (3–15) | 97 (95–99) | 36 (16–56) | 84 (81–87) |
| Stoll et al BILAG | d | 31 (21–41) | 90 (87–93) | 38 (27–49) | 87 (84–90) |

* Values are the mean (95% confidence interval). PRINTO = Paediatric Rheumatology International Trials Organisation; ACR = American College of Rheumatology; SLE = systemic lupus erythematosus; PPV = positive predictive value; NPV = negative predictive value; ECLAM = European Consensus Lupus Activity Measure; SLEDAI = Systemic Lupus Erythematosus Disease Activity Index; SLAM = Systemic Lupus Activity Measure; BILAG = British Isles Lupus Activity Group Index.

Exploratory analyses.

When considering the family's perspective (patient change in health) as a criterion of whether improvement had occurred or not, the sensitivity of the 4 highest-ranked definitions did not exceed 23% and the PPV was <61%, whereas the specificity and the NPV remained high, at >88% and 61%, respectively. The choice of the measure of global disease activity index (SLAM, SLEDAI, ECLAM, or BILAG Index) did not have a major impact on the abovementioned measurement properties.

When only patients with biopsy-proven lupus nephritis were considered, depending on the index used in the core set to capture global disease activity, the sensitivity of all of the 4 highest-ranked candidate improvement definitions did not exceed 43% and the PPV did not exceed 56%, whereas specificity and the NPV remained high when using this validation data set, at ≥87% and ≥85%, respectively.

Alternative criteria of improvement using the childhood-onset SLE core set.

Using univariate logistic regression, individual core set variables were found to contribute to a different degree to the measurement of the construct improvement of childhood-onset SLE (Table 4). In multivariate logistic regression models that considered percentage changes of 5 or all 6 childhood-onset SLE core set variables as possible predictors (outcome of physician-rated improvement of childhood-onset SLE), we generated additional candidate criteria of childhood-onset SLE improvement with higher sensitivities and still acceptable specificities. The alternative candidate criteria that appeared to best lend themselves for potential use in clinical care and research, based on the face validity of the underlying algorithms, are shown in Table 4. When only patients with renal involvement were assessed (n = 39), then proteinuria contributed to a similar degree to the identification of patients who had improved (AUC = 0.51). When the family perspective of patient improvement was considered, then the AUC for the CHQ PHS was slightly higher, at 0.61. A simplified version (model M1S) using rounded regression coefficients from the multivariate model (model M1) showed an AUC of 0.82, supporting excellent accuracy for identifying patients who have improved. The AUCs for these alternative definitions are shown in Figure 2. From the score derived from each of the regression functions, the likelihood that improvement has occurred can be deduced. Using these somewhat more complex algorithms, candidate improvement definitions with sensitivities as high as 80% and equally high specificities can be derived.

Table 4. Area under the ROC curve of childhood-onset SLE core set variables for diagnosing physician-rated improvement*

| Core set predictors | Area under the ROC curve (95% CI) |
|---|---|
| Univariate logistic regression models | |
|   Model U1: 19 + 7.5 × physician assessment VAS | 0.76 (0.70–0.83) |
|   Model U2: 18.3 + 2.6 × SLEDAI | 0.72 (0.66–0.79) |
|   Model U3: 18 + 5.4 × ECLAM | 0.69 (0.61–0.77) |
|   Model U4: 18.2 + 2 × Stoll et al BILAG | 0.68 (0.60–0.76) |
|   Model U5: 18 + 1 × Liang et al BILAG | 0.67 (0.59–0.75) |
|   Model U6: 17.4 + 2.9 × Gladman et al BILAG | 0.66 (0.58–0.74) |
|   Model U7: 17 + 0.4 × SLAM | 0.61 (0.53–0.70) |
|   Model U8: 16 + 1.5 × well-being VAS | 0.58 (0.51–0.66) |
|   Model U9: 15 + 0.7 × anti-dsDNA antibodies | 0.53 (0.45–0.62) |
|   Model U10: 16 − 0.2 × CHQ PHS | 0.53 (0.45–0.61) |
|   Model U11: 15 + 0.5 × proteinuria | 0.49 (0.42–0.57) |
| Multivariate models: combinations of core set variables | |
|   Model M1: 21 + 2.3 × SLEDAI + 8.2 × physician assessment VAS + 0.13 × CHQ PHS + 0.8 × anti-dsDNA antibodies − 1.4 × well-being VAS | 0.84 (0.78–0.90) |
|   Model M2: 21 + 2.3 × SLEDAI + 6.8 × physician assessment VAS − 0.01 × CHQ PHS − 0.9 × well-being VAS | 0.79 (0.73–0.86) |
|   Model M3: 18 + 3 × SLEDAI − 0.04 × CHQ PHS + 0.6 × anti-dsDNA antibodies − 1.5 × well-being VAS | 0.76 (0.69–0.83) |
|   Model M4: 21 + 2.4 × SLEDAI + 8.4 × physician assessment VAS + 0.08 × CHQ PHS + 0.7 × anti-dsDNA antibodies | 0.83 (0.77–0.89) |
|   Model M1S: 21 + 2.5 × SLEDAI + 8 × physician assessment VAS + 0.15 × CHQ PHS + 1 × anti-dsDNA antibodies − 1.5 × well-being VAS | 0.82 (0.77–0.88) |

* ROC = receiver operating characteristic; SLE = systemic lupus erythematosus; 95% CI = 95% confidence interval; VAS = visual analog scale; SLEDAI = Systemic Lupus Erythematosus Disease Activity Index; ECLAM = European Consensus Lupus Activity Measure; BILAG = British Isles Lupus Activity Group Index; SLAM = Systemic Lupus Activity Measure; anti-dsDNA = anti–double-stranded DNA; CHQ PHS = Child Health Questionnaire physical summary score.

The value of the test to contribute to the correct identification of the outcome under consideration (here, improvement) can be assessed by the area under the ROC curve (range 0–1), whose values can be interpreted as follows: excellent = 0.90–1; good = 0.80–0.89; fair = 0.70–0.79; poor = 0.60–0.69; fail = 0.50–0.59.

Proteinuria (model U11): Considering all patients, irrespective of childhood-onset SLE renal disease. The area under the curve was 0.51 when only children with SLE with biopsy-proven lupus nephritis were considered.

Figure 2.

Receiver operating characteristic (ROC) curves of predicting improvement using several core set variables in logistic regression models. The score (or predicted log odds) of improvement was calculated using the following multivariate logistic regression models: model M1 = 21 + 2.3 × Systemic Lupus Erythematosus Disease Activity Index (SLEDAI) + 8.2 × physician assessment visual analog scale (VAS) + 0.13 × Child Health Questionnaire (CHQ) physical summary score (PHS) + 0.8 × anti–double-stranded DNA (anti-dsDNA) antibodies − 1.4 × well-being VAS; model M2 = 21 + 2.3 × SLEDAI + 6.8 × physician assessment VAS − 0.01 × CHQ PHS − 0.9 × well-being VAS; model M3 = 18 + 3 × SLEDAI − 0.04 × CHQ PHS + 0.6 × anti-dsDNA antibodies − 1.5 × well-being VAS; model M4 = 21 + 2.4 × SLEDAI + 8.4 × physician assessment VAS + 0.08 × CHQ PHS + 0.7 × anti-dsDNA antibodies. AUC = area under the ROC curve.
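For illustration, the weighted scores of models M1 and M1S can be computed as a simple linear combination, using the coefficients listed in Table 4. This is a sketch under stated assumptions: the text does not specify the units, scaling, or sign convention of the inputs (for example, whether a decrease in the SLEDAI is entered as a positive or a negative number), nor the cutoff applied to the score, so the input values below are invented and the result should not be read as a clinical threshold. The dictionary keys and function name are hypothetical.

```python
# Coefficients as listed in Table 4 (keys are hypothetical labels).
MODELS = {
    "M1": {"intercept": 21, "sledai": 2.3, "physician_vas": 8.2,
           "chq_phs": 0.13, "anti_dsdna": 0.8, "wellbeing_vas": -1.4},
    "M1S": {"intercept": 21, "sledai": 2.5, "physician_vas": 8,
            "chq_phs": 0.15, "anti_dsdna": 1, "wellbeing_vas": -1.5},
}

def model_score(model, inputs):
    """Linear score: intercept plus coefficient-weighted core set inputs."""
    coefficients = MODELS[model]
    score = coefficients["intercept"]
    for name, weight in coefficients.items():
        if name != "intercept":
            score += weight * inputs[name]
    return score

# Invented inputs; their scaling and sign convention are assumptions.
inputs = {"sledai": -2, "physician_vas": -1, "chq_phs": 2,
          "anti_dsdna": 0, "wellbeing_vas": 1}
print(model_score("M1", inputs), model_score("M1S", inputs))
```

Unlike the count-based candidate definitions, each variable here contributes in proportion to its coefficient, which is how the regression-based criteria weight the physician assessment VAS more heavily than, for example, the CHQ PHS.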

DISCUSSION

Validated response criteria allow investigators, clinicians, regulators, and patients to determine the efficacy (or lack thereof) of a given intervention and to communicate about response using the same metric. We undertook a prospective cohort study to validate the PRINTO/ACR provisional criteria for the evaluation of response to therapy. We confirm the high level of specificity, but the sensitivity of the criteria was much lower than previously reported. This was true for all 4 previously proposed highest-ranked candidate criteria. Based on our results, one might expect that, when used in the setting of a clinical trial, the PRINTO/ACR provisional criteria for the evaluation of response to therapy may underestimate the rate of responders. Therefore, larger sample sizes might be necessary to establish a significant difference between treatment arms than if more sensitive criteria were available.

The reasons for such a marked difference in sensitivity between our study and a previous study (6) are not completely known. They may include the fact that evaluations were done every 3 months and not every 6 months, and that the disease features of our patients were different from the previously studied multinational cohort. For example, our patients were somewhat older and had less severe disease at baseline, as is supported by lower disease activity scores, lower values on the physician assessment VAS, and higher values on the well-being VAS. However, given the recruitment approach taken and based on our research and the research of others, the patients included in this study can be considered representative of a contemporary childhood-onset SLE cohort in the US.

Another reason for the less favorable performance of the provisional criteria might be differences in the experimental design. Instead of patient profiles used during a consensus conference, this study's raters judged the course of an individual with childhood-onset SLE based on physical assessment and standard of care laboratory tests. Nonetheless, our results appear to be well in line with the observations of Ruperto et al, who found only a moderate agreement between the consensus ratings and the ratings of the managing pediatric rheumatologists (κ = 0.4, 95% confidence interval 0.2–0.6) (6). Given that the response criteria are to be used for the standardized assessment of real patients, we believe that it is critical to employ criteria that mirror the perceived disease course of true patients.

All of the raters who provided information on the course of disease activity (clinically important improvement or not), i.e., information about the external standard used for this validation exercise, are board-certified or board-eligible pediatric rheumatology professionals who see, on average, 20 patients with childhood-onset SLE per week in their academic center and have 10 years of experience in treating childhood-onset SLE. All of the raters underwent detailed and repeated training in scoring disease indices and completing the childhood-onset SLE core set.

The 6 childhood-onset SLE core response variables were developed by well-established consensus formation techniques (5). However, titers of anti-dsDNA antibodies are not considered when defining improvement using the PRINTO/ACR criteria. We revisited the usefulness of changes in anti-dsDNA antibodies and, in univariate analysis, found them to contribute to the identification of patient improvement to a larger degree than proteinuria in a cohort of childhood-onset SLE patients of whom 40% had biopsy-proven lupus nephritis. However, when changes in anti-dsDNA antibodies were added as a sixth core variable to the current algorithm used for the PRINTO/ACR provisional criteria for the evaluation of response to therapy, the sensitivity of the criteria did not improve appreciably (data not shown).

The specific tool for measuring global disease activity was not firmly chosen for the PRINTO/ACR provisional criteria for the evaluation of response to therapy. Therefore, we explored whether there was a preferred disease activity index that should be used. Similar to what has been suggested by Ruperto et al, differences in sensitivity and specificity were small (6). Although response criteria considering the BILAG Index (as a measure of global disease activity) may have a somewhat higher sensitivity than those considering the ECLAM, SLEDAI, or SLAM, this must be weighed against the complexity of the BILAG Index, which could result in considerable measurement error should the BILAG Index be scored by less experienced or less well-trained raters.

We generated several alternative definitions of improvement that consider combinations of various core response variables using weightings derived by multivariate logistic regression modeling. Unlike the current candidate definitions of response to therapy, which treat each of the core set variables as equally important in prediction, the multivariate logistic models considered the different degrees of contribution of each core set variable (via the beta coefficients of the logistic model) to predicting the outcome, i.e., improvement of childhood-onset SLE. Candidate criteria of improvement derived by multivariate logistic models provided better diagnostic accuracy in terms of the AUC, sensitivity, and specificity than any of the provisional candidate definitions proposed in the past. In candidate criteria derived by logistic regression, again, the choice of the disease activity measure was not important. The regression function for each of these alternative definitions yields a score that can be translated to a certain probability that improvement has occurred. The presented algorithms are similar to the one used to calculate absolute disease activity using the Disease Activity Score index (28). Although response criteria using absolute changes (and a regression formula) rather than percentage changes of core response variables are not commonly used in pediatric rheumatology at present, such criteria may actually be easier to use in clinical practice because the cumbersome calculation of percentage changes (as is done for other pediatric rheumatology response criteria) becomes unnecessary. Furthermore, the difficulties that criteria based on percentage changes encounter when assessing very active or very mild disease are circumvented.
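As a rough illustration of how a regression-based definition of this kind could be applied, the following Python sketch turns absolute changes in the core response variables into a score and then into a probability of improvement via the standard logistic function. The intercept, the beta coefficients, the variable names, and the example values are all hypothetical placeholders chosen for illustration; they are not the coefficients estimated in this study. The last function shows why percentage changes can behave awkwardly at very low or very high baseline values.

```python
import math

# HYPOTHETICAL weights for illustration only; these are NOT the
# coefficients estimated in this study.
HYPOTHETICAL_INTERCEPT = -1.0
HYPOTHETICAL_BETAS = {
    "physician_global": -0.8,        # physician assessment of disease activity
    "parent_global": -0.3,           # parent assessment of patient well-being
    "chq": 0.05,                     # Child Health Questionnaire
    "proteinuria": -0.5,             # proteinuria
    "disease_activity_index": -0.2,  # e.g., ECLAM, SLEDAI, or SLAM
}


def improvement_score(abs_changes, betas=HYPOTHETICAL_BETAS,
                      intercept=HYPOTHETICAL_INTERCEPT):
    """Linear predictor: intercept plus the beta-weighted sum of the
    absolute changes (follow-up minus baseline) in each core variable."""
    return intercept + sum(betas[name] * abs_changes[name] for name in betas)


def improvement_probability(score):
    """Standard logistic function mapping a score to a probability."""
    return 1.0 / (1.0 + math.exp(-score))


def percentage_change(baseline, follow_up):
    """Percentage change from baseline; undefined when baseline is zero."""
    if baseline == 0:
        raise ValueError("percentage change is undefined for a zero baseline")
    return 100.0 * (follow_up - baseline) / baseline


# Hypothetical absolute changes between two consecutive visits.
example_changes = {
    "physician_global": -2.0,
    "parent_global": -1.0,
    "chq": 5.0,
    "proteinuria": -0.5,
    "disease_activity_index": -4.0,
}

score = improvement_score(example_changes)
probability = improvement_probability(score)

# The same 1-point absolute decrease gives very different percentage changes.
mild_disease = percentage_change(1, 0)
active_disease = percentage_change(20, 19)
```

In such a scheme, a patient would be classified as improved when the probability exceeds a chosen cutoff, and the cutoff could be tuned to trade sensitivity against specificity. The percentage-change helper also raises an error for a baseline of zero, which is one of the practical difficulties that a definition based on absolute changes avoids.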

This study must be seen in the light of certain limitations. Our data set was relatively small, and all of the patients were followed in the US by approximately 20 pediatric rheumatologists. Theoretically, these physicians might have judged the course of childhood-onset SLE differently than the average pediatric rheumatologist who is taking care of children with SLE. However, all of the participating rheumatologists were rather experienced and see patients with various ethnic and racial backgrounds, as is common in the US.

Additionally, response to therapy in our study was based on the physician's perception of the course of childhood-onset SLE rather than on data from a clinical trial. Clinical trial data from a large number of participants with childhood-onset SLE testing interventions that have an impact on disease activity are currently unavailable. Given their prospective character and the training that the investigators received, we consider our data to be of as high quality as those collected for clinical trials. This is supported by the fact that the rate of missing data for the childhood-onset SLE core variables was less than 2%. Another limitation may be that we used the protein to creatinine ratio to estimate the degree of proteinuria, an approach that is disputed by some (17, 29).

The ACR has outlined a series of validation steps necessary before new criteria are to be widely used for clinical care or research (7). Therefore, additional validation of the newly proposed and previously suggested candidate definitions of improvement appears warranted. Furthermore, the assessment of the measurement properties of any definition of improvement must also include testing in so-called “extreme phenotypes,” meaning patients with either very active or relatively quiescent disease and/or with disease activity restricted to single organ systems or rarely involved organ systems. In view of the recent progress in identifying biomarkers of childhood-onset SLE global disease activity (30, 31), the inclusion of biomarker levels as alternative or additional childhood-onset SLE core set variable(s) in any definition of improvement may also deserve consideration.

AUTHOR CONTRIBUTIONS

All authors were involved in drafting the article or revising it critically for important intellectual content, and all authors approved the final version to be published. Dr. Brunner had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study conception and design. Brunner, Giannini.

Acquisition of data. Brunner, Higgins, Wiers, Lapidus, Olson, Onel, Punaro, Klein-Gitelman.

Analysis and interpretation of data. Brunner, Ying, Giannini.

Acknowledgements

We thank the following investigators for their assistance with data collection: Bob Colbert, T. Brent Graham, Murray Passo, Thomas Griffin, Alexi Grom, Daniel Lovell: Cincinnati Children's Hospital Medical Center, Cincinnati, OH; Robert Rennebohm: Nationwide Children's Hospital, Columbus, OH; Charles Spencer, Linda Wagner-Weiner: University of Chicago Comer Children's Hospital, Chicago, IL; Shirley Henry, PNP: Texas Scottish Rite Hospital, Dallas, TX; James Nocton, Calvin Williams, Elizabeth Roth-Wojicki, PNP: Medical College of Wisconsin and Children's Research Institute, Milwaukee, WI. We also acknowledge the assistance of the following: Shannen Nelson (study coordinating), Jamie Meyers-Eaton, Cynthia Rutherford (site coordinators): Cincinnati Children's Hospital Medical Center, Cincinnati, OH; Amber Khan, MD, Clinical Fellow (data entry), Lukasz Itert (database management): University of Cincinnati College of Medicine, Cincinnati, OH; Cincinnati Children's Hospital Medical Center Biomedical Informatics (Web-based data management application development); Becky Puplava (site coordinator): University of Chicago Comer Children's Hospital, Chicago, IL; Dina Blair (site coordinator): Children's Memorial Hospital, Chicago, IL; Marsha Malloy (data collection, site coordinator), Jeremy Zimmermann, Joshua Kapfhamer, Noshaba Khan (data collection): Medical College of Wisconsin and Children's Research Institute, Milwaukee, WI.
