To prospectively validate the provisional criteria for the evaluation of response to therapy in children with systemic lupus erythematosus (SLE).
In this multicenter study, childhood-onset SLE patients (n = 98; 81 girls, 17 boys, 50% white, 88% non-Hispanic) were followed every 3 months for up to 7 visits (total number of visits 623). The 5 childhood-onset SLE core response variables were obtained at the time of each visit: 1) physician assessment of overall disease activity, 2) parent assessment of patient well-being, 3) Child Health Questionnaire, 4) proteinuria, and 5) global disease activity measure score, as measured by the European Consensus Lupus Activity Measure (ECLAM), the Systemic Lupus Erythematosus Disease Activity Index (SLEDAI), or the Systemic Lupus Activity Measure (SLAM). Physician-rated relevant changes in the disease course (clinically relevant improvement, no change in disease, or worsening) between visits served as the criterion standard. Mixed models were used to assess the diagnostic accuracy of the 4 highest-ranked provisional definitions of response to therapy.
There were 89 episodes of clinically relevant improvement between 2 consecutive visits, and 448 episodes without improvement. Irrespective of the choice of the global disease activity measure (ECLAM, SLAM, SLEDAI), sensitivities of all 4 highest-ranked definitions were low (all ≤31%), whereas their specificities were excellent (all >88%). Using logistic models, alternative definitions can be developed with both 80% sensitivity and specificity.
The provisional criteria of response to therapy in childhood-onset SLE may have considerably lower sensitivity than previously reported. Additional validation in clinical trials is necessary to further evaluate the measurement properties of the provisional Paediatric Rheumatology International Trials Organisation/American College of Rheumatology criteria for response to therapy in children with SLE.
Systemic lupus erythematosus (SLE) is a complex, chronic, multisystem autoimmune inflammatory disease, and up to 20% of SLE patients are diagnosed during childhood, i.e., prior to the age of 16 years (1, 2). Compared with adults with SLE, patients with childhood-onset SLE more often have severe disease phenotypes, including a higher prevalence of kidney involvement (3).
Highly sensitive and specific surrogate markers are needed to serve as primary outcome measures of clinical trials of childhood-onset SLE that study the efficacy of novel medications. The lack of validated surrogate markers is considered a major barrier to the testing of safer and more effective therapies for childhood-onset SLE (4).
Using consensus methodology, the Paediatric Rheumatology International Trials Organisation (PRINTO)/American College of Rheumatology (ACR) provisional criteria for the evaluation of response to therapy for children with childhood-onset SLE were developed. Initial studies suggest that these criteria can measure response to therapy (or clinically relevant improvement) of individual patients with high sensitivity and specificity using an algorithm that considers percentage changes of 5 childhood-onset SLE core set parameters (5). Parameters include the score of an index of global disease activity, physician assessment of overall disease activity, parent assessment of patient overall well-being, proteinuria, and patient health-related quality of life (HRQOL) (6).
The Classification and Response Criteria Subcommittee of the ACR Committee on Quality Measures pointed out that validation of any outcome measures or response definitions is a dynamic process. Confirmatory studies are mandated to substantiate the usefulness of response criteria in other patient cohorts and by using different raters than those involved in the criteria development (7).
Therefore, we undertook a prospective cohort study to corroborate the measurement properties of the PRINTO/ACR provisional criteria for the evaluation of response to therapy. We specifically investigated their sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) for identifying childhood-onset SLE patients who have experienced clinically relevant improvement.
Children (n = 98) fulfilling the ACR classification criteria for SLE (2) prior to the age of 16 years were consecutively recruited during routine clinic visits at 7 academic pediatric rheumatology centers in the US. Study visits occurred every 3 months for up to 18 months; each time, height, weight, findings on physical examination, and medication regimens were recorded, and disease activity and HRQOL were measured.
Details on the development and initial validation of the childhood-onset SLE core set parameters and the provisional definition of improvement are available elsewhere (8–10).
Briefly, changes in 5 childhood-onset SLE core parameters were used to define improvement with childhood-onset SLE: 1) physician assessment of overall disease activity, as measured on a visual analog scale (VAS) ranging from 0 to 10 (physician assessment VAS; where 0 = inactive disease and 10 = very active disease); 2) parent assessment of patient overall well-being, as measured on a VAS ranging from 0 to 10 (well-being VAS; where 0 = very poor and 10 = very well); 3) global disease activity, as measured by a validated disease activity index; 4) HRQOL, as measured by the Child Health Questionnaire (CHQ) physical summary score (PHS); and 5) renal involvement, as measured by daily proteinuria (5).
Consensus methodology and data-driven validation resulted in several proposed candidate criteria of improvement; the 4 highest-ranked criteria (a–d) were tested in more detail in this study: a = improvement in any 2 of the 5 core variables by ≥50%, without worsening of more than 1 by ≥30% and without an increase in proteinuria; b = improvement in any 2 of the 5 core variables by ≥40%, without worsening of more than 1 by ≥30% and without an increase in proteinuria; c = improvement in any 3 of the 5 core variables by ≥30%, without worsening of more than 1 by ≥30% and without an increase in proteinuria; and d = improvement in any 2 of the 5 core variables by ≥50%, without worsening of more than 2 by ≥30% and without an increase in proteinuria.
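The four candidate definitions can be read as one rule with different thresholds. The following is a minimal sketch of that reading, not the published scoring algorithm. Several details are assumptions made for illustration: the variable names, the direction in which each variable counts as "better", the handling of a zero baseline, and treating any rise in proteinuria as an "increase".

```python
import math

# Assumed direction of improvement per core variable: activity measures and
# proteinuria improve when they fall; well-being VAS and CHQ PHS when they rise.
HIGHER_IS_BETTER = {
    "physician_vas": False,
    "wellbeing_vas": True,
    "global_activity": False,
    "chq_phs": True,
    "proteinuria": False,
}

# label: (min. variables improved, improvement %, max. variables worsened, worsening %)
DEFINITIONS = {
    "a": (2, 50.0, 1, 30.0),
    "b": (2, 40.0, 1, 30.0),
    "c": (3, 30.0, 1, 30.0),
    "d": (2, 50.0, 2, 30.0),
}


def pct_improvement(name, previous, current):
    """Percentage change between visits, signed so that positive = improvement."""
    if previous == 0:
        # Assumption: a change away from a zero baseline is treated as unbounded.
        if current == 0:
            return 0.0
        raw = math.inf if current > 0 else -math.inf
    else:
        raw = (current - previous) / abs(previous) * 100.0
    return raw if HIGHER_IS_BETTER[name] else -raw


def meets_definition(label, previous, current):
    """Return True if the between-visit change satisfies definition a, b, c, or d."""
    n_improve, improve_pct, max_worse, worse_pct = DEFINITIONS[label]
    changes = {k: pct_improvement(k, previous[k], current[k]) for k in HIGHER_IS_BETTER}
    improved = sum(1 for c in changes.values() if c >= improve_pct)
    worsened = sum(1 for c in changes.values() if c <= -worse_pct)
    # Assumption: any rise in proteinuria counts as an increase.
    proteinuria_increased = current["proteinuria"] > previous["proteinuria"]
    return improved >= n_improve and worsened <= max_worse and not proteinuria_increased
```

Under these assumptions, every episode that satisfies definition a also satisfies definitions b and d, because b lowers the improvement threshold and d tolerates one more worsened variable.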
It has been suggested that any of 3 indices of global disease activity can be used interchangeably as a core set parameter when measuring improvement: 1) the Systemic Lupus Activity Measure (SLAM; range 0–81, where 0 = inactive disease) (11), 2) the Systemic Lupus Erythematosus Disease Activity Index (SLEDAI; range 0–105, where 0 = inactive disease) (12), or 3) the European Consensus Lupus Activity Measure (ECLAM) (13). Unlike the SLAM and the SLEDAI, the global disease activity score of the ECLAM does not correspond to the sum of its item scores. Rounding procedures and special scoring rules for patients with single-organ involvement are applied, which yields integer ECLAM summary scores between 0 and 10 (where 0 = inactive disease).
For the childhood-onset SLE core set, HRQOL was measured by the CHQ, a generic HRQOL inventory whose parent-completed version has been translated into numerous languages and culturally cross-validated for use in childhood-onset SLE (14–16). Two summary scores can be derived to measure psychosocial health and physical health (PHS). The CHQ PHS is the proposed measure of HRQOL to be considered in the childhood-onset SLE core set.
A timed urine collection has been suggested for the childhood-onset SLE core set to assess renal involvement (5). For this study, we measured the protein to creatinine ratio in a random urine sample instead. The rationale was that, in recent years, the protein to creatinine ratio has been proven to be an accurate approximation of daily protein excretion (17). The protein to creatinine ratio is now commonly used in clinics and even in clinical trials to measure proteinuria (18).
Values of the protein to creatinine ratio of ≥0.2 were considered abnormal and values of <0.2 were considered normal (19); values below 0.15 were rounded up to 0.15. Changes within the range of normal values were not considered as improvement for the purposes of the childhood-onset SLE core set.
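The handling of the protein to creatinine ratio can be illustrated with a short sketch. The function names are hypothetical. The rule that a decrease only counts when the previous value was abnormal is one reading of the text, not a published algorithm.

```python
NORMAL_CUTOFF = 0.2  # ratios at or above this value are considered abnormal
FLOOR = 0.15         # smaller ratios are rounded up to this value


def normalize_ratio(ratio):
    """Round very small protein to creatinine ratios up to the floor value."""
    return max(ratio, FLOOR)


def is_abnormal(ratio):
    return normalize_ratio(ratio) >= NORMAL_CUTOFF


def proteinuria_decrease_counts(previous, current):
    """A decrease counts toward improvement only if the previous value was abnormal
    (assumed reading: changes within the normal range are ignored)."""
    prev, cur = normalize_ratio(previous), normalize_ratio(current)
    return is_abnormal(prev) and cur < prev
```

For example, a fall from 0.18 to 0.05 stays within the normal range and would not count, whereas a fall from 0.5 to 0.1 would.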
The British Isles Lupus Activity Group (BILAG) Index is another validated SLE disease index, but it has not been proposed for use in the childhood-onset SLE core set. For this study, the BILAG Index was completed as a potential alternative measure of disease activity (20, 21). For each of the 8 organ systems considered (general, mucocutaneous, neurologic, musculoskeletal, cardiovascular and respiratory, vasculitic, hematologic, and renal), an alphabetical domain score is obtained that can be converted to a numerical value by the use of 1 of 3 conversion schemes, as suggested by Gladman et al (A = 4, B = 3, C = 2, D = 1, E = 0) (22), Liang et al (A = 10, B = 6.7, C = 3.3, D or E = 0) (11), and Stoll et al (A = 9, B = 3, C = 1, D or E = 0) (23), with higher BILAG Index scores signifying higher disease activity. Global disease activity as measured by the BILAG Index is the sum of the numerical domain scores.
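The three conversion schemes can be expressed as lookup tables, with the global score being the sum over the 8 organ systems. This is a minimal sketch; the dictionary keys and the function name are illustrative and not part of the BILAG Index itself.

```python
# Letter-to-number conversion schemes as cited in the text.
SCHEMES = {
    "gladman": {"A": 4, "B": 3, "C": 2, "D": 1, "E": 0},
    "liang": {"A": 10, "B": 6.7, "C": 3.3, "D": 0, "E": 0},
    "stoll": {"A": 9, "B": 3, "C": 1, "D": 0, "E": 0},
}

# The 8 organ systems considered (key names are illustrative).
DOMAINS = (
    "general", "mucocutaneous", "neurologic", "musculoskeletal",
    "cardiorespiratory", "vasculitic", "hematologic", "renal",
)


def bilag_global_score(grades, scheme):
    """Sum the numerical domain scores for one visit under the chosen scheme."""
    missing = [d for d in DOMAINS if d not in grades]
    if missing:
        raise ValueError(f"missing domain grades: {missing}")
    table = SCHEMES[scheme]
    return sum(table[grades[d]] for d in DOMAINS)
```

A patient graded A, B, C, and D in four systems and E in the remaining four would score 10 under the Gladman et al scheme and 13 under the Stoll et al scheme, which shows why the converted scores are not directly comparable across schemes.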
The original childhood-onset SLE core set comprises 6 core variables for disease activity (5). In addition to the 5 childhood-onset SLE core variables considered in the childhood-onset SLE criteria for response to therapy, there is a parameter to represent immunologic disease activity. This is achieved by measuring levels of anti–double-stranded DNA (anti-dsDNA) antibodies. For this study, anti-dsDNA antibodies were measured by the investigators as part of the standard of care, using various laboratory assays. To be considered as improved in this study, anti-dsDNA antibodies had to decrease by a certain percentage, plus either be newly within the normal range (previous visit abnormal) or remain above the upper bounds for normal (stay abnormal). Further decreases of values of anti-dsDNA antibodies that were already in the range of normal were not considered as improved.
In response to the sentence stem, “Compared to the last study visit 3 months ago and the patient's overall disease, the patient experienced a …,” the managing pediatric rheumatologist rated the change in disease course on a 5-point Likert scale as follows: major flare of disease, minor flare of disease, no change in disease, minor improvement of disease, or major improvement of disease.
In exploratory analysis, we assessed how the provisional criteria of improvement would reflect the family's perspective. Therefore, the parent rated the change of their child's disease on a 5-point Likert scale (much worse, somewhat worse, unchanged, somewhat improved, or much improved) that was presented with the sentence stem, “Compared to the last study visit 3 months ago, and when considering medications, school, work, life at home, doctor visits, pains, and feelings, the overall well-being is …”
Numerical variables were summarized by the mean ± SEM or SD; categorical variables were summarized by the frequency (in percentage).
Numerical core set variables were assessed for their associations with the PRINTO/ACR improvement criteria (a dichotomous variable of improved versus unimproved or unchanged) using mixed-effect models that adjusted for patient demographic and baseline clinical characteristics. A random effect was used to account for within-patient correlation caused by repeated measurements. In order to predict the likelihood of improvement as per the PRINTO/ACR criteria (a dichotomized dependent variable) using core set variables, both univariate and multivariate logistic regression models were applied. A generalized estimating equation method was used in the logistic regression models to account for within-patient correlation. In the univariate logistic regression models, each of the core set variables was considered the only predictor at the time, whereas in the multivariate logistic regression models, all of the core set variables were included as the predictors of interest. In order to assess whether the predicted PRINTO/ACR improvement was sensitive to the choice of different global disease activity indices, a series of multivariate models was generated, using one of the global disease activity indices at a time. Contributions of other core set variables were also assessed in the competing multivariate models by deleting one of these core set variables at a time. The predicted log odds (or scores) of improvement from logistic regression models were further used to assess their diagnostic accuracy using the receiver operating characteristic curve and the area under the curve (AUC), sensitivity, and specificity, respectively. Sensitivity, specificity, PPV, and NPV were also used to evaluate the diagnostic accuracy of the 4 highest-ranked PRINTO/ACR definitions of childhood-onset SLE response to therapy. Confidence intervals of the AUC were estimated using a bootstrap method, with a total of 2,500 replicates for each model (24, 25).
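The accuracy measures named above can be sketched in a few lines. This is an illustration only, since the study's analyses were run in SAS. The AUC here is the rank-based (Mann-Whitney) estimate and the confidence interval uses a percentile bootstrap, both of which are assumptions. The sketch resamples episodes independently and therefore ignores the within-patient correlation that the study's models accounted for.

```python
import random


def diagnostic_accuracy(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, and NPV from the counts of a 2 x 2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def auc(scores, labels):
    """Probability that an improved episode scores higher than an unimproved one."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_boot=2500, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC (episode-level resampling)."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if all(ys) or not any(ys):
            continue  # a resample needs both outcome classes
        stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

A bootstrap that respects the repeated-measures design would resample patients rather than episodes; that variant is omitted here for brevity.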
Statistical computations were performed using SAS software, version 9.2 (SAS, Cary, NC). P values less than 0.05 were considered statistically significant. Diagnostic accuracy was considered outstanding, excellent, good, fair, or poor if the AUC was in the range of 0.91–1.0, 0.81–0.90, 0.71–0.80, 0.61–0.70, or 0.50–0.60, respectively (26).
This study was approved by the institutional review boards of the participating pediatric rheumatology centers. Informed consent was obtained from all of the parents and, as appropriate, assent was given by the participants prior to the study procedures.
The demographics and disease features of the patients with childhood-onset SLE are shown in Table 1. A total of 98 children (81 girls, 17 boys) were included in the analysis. The population consisted of 49 white, 32 African American, 3 Asian, and 3 mixed-race patients (87 non-Hispanics, 11 Hispanics). Data from 623 visits (or 526 between-visit intervals) were available for analysis. There were 39 patients with biopsy-proven lupus nephritis. The mean ± SD damage as measured by the Systemic Lupus International Collaborating Clinics/ACR Damage Index (27) was 0.42 ± 0.1. The global disease course with childhood-onset SLE on consecutive visits during the study is depicted in Figure 1. There were 35 renal flares (major or minor) and 42 episodes of renal improvement (major or minor).
|Parameter||N||Mean ± SD|
|Age, years||98||15.3 ± 2.85|
|Disease duration, years||98||1.5 ± 2.0|
|Current medications|| ||15.1 ± 1.8|
|Azathioprine, mycophenolate mofetil, methotrexate||47|| |
|Nonsteroidal antiinflammatory drugs||24|| |
|At least 1 antihypertensive medication||38|| |
|Biopsy-proven lupus nephritis†||39|| |
|Proteinuria‡||98||0.44 ± 0.96|
|Disease damage, SDI score (0 = no damage)||98||0.42 ± 0.1|
|Physician assessment of overall disease activity (VAS)||98||2.5 ± 1.95|
|Child Health Questionnaire physical function score||98||42.4 ± 12.14|
|SLAM|| ||7.64 ± 6.01|
|SLEDAI|| ||5.18 ± 4.35|
|Stoll et al BILAG|| ||5.31 ± 5.44|
|Liang et al BILAG|| ||11.8 ± 8.81|
|Gladman et al BILAG|| ||8.7 ± 3.57|
The mean changes of the core set parameters by disease course (unimproved or flare and unchanged combined versus improved) as rated by the managing pediatric rheumatologist are shown in Table 2. Compared with patients who were not improved, the core variables of patients rated as improved significantly changed, with the exception of proteinuria and the CHQ PHS.
|Core set variable||Unimproved (n = 448 episodes)||Improved (n = 89 episodes)||P†|
|Physician assessment VAS||0.11 ± 0.06||−1.01 ± 0.14‡||< 0.001|
|Well-being VAS||−0.01 ± 0.09||0.54 ± 0.20§||< 0.05|
|Proteinuria||0.03 ± 0.08||−0.08 ± 0.17||NS|
|ECLAM||0.22 ± 0.07§||−1.12 ± 0.16‡||< 0.001|
|SLEDAI||0.25 ± 0.15||−2.27 ± 0.34‡||< 0.001|
|SLAM||0.36 ± 0.34||−1.78 ± 0.79¶||< 0.05|
|Liang et al BILAG||0.16 ± 0.32||−4.61 ± 0.72‡||< 0.001|
|Gladman et al BILAG||0.18 ± 0.11||−1.22 ± 0.24‡||< 0.001|
|Stoll et al BILAG||0.10 ± 0.18||−2.79 ± 0.40‡||< 0.001|
|CHQ PHS||−0.09 ± 0.51||1.94 ± 1.13||NS|
When only children with biopsy-proven lupus nephritis were considered, then the change in the mean ± SEM of proteinuria was −0.1 ± 0.4 and −0.005 ± 0.17 for improved and unimproved courses, respectively (P = 0.83).
Irrespective of the index used in the core set to measure global disease activity and for all candidate criteria assessed, the sensitivity did not exceed 31%, and the PPV did not exceed 48% (Table 3). However, specificity and the NPV remained high when using this validation data set, at ≥89% and ≥84%, respectively. Of note, some candidate definitions that considered the Stoll et al or Gladman et al BILAG Index were more sensitive and at least equally specific.
|PRINTO/ACR candidate improvement definition||Sensitivity, %||Specificity, %||PPV, %||NPV, %|
|a||26 (17–35)||92 (89–95)||38 (26–50)||86 (83–89)|
|b||26 (17–35)||89 (86–92)||32 (21–43)||85 (82–88)|
|c||13 (6–20)||97 (95–99)||46 (26–66)||85 (82–88)|
|d||28 (19–37)||90 (87–93)||37 (26–48)||86 (83–89)|
|a||24 (15–33)||92 (89–95)||39 (26–52)||86 (83–89)|
|b||25 (16–34)||91 (88–94)||35 (23–47)||86 (83–89)|
|c||13 (6–20)||97 (95–99)||48 (28–68)||85 (82–88)|
|d||27 (18–36)||91 (88–94)||39 (27–51)||86 (83–89)|
|a||18 (10–26)||92 (89–95)||31 (18–44)||85 (82–88)|
|b||20 (12–28)||89 (86–92)||28 (17–39)||85 (82–88)|
|c||8 (2–14)||97 (95–99)||32 (12–52)||84 (81–87)|
|d||22 (13–31)||90 (87–93)||32 (21–43)||85 (82–88)|
|Gladman et al BILAG|
|a||25 (16–34)||92 (89–95)||38 (25–51)||86 (83–89)|
|b||28 (19–37)||90 (87–93)||37 (26–48)||86 (83–89)|
|c||8 (2–14)||97 (95–99)||35 (14–56)||84 (81–87)|
|d||29 (20–38)||90 (87–93)||37 (26–48)||86 (83–89)|
|Liang et al BILAG|
|a||11 (4–18)||95 (93–97)||31 (15–47)||84 (81–87)|
|b||17 (9–25)||93 (91–95)||33 (19–47)||85 (82–88)|
|c||6 (1–11)||99 (98–100)||45 (14–76)||84 (81–87)|
|d||12 (5–19)||93 (91–95)||28 (14–42)||84 (81–87)|
|Stoll et al BILAG|
|a||27 (18–36)||92 (89–95)||39 (27–51)||86 (83–89)|
|b||28 (19–37)||90 (87–93)||36 (25–47)||86 (83–89)|
|c||9 (3–15)||97 (95–99)||36 (16–56)||84 (81–87)|
|d||31 (21–41)||90 (87–93)||38 (27–49)||87 (84–90)|
When considering the family's perspective (patient change in health) as a criterion of whether improvement had occurred or not, the sensitivity of the 4 highest-ranked definitions did not exceed 23% and the PPV was <61%, whereas the specificity and the NPV remained high, at >88% and 61%, respectively. The choice of the measure of global disease activity index (SLAM, SLEDAI, ECLAM, or BILAG Index) did not have a major impact on the abovementioned measurement properties.
When only patients with biopsy-proven lupus nephritis were considered, depending on the index used in the core set to capture global disease activity, the sensitivity of all of the 4 highest-ranked candidate improvement definitions did not exceed 43% and the PPV did not exceed 56%, whereas specificity and the NPV remained high when using this validation data set, at ≥87% and ≥85%, respectively.
Using univariate logistic regression, individual core set variables were found to contribute to a different degree to the measurement of the construct improvement of childhood-onset SLE (Table 4). In multivariate logistic regression models that considered percentage changes of 5 or all 6 childhood-onset SLE core set variables as possible predictors (outcome of physician-rated improvement of childhood-onset SLE), we generated additional candidate criteria of childhood-onset SLE improvement with higher sensitivities and still acceptable specificities. The alternative candidate criteria that appeared to best lend themselves for potential use in clinical care and research, based on the face validity of the underlying algorithms, are shown in Table 4. When only patients with renal involvement were assessed (n = 39), then proteinuria contributed to a similar degree to the identification of patients who had improved (AUC = 0.51). When the family perspective of patient improvement was considered, then the AUC for the CHQ PHS was slightly higher, at 0.61. A simplified version (model M1S) using rounded regression coefficients from the multivariate model (model M1) showed an AUC of 0.82, supporting excellent accuracy for identifying patients who have improved. The AUCs for these alternative definitions are shown in Figure 2. From the score derived from each of the regression functions, the likelihood that improvement has occurred can be deduced. Using these somewhat more complex algorithms, candidate improvement definitions with sensitivities as high as 80% and equally high specificities can be derived.
|Core set predictors||Area under the ROC curve (95% CI)†|
|Univariate logistic regression models|
|Model U1: 19 + 7.5 × physician assessment VAS||0.76 (0.70–0.83)|
|Model U2: 18.3 + 2.6 × SLEDAI||0.72 (0.66–0.79)|
|Model U3: 18 + 5.4 × ECLAM||0.69 (0.61–0.77)|
|Model U4: 18.2 + 2 × Stoll et al BILAG||0.68 (0.60–0.76)|
|Model U5: 18 + 1 × Liang et al BILAG||0.67 (0.59–0.75)|
|Model U6: 17.4 + 2.9 × Gladman et al BILAG||0.66 (0.58–0.74)|
|Model U7: 17 + 0.4 × SLAM||0.61 (0.53–0.70)|
|Model U8: 16 + 1.5 × well-being VAS||0.58 (0.51–0.66)|
|Model U9: 15 + 0.7 × anti-dsDNA antibodies||0.53 (0.45–0.62)|
|Model U10: 16 − 0.2 × CHQ PHS||0.53 (0.45–0.61)|
|Model U11: 15 + 0.5 × proteinuria||0.49 (0.42–0.57)‡|
|Multivariate models: combinations of core set variables|
|Model M1: 21 + 2.3 × SLEDAI + 8.2 × physician assessment VAS + 0.13 × CHQ PHS + 0.8 × anti-dsDNA antibodies − 1.4 × well-being VAS||0.84 (0.78–0.90)|
|Model M2: 21 + 2.3 × SLEDAI + 6.8 × physician assessment VAS − 0.01 × CHQ PHS − 0.9 × well-being VAS||0.79 (0.73–0.86)|
|Model M3: 18 + 3 × SLEDAI − 0.04 × CHQ PHS + 0.6 × anti-dsDNA antibodies − 1.5 × well-being VAS||0.76 (0.69–0.83)|
|Model M4: 21 + 2.4 × SLEDAI + 8.4 × physician assessment VAS + 0.08 × CHQ PHS + 0.7 × anti-dsDNA antibodies||0.83 (0.77–0.89)|
|Model M1S: 21 + 2.5 × SLEDAI + 8 × physician assessment VAS + 0.15 × CHQ PHS + 1 × anti-dsDNA antibodies − 1.5 × well-being VAS||0.82 (0.77–0.88)|
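To make the regression-based definitions concrete, the simplified model M1S can be evaluated as a plain linear score. The sketch below only reproduces the arithmetic of the formula in Table 4. The argument names are hypothetical, the inputs are assumed to be the between-visit changes in the form the model was fitted on, and no cutoff or score-to-probability mapping is implemented because none is given here.

```python
# Coefficients of the simplified model M1S as listed in Table 4.
M1S_INTERCEPT = 21.0
M1S_COEFFICIENTS = {
    "sledai": 2.5,
    "physician_vas": 8.0,
    "chq_phs": 0.15,
    "anti_dsdna": 1.0,
    "wellbeing_vas": -1.5,
}


def m1s_score(changes):
    """Linear score of model M1S for one between-visit interval.

    `changes` maps each predictor name to its change value; the units and sign
    conventions are assumed to match those used when the model was fitted.
    """
    missing = [k for k in M1S_COEFFICIENTS if k not in changes]
    if missing:
        raise ValueError(f"missing predictors: {missing}")
    return M1S_INTERCEPT + sum(c * changes[k] for k, c in M1S_COEFFICIENTS.items())
```

With all changes equal to zero the score is the intercept, 21; turning a score into a statement about improvement would require a threshold or probability mapping taken from the fitted model.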
Validated response criteria allow investigators, clinicians, regulators, and patients to determine the efficacy (or lack thereof) of a given intervention and to communicate about response using the same metric. We undertook a prospective cohort study to validate the PRINTO/ACR provisional criteria for the evaluation of response to therapy. We confirm the high level of specificity, but the sensitivity of the criteria was much lower than previously reported. This was true for all 4 previously proposed highest-ranked candidate criteria. Based on our results, one might expect that, when used in the setting of a clinical trial, the PRINTO/ACR provisional criteria for the evaluation of response to therapy may underestimate the rate of responders. Therefore, larger sample sizes might be necessary to establish a significant difference between treatment arms than if more sensitive criteria were available.
Reasons why there was such a remarkable difference in sensitivity between our study and a previous study (6) are not completely known. They may include the fact that evaluations were done every 3 months and not every 6 months, and that the disease features of our patients were different from the previously studied multinational cohort. For example, our patients were somewhat older and had less severe disease at baseline, as is supported by lower disease activity scores, lower values on the physician assessment VAS, and higher values on the well-being VAS. However, given the recruitment approach taken and based on our research and the research of others, the patients included in this study can be considered representative for a contemporary childhood-onset SLE cohort in the US.
Another reason for the less favorable performance of the provisional criteria might be differences in the experimental design. Instead of patient profiles used during a consensus conference, this study's raters judged the course of an individual with childhood-onset SLE based on physical assessment and standard of care laboratory tests. Indeed, our results appear to be well in line with the observations of Ruperto et al, who found only a moderate agreement between the consensus ratings and the ratings of the managing pediatric rheumatologists (κ = 0.4, 95% confidence interval 0.2–0.6) (6). Given that the response criteria are to be used for the standardized assessment of real patients, we believe that it is critical to employ criteria that mirror the perceived disease course of true patients.
All of the raters who provided information on the course of disease activity (clinically important improvement or not), i.e., information about the external standard used for this validation exercise, are board-certified or board-eligible pediatric rheumatology professionals who see, on average, 20 patients with childhood-onset SLE per week in their academic center and have 10 years of experience in treating childhood-onset SLE. All of the raters underwent detailed and repeated training in scoring disease indices and completing the childhood-onset SLE core set.
The 6 childhood-onset SLE core response variables were developed by well-established consensus formation techniques (5). However, titers of anti-dsDNA antibodies are not considered when defining improvement using the PRINTO/ACR criteria. We revisited the usefulness of changes in anti-dsDNA antibodies and, in univariate analysis, found them to contribute to the identification of patient improvement to a larger degree than proteinuria in a cohort of childhood-onset SLE patients of whom 40% had biopsy-proven lupus nephritis. However, when changes in anti-dsDNA antibodies were added as a sixth core variable to the current algorithm used for the PRINTO/ACR provisional criteria for the evaluation of response to therapy, the sensitivity of the criteria did not improve appreciably (data not shown).
The specific disease activity tool to measure global disease activity was not firmly chosen for the PRINTO/ACR provisional criteria for the evaluation of response to therapy. Therefore, we explored whether there was a preferred disease activity index that should be used. Similar to what has been suggested by Ruperto et al, differences in sensitivity and specificity were small (6). Although response criteria considering the BILAG Index (as a measure of global disease activity) may have a somewhat higher sensitivity than those considering the ECLAM, SLEDAI, or SLAM, this must be weighed against the complexity of the BILAG Index, which could result in considerable measurement error should the BILAG Index be scored by less experienced or less well-trained raters.
We generated several alternative definitions of improvement that consider combinations of various core response variables using weightings derived by multivariate logistic regression modeling. Different from the current candidate definitions of response to therapy that treat each of the core set variables as equally important in prediction, the multivariate logistic models considered the different degree of contributions of each core set variable (via beta coefficients of the logistic model) to predict the outcome, i.e., improvement with childhood-onset SLE. Candidate criteria of improvement derived by multivariate logistic models provided better diagnostic accuracy in terms of the AUC, sensitivity, and specificity than any of the provisional candidate definitions proposed in the past. In candidate criteria derived by logistic regression, again, the choice of the disease activity measure was not important. The regression function for each of these alternative definitions yields a score that can be translated to a certain probability that improvement has occurred. The presented algorithms are similar to the one used to calculate absolute disease activity using the Disease Activity Score index (28). Although response criteria using absolute changes (and a regression formula) rather than percentage changes of core response variables are not commonly used in pediatric rheumatology at present, such criteria may actually be easier to use in clinical practice because the cumbersome calculation of percentage changes (as is done for other pediatric rheumatology response criteria) becomes unnecessary. Furthermore, difficulties of criteria based on percentage changes when assessing very active or very mild disease are circumvented.
This study must be seen in the light of certain limitations. Our data set was relatively small, and all of the patients were followed in the US by approximately 20 pediatric rheumatologists. Theoretically, these physicians might have judged the course of childhood-onset SLE differently than the average pediatric rheumatologist who is taking care of children with SLE. However, all of the participating rheumatologists were rather experienced and see patients with various ethnic and racial backgrounds, as is common in the US.
Additionally, response to therapy in our study was based on the physician's perception of the course of childhood-onset SLE rather than on data from a clinical trial. Clinical trial data from a large number of participants with childhood-onset SLE testing interventions that have an impact on disease activity are currently unavailable. Given their prospective character and the training that the investigators received, we consider our data to be of as high quality as that collected for clinical trials. This is supported by the fact that the rate of missing data for the childhood-onset SLE core variables was less than 2%. Another limitation may be that we used the protein to creatinine ratio to estimate the degree of proteinuria, an approach that is disputed by some (17, 29).
The ACR has outlined a series of validation steps necessary before new criteria are to be widely used for clinical care or research (7). Therefore, additional validation of the newly proposed and previously suggested candidate definitions of improvement appears warranted. Furthermore, the assessment of the measurement properties of any definition of improvement must also include testing in so-called “extreme phenotypes,” i.e., patients with either very active or relatively quiescent disease and/or with disease activity restricted to single organ systems or rarely involved organ systems. In view of the recent progress in identifying biomarkers of childhood-onset SLE global disease activity (30, 31), the inclusion of biomarker levels as alternative or additional childhood-onset SLE core set variable(s) in any definition of improvement may also deserve consideration.
All authors were involved in drafting the article or revising it critically for important intellectual content, and all authors approved the final version to be published. Dr. Brunner had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study conception and design. Brunner, Giannini.
Acquisition of data. Brunner, Higgins, Wiers, Lapidus, Olson, Onel, Punaro, Klein-Gitelman.
Analysis and interpretation of data. Brunner, Ying, Giannini.
We thank the following investigators for their assistance with data collection: Bob Colbert, T. Brent Graham, Murray Passo, Thomas Griffin, Alexi Grom, Daniel Lovell: Cincinnati Children's Hospital Medical Center, Cincinnati, OH; Robert Rennebohm: Nationwide Children's Hospital, Columbus, OH; Charles Spencer, Linda Wagner-Weiner: University of Chicago Comer Children's Hospital, Chicago, IL; Shirley Henry, PNP: Texas Scottish Rite Hospital, Dallas, TX; James Nocton, Calvin Williams, Elizabeth Roth-Wojicki, PNP: Medical College of Wisconsin and Children's Research Institute, Milwaukee, WI. We also acknowledge the assistance of the following: Shannen Nelson (study coordinating), Jamie Meyers-Eaton, Cynthia Rutherford (site coordinators): Cincinnati Children's Hospital Medical Center, Cincinnati, OH; Amber Khan, MD, Clinical Fellow (data entry), Lukasz Itert (database management): University of Cincinnati College of Medicine, Cincinnati, OH; Cincinnati Children's Hospital Medical Center Biomedical Informatics (Web-based data management application development); Becky Puplava (site coordinator): University of Chicago Comer Children's Hospital, Chicago, IL; Dina Blair (site coordinator): Children's Memorial Hospital, Chicago, IL; Marsha Malloy (data collection, site coordinator), Jeremy Zimmermann, Joshua Kapfhamer, Noshaba Khan (data collection): Medical College of Wisconsin and Children's Research Institute, Milwaukee, WI.