Clinically important improvement of chronic pain patients in randomized controlled trials and the DATAPAIN cohort

Abstract
Background: Change on the Numeric Rating Scale (NRS) reflects a subjective pain experience, which hampers the establishment of clinically important improvement. An anchor-based method, the Patients' Global Impression of Change (PGIC), is often added to determine whether a patient has improved. In randomized controlled trials (RCTs) of medication effects, a two-point change on the NRS has been shown to be equivalent to a moderate clinically important improvement. We examined whether these findings could be reproduced in cohort data and in RCTs of non-drug interventions.
Methods: The NRS change was quantified by subtracting the baseline NRS from the NRS at 6-month follow-up. The PGIC was dichotomized into treatment success and nonsuccess, and the average raw NRS change was calculated for each group. The Spearman correlation coefficient quantified the overall relationship, and discriminative ability was explored with the receiver operating characteristic curve. Data were stratified by study design, sex, and baseline pain intensity. In addition, treatment status at follow-up was evaluated in the cohort.
Results: The records of 1661 patients were examined. Overall, the NRS change needed for moderate clinically important improvement was larger than the average of two points. However, the changes in the cohort were smaller than those in the RCTs. Moreover, the change varied with baseline pain intensity, and treatment status was associated with differences in the mean NRS change for clinically important improvement: −4.15 (SD 2.70) when treatment was completed by 6 months and −2.16 (SD 2.48) when treatment was ongoing.
Conclusion: The moderate clinically important improvement varied substantially, reflecting heterogeneity in pain relief and its relation to treatment success in chronic pain patients.


INTRODUCTION
Many chronic pain patients do not obtain adequate or equivalent pain relief from existing interventions. Owing to the highly subjective nature of pain, the meaningfulness of changes in pain is often difficult to interpret. 1 The literature indicates a higher prevalence and a higher average pain intensity in women than in men. 2,3 Women also seem to report greater functional limitations at the same pain intensity. 4 However, it remains unclear whether the meaningfulness of changes in pain differs between the sexes. The clinically important improvement in measurement scores is a critical consideration when evaluating treatment effects based on patient-reported outcome measures (PROMs). 5 The IMMPACT recommendations list the 11-point Numeric Rating Scale (NRS) to quantify pain intensity and to summarize the patient's subjective interpretation of the pain experienced. An essential step in clinical research is to determine the statistical significance and confidence intervals of the change in measurement scores within or between groups, as these reflect the magnitude and variability of the treatment effect as well as the sample size. 5,6 To determine the clinically important improvement, an anchor-based method can be applied that relies on a global item completed by the patient, such as the Patients' Global Impression of Change (PGIC). The PGIC does not primarily measure pain relief; rather, it captures the patient's overall impression of improvement with treatment. By anchoring the NRS to the PGIC, the relationship between pain relief and overall improvement can be examined from the patient's point of view. 5-8 Several studies have quantified the clinically important improvement for several core domains in the chronic pain population. 1,5,6,8-10 The landmark paper of Farrar et al. assessed the clinical importance of pain relief, based only on data from randomized controlled trials (RCTs) of pregabalin treatment. 6
Since then, an average change from baseline of two points or a 30% change on the NRS has often been regarded as a moderate clinically important improvement, corresponding to the PGIC categories "very much improved" and "much improved." 6 However, the chronic pain population is heterogeneous in pain relief and burden, and it is unclear whether these findings can be generalized to patients included in interventional studies of non-drug treatments (eg, spinal cord stimulation and intradiscal injection), to cohort data, or to subgroups of the chronic pain population. The aim of this study was to assess the generalizability of this definition of clinically important improvement on the NRS to RCTs of non-drug interventions and to heterogeneous cohort data. We expected to find an average two-point difference on the NRS as a moderate clinically important improvement in both interventional studies of non-drug treatments and cohort data. However, we expected differences between subgroups of the chronic pain population, such as a larger average difference than two points for women than for men. These results may contribute to the design of future studies, inform sample size calculations, and help set specific criteria for cohort studies. Such information will facilitate the comparison of results across studies and help establish the threshold for therapeutic meaningfulness in clinical practice.

METHODS
For more than 20 years, the Department of Anaesthesiology and Pain Medicine of the Maastricht University Medical Center+ in the Netherlands has routinely collected both the NRS and the PGIC of chronic pain patients, in interventional studies as well as in a cohort. For this study, we used data from the RCTs of van Eerd et al., 11 Kallewaard et al., 12 Slangen et al., 13 and Kemler et al. 14 and from the DATAPAIN cohort, initiated in 2003 by the Comprehensive Multidisciplinary University Pain Center Maastricht. 2 Approval for this secondary analysis was obtained from the medical ethics committee of Maastricht University (METC approval number: 2020-2391). All included studies followed the IMMPACT recommendations on core outcomes for an adequate evaluation of treatment efficacy and effectiveness. 5,15 Patients were included in this secondary analysis if they were 18 years or older, had been experiencing pain for more than 3 months, and had completed both the NRS and the PGIC at 6-month follow-up.

Measurements
The 11-point NRS was used to quantify pain intensity, ranging from 0 (no pain) to 10 (the most pain imaginable). 16 In the RCTs, the average momentary NRS was computed from a 4-day diary at baseline and at 6-month follow-up. In the cohort, the average NRS over the past week was collected at both measurement moments using a single item.

Key points
• Patients reported clinically important improvement at greater pain relief than the average decrease of 2 points on the NRS.
• The amount of pain relief needed differed substantially between cohort and RCT patients.
• Pain severity at baseline modified the amount of pain relief needed for a clinically important improvement.
The PGIC was used to record the patient's global impression of change on a 7-point Likert scale ranging from "very much improved" to "very much worse." In addition, a dichotomous PGIC score was computed in which "very much improved" and "much improved" indicated a successful treatment outcome, and "minimally improved," "no change," "minimally worse," "much worse," and "very much worse" indicated a nonsuccessful outcome. The PGIC was thus used as an anchor-based criterion to distinguish between successful and nonsuccessful treatment at follow-up. 5

Stratification
Because the literature recommends analyzing the sexes separately, as values for clinically important improvement in treatment outcome may differ between them, 5 we stratified by sex in addition to study design (ie, cohort and RCT). Moreover, baseline NRS scores were divided into 3 pain categories: mild, with a pain intensity of 0-5 on the NRS; modest, with 5-7 on the NRS; and severe, with 7-10 on the NRS. 17 In the DATAPAIN cohort, patients could still be in treatment beyond 6 months (the follow-up time used to compute change from baseline) because they received more than one treatment or a specific treatment was prolonged. Therefore, the cohort was also stratified by treatment status at follow-up: completed or ongoing.
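The PGIC dichotomization and baseline pain categorization described above can be sketched in Python. This is a minimal illustration, not the authors' R code, and the function names are hypothetical. Because the stated NRS ranges share endpoints (0-5, 5-7, 7-10), the sketch assumes half-open boundaries, so a score of 5 is classed as modest and 7 as severe; the original assignment of boundary scores may differ.

```python
# PGIC response categories, ordered from best to worst (7-point Likert scale).
PGIC_CATEGORIES = (
    "very much improved",
    "much improved",
    "minimally improved",
    "no change",
    "minimally worse",
    "much worse",
    "very much worse",
)

# The two categories counted as a successful treatment outcome.
SUCCESS_CATEGORIES = frozenset({"very much improved", "much improved"})


def pgic_success(category: str) -> bool:
    """Dichotomize a PGIC response: True = success, False = nonsuccess."""
    if category not in PGIC_CATEGORIES:
        raise ValueError(f"unknown PGIC category: {category!r}")
    return category in SUCCESS_CATEGORIES


def baseline_pain_category(nrs: float) -> str:
    """Classify a baseline NRS score (0-10) as mild, modest, or severe.

    ASSUMPTION: the published ranges (0-5, 5-7, 7-10) overlap at 5 and 7;
    this sketch treats them as half-open, so 5 -> modest and 7 -> severe.
    """
    if not 0 <= nrs <= 10:
        raise ValueError(f"NRS must lie between 0 and 10, got {nrs}")
    if nrs < 5:
        return "mild"
    if nrs < 7:
        return "modest"
    return "severe"


# Usage: dichotomize two responses and classify three baseline scores.
print(pgic_success("much improved"), pgic_success("minimally improved"))  # True False
print([baseline_pain_category(x) for x in (4.5, 5, 8)])  # ['mild', 'modest', 'severe']
```

Keeping the category list explicit makes an unexpected or misspelled PGIC response fail loudly instead of being silently counted as nonsuccess.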

Statistical analysis
Study-level characteristics (age, sex, and NRS scores) were described as means and standard deviations (SD) or percentages. The NRS change was quantified by subtracting the baseline NRS from the follow-up NRS, so that negative values indicate pain relief, and was described as a mean difference with 95% confidence interval (CI). Within-group changes were tested with the paired-samples t-test.
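As a minimal sketch of these change-score calculations (not the authors' R code; all names are hypothetical), the following Python computes the raw change (follow-up minus baseline), the relative change that underlies the 30% criterion, and the mean paired change with an approximate 95% CI and the paired t statistic. It assumes a normal critical value for the CI, which is reasonable only for large samples, and it omits the p-value, which would require the t distribution.

```python
import math
from statistics import NormalDist, mean, stdev


def nrs_change(baseline: float, follow_up: float) -> float:
    """Raw NRS change: follow-up minus baseline (negative = pain relief)."""
    return follow_up - baseline


def relative_change(baseline: float, follow_up: float) -> float:
    """Percent change from baseline (e.g. -30.0 for a 30% reduction)."""
    if baseline == 0:
        raise ValueError("relative change is undefined for a baseline NRS of 0")
    return 100.0 * nrs_change(baseline, follow_up) / baseline


def paired_summary(baseline, follow_up, level=0.95):
    """Mean paired change, approximate CI, and paired t statistic.

    ASSUMPTION: the CI uses a normal critical value (about 1.96 at 95%),
    which approximates the t critical value only in large samples.
    """
    if len(baseline) != len(follow_up) or len(baseline) < 2:
        raise ValueError("need two equal-length samples of at least 2 patients")
    diffs = [nrs_change(b, f) for b, f in zip(baseline, follow_up)]
    d_mean = mean(diffs)
    se = stdev(diffs) / math.sqrt(len(diffs))  # standard error of the mean change
    if se == 0:
        raise ValueError("all changes are identical; t statistic undefined")
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return d_mean, (d_mean - z * se, d_mean + z * se), d_mean / se


# Usage with four hypothetical patients (not study data).
d_mean, ci, t = paired_summary([8, 7, 9, 6], [5, 6, 4, 6])
print(d_mean, ci, t)
```

For small samples, replacing the normal quantile with a t quantile on n − 1 degrees of freedom (for example via a statistics library) would give the usual paired-t interval.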
Average raw and relative changes in the NRS were calculated for each of the 7 PGIC categories, and the Spearman correlation coefficient was calculated to quantify this relationship. Furthermore, based on the dichotomized PGIC, patients were classified into treatment success or nonsuccess, and the respective NRS changes were calculated. To assess the discriminative ability of the NRS change for treatment success, the area under the receiver operating characteristic curve (AUC) was computed with its 95% CI. An AUC of 0.5 indicates no discriminative ability, and an AUC of 1.0 indicates perfect discriminative ability.
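Both the Spearman correlation and the AUC can be computed from ranks. The plain-Python sketch below rests on our own assumptions: the PGIC is coded from 1 ("very much improved") to 7 ("very much worse"), so a positive rho means that larger pain decreases go with better PGIC ratings, and the AUC uses the Mann-Whitney formulation, in which a more negative NRS change is treated as predicting success and ties count as one half. The AUC confidence interval reported in the study is not reproduced here.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / math.sqrt(vx * vy)


def auc(changes, success):
    """AUC of NRS change for PGIC success (Mann-Whitney formulation).

    Probability that a random success patient has a more negative change
    than a random nonsuccess patient, counting ties as 1/2.
    """
    pos = [c for c, s in zip(changes, success) if s]
    neg = [c for c, s in zip(changes, success) if not s]
    if not pos or not neg:
        raise ValueError("need at least one success and one nonsuccess patient")
    wins = sum(1.0 if p < q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Usage with hypothetical data (not study data).
pgic = [1, 2, 3, 4, 2, 5]          # 1 = very much improved ... 7 = very much worse
changes = [-5, -4, -1, 0, -2, 1]   # follow-up minus baseline NRS
success = [p <= 2 for p in pgic]   # PGIC success = codes 1 and 2
print(round(spearman(pgic, changes), 2), auc(changes, success))  # 0.99 1.0
```

Here every success patient has a larger decrease than every nonsuccess patient, so the AUC is 1.0; in real data the overlap between the two groups pulls it toward 0.5.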
Subsequently, the data were stratified and the analyses were repeated for study design (RCT and cohort), sex, baseline NRS category, and, in the cohort data, treatment status. All statistical analyses were performed in R, version 3.6.1.

RESULTS
Table 1 summarizes the study characteristics and baseline variables of the cohort and the RCTs; for the RCTs, the variables are presented for each trial separately and for all RCTs combined. The results of the Spearman correlations and ROC curve analyses are summarized in Table 2. In total, the records of 1661 chronic pain patients were examined. An average raw NRS change of −3.58 (SD: 1.89) was associated with clinically important improvement as defined on the PGIC. This average was substantially larger than expected from the change of two points or more reported in the literature. Moreover, the NRS change required in the cohort data differed from that in the RCTs.
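The stratum-specific averages reported in this paper, that is, the mean (SD) raw NRS change among patients rated as treatment success within each stratum, can be sketched as follows. The record layout (stratum label, NRS change, success flag) is a hypothetical input format chosen for illustration, not the structure of the study data.

```python
from collections import defaultdict
from statistics import mean, stdev


def success_change_by_stratum(records):
    """Mean (SD) raw NRS change among PGIC-success patients, per stratum.

    `records` holds (stratum, nrs_change, success) tuples; this layout is a
    hypothetical input format chosen for the sketch.
    Returns {stratum: (mean_change, sd_change, n_success)}.
    """
    groups = defaultdict(list)
    for stratum, change, success in records:
        if success:  # only treatment-success patients define the threshold
            groups[stratum].append(change)
    summary = {}
    for stratum, changes in groups.items():
        sd = stdev(changes) if len(changes) > 1 else float("nan")
        summary[stratum] = (mean(changes), sd, len(changes))
    return summary


# Usage with hypothetical records (not study data).
records = [
    ("cohort", -3, True), ("cohort", -5, True), ("cohort", -1, False),
    ("RCT", -4, True), ("RCT", -6, True), ("RCT", 0, False),
]
for stratum, (m, sd, n) in success_change_by_stratum(records).items():
    print(f"{stratum}: {m:.2f} (SD {sd:.2f}), n = {n}")
```

The same function covers every stratification in the study (design, sex, baseline category, treatment status) simply by changing which label is placed in the first field.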

Cohort versus RCTs
When comparing the cohort data with the RCTs, the average NRS changes differed for each of the PGIC categories (Figure 1). Similarly, the average raw NRS change for treatment success on the dichotomized PGIC differed: −3.33 (SD: 2.72) in the cohort versus −4.56 (SD: 1.85) in the RCTs (p < 0.001). Hence, on average, cohort participants reported treatment success at a smaller reduction in pain. Moreover, the percentage of treatment success differed considerably between the cohort (316 patients, 22.19%) and the RCTs (81 patients, 34.18%; p < 0.001).

Stratification on sex
In both the cohort and the RCTs, stratification by sex resulted in different mean values of clinically important improvement in the "very much improved" category of the PGIC (Figures 2 and 3). On average, women needed one point more NRS change to label their improvement as "very much improved." Nonetheless, the average NRS change for treatment success did not differ between the sexes in the DATAPAIN cohort.

DATAPAIN cohort: stratification on treatment status
In the cohort, treatment status at follow-up was associated with considerably different NRS changes between baseline and follow-up. The average raw NRS change was −2.13 (95% CI: −2.39, −1.86) for the completed treatment group and −0.79 (95% CI: −0.95, −0.63) for the ongoing treatment group (p < 0.001) (Figure 4). The percentage of treatment success also differed significantly between those who had completed treatment (151 patients, 36.3%) and those still in treatment (72 patients, 13.85%; p < 0.001). For clinically important improvement, an average NRS change of −4.15 (SD: 2.70) was needed in the completed group and −2.16 (SD: 2.48) in the ongoing group. Thus, patients still in treatment reported treatment success at a much smaller average NRS change than those who had completed treatment.

Stratification on NRS baseline score
Differences in NRS change between baseline pain severity categories were more prominent in the cohort data than in the RCTs (Figures 5 and 6).

DISCUSSION
The main objective of this study was to assess whether an average decrease of two points on the NRS is clinically important in chronic pain patients who participated in a large observational cohort and in patients included in RCTs of non-pharmaceutical treatments. In this study, patients reported clinically important improvement at greater pain relief than an average decrease of two points on the NRS. The amount of pain relief needed differed substantially between study designs. In both the cohort and the RCTs, baseline pain severity modified the amount of pain relief needed for a clinically important improvement. Furthermore, within the cohort, considerable differences were found between treatment statuses.
In the RCTs, strict inclusion criteria were applied before a single intervention was administered, whereas every patient in the cohort received personalized care from a multidisciplinary pain team. Therefore, cohort patients may have received more than one intervention or been treated intermittently, so that some were still in treatment at 6-month follow-up. We expected this to influence the NRS change and the value patients attached to the clinical importance of the improvement. Stratification revealed that patients still being treated reported clinically important improvement at a significantly smaller NRS change, illustrating that patients in treatment at 6 months were satisfied with their treatment progress at much less pain relief. However, the AUC for this group was substantially lower, indicating a weaker association between NRS change and clinically important improvement; for these patients, other underlying factors may play an important role when answering the PGIC. Further investigation of the factors that contribute to PGIC ratings is therefore warranted.
The stratification on baseline NRS indicated that the NRS change is not uniform across baseline pain categories. 6,8 Caution is therefore needed when comparing patients who start at different baseline pain intensities. This may reflect differing expectations of improvement; further research on this specific topic may clarify the underlying reason. Not all differences in the RCTs were significant, plausibly because, as a consequence of the inclusion criteria, few patients were included in the mild baseline NRS group; the resulting low statistical power may have caused us to miss differences.
FIGURE 2 Average NRS change score for the PGIC categories in men and women in the cohort
Stratification on sex revealed no differences in the raw NRS change corresponding to the two successful PGIC categories combined. Age is another patient characteristic that may play a role. In our tertiary pain population, we observed cross-sectionally that older patients have a lower average pain intensity at presentation. 3 However, a longitudinal study of birth cohorts indicates a positive relationship between age and pain intensity within patients over time. 18 Education, employment, and wealth are socio-demographic variables that are negatively associated with pain. 3,18 Whether these factors influence PGIC ratings remains unclear, and they should be analyzed in future studies.
Our results differ from those of Farrar et al., which may be due to the study design or the type of intervention analyzed. Expectations of pain reduction from treatment seem to depend on many patient characteristics and on the setting in which patients were recruited. It seems that patients are not satisfied with just a two-point difference. One option would be to adjust the definition of clinically important improvement. However, in clinical practice, only a small proportion of the pain population obtains a successful intervention outcome. Raising the threshold for clinical importance, for example to a three-point raw change, would lower the apparent efficacy of interventions and could have implications for future study design, such as the necessary sample sizes. Another option could be to analyze the importance and sensitivity of composite scores or quality of life scores as pain outcome measures.
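To make the sample-size implication concrete, the standard normal-approximation formula for a two-arm trial comparing mean NRS change is shown below. It assumes a two-sided significance level α, power 1 − β, equal allocation, and a common SD σ of the NRS change, with Δ the between-group difference considered clinically important. This textbook formula is given for illustration only; it is not a calculation performed in this study.

```latex
n_{\text{per group}} = \frac{2\,\left(z_{1-\alpha/2} + z_{1-\beta}\right)^{2}\,\sigma^{2}}{\Delta^{2}}
```

Because Δ enters squared in the denominator, any redefinition of the clinically important change directly alters the required sample size. For α = 0.05 and 80% power, z_{0.975} ≈ 1.960 and z_{0.80} ≈ 0.842, so the squared term is about 7.85.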
In general, the Spearman correlation coefficient suggested that pain change is an important component of what the PGIC measures, yet the lack of strong correlations shows that pain relief cannot fully explain treatment success. Relative changes were omitted from the results because they were interchangeable with the raw changes; reporting only the latter simplifies interpretation. This interchangeability is due to the high baseline pain scores of the patients at our tertiary pain clinic, which make the relationship between raw change and PGIC as stable as that between percent change and PGIC. The questionnaires applied in both designs were validated for the chronic pain population and recommended by the IMMPACT guidelines, reducing the probability of errors in data collection. In both designs, stratification was applied to control for confounding. 19 Overall, consistency in association and performance was found between the change in pain intensity and the meaningfulness of the PGIC, regardless of the treatment patients received, sex, treatment status, or baseline pain intensity, suggesting high external validity toward the diverse chronic pain population. Consequently, these results may provide guidance on clinically important improvement and contribute to calculating sample sizes and numbers needed to treat in future studies, not only in RCTs but also in cohort studies of chronic pain patients referred to tertiary pain clinics.
FIGURE 3 Average NRS change score for the PGIC categories in men and women in the RCTs
FIGURE 4 Average NRS change score for the PGIC categories by treatment status in the cohort
FIGURE 5 Average NRS change score for the PGIC categories stratified on baseline NRS in the cohort
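For a responder definition such as PGIC success, the number needed to treat mentioned above is the reciprocal of the absolute difference in success proportions between two arms. A minimal Python sketch follows; the inputs are hypothetical, and rounding up to a whole patient is a common convention rather than something specified in this study.

```python
import math


def number_needed_to_treat(p_treatment: float, p_control: float) -> int:
    """Responder-based NNT: 1 / (p_treatment - p_control), rounded up.

    p_treatment and p_control are hypothetical success proportions, e.g. the
    share of patients rating "much" or "very much improved" on the PGIC.
    """
    benefit = p_treatment - p_control  # absolute difference in success rates
    if benefit <= 0:
        raise ValueError("NNT is only defined here for a positive benefit")
    # Subtract a tiny tolerance so float noise (e.g. 1 / 0.19999999999999998)
    # does not push an exact whole number up to the next integer.
    return math.ceil(1 / benefit - 1e-9)


print(number_needed_to_treat(0.30, 0.10))  # 5
```

Because the success proportions depend on where the responder threshold is set, a change in the definition of clinically important improvement propagates directly into the NNT.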
In conclusion, the change in NRS scores associated with clinically important improvement was larger than stipulated in the literature, and the amount of pain relief needed was substantially larger in RCTs than in cohort data. Stratification on study design and sex showed heterogeneity in pain relief and in its significance in relation to treatment success, calling for caution in interpretation, as the clinically important change may depend on study design, baseline NRS, or treatment status.