Measuring irritable bowel syndrome patient-reported outcomes with an abdominal pain numeric rating scale

Authors


Disclaimer: The opinions and assertions contained herein are the sole views of the authors and are not to be construed as official or as reflecting the views of the Department of Veterans Affairs.

Dr B. Spiegel, 11301 Wilshire Blvd, Building 115, Room 215, Los Angeles, CA 90073, USA.
E-mail: bspiegel@mednet.ucla.edu

Summary

Background  Controversy exists on how to measure patient-reported outcomes in irritable bowel syndrome (IBS) clinical trials effectively. Pain numeric rating scales (NRS) are widely used in the non-IBS pain literature. The Food and Drug Administration has proposed using the NRS in IBS.

Aim  To test the psychometrics of an abdominal pain NRS in IBS.

Methods  We analysed data from a longitudinal cohort of Rome III IBS subjects. At entry, subjects completed a 10-point NRS, bowel symptoms, IBS severity measurements (IBS-SSS, FBDSI), health-related quality of life indices (IBS-QOL, EQ5D), and the Worker Productivity Activity Index (WPAI). We repeated assessments at 3 months along with a response scale to calculate the minimal clinically important difference.

Results  There were 277 subjects (82% women; age = 42 ± 15) at baseline and 90 at 3 months. The NRS correlated cross-sectionally with IBS-SSS (r = 0.60; P < 0.0001), FBDSI (r = 0.49; P < 0.0001), IBS-QOL (r = 0.43; P < 0.0001), EQ5D (r = 0.48; P < 0.0001), presenteeism (r = 0.39; P < 0.0001), absenteeism (r = 0.17; P = 0.04) and distension (r = 0.46; P < 0.0001), but not stool frequency or form. The minimal clinically important difference was 2.2 points, corresponding to a 29.5% reduction over time.

Conclusions  An abdominal pain NRS exhibits excellent validity and can be readily interpreted with a minimal clinically important difference in patients with IBS. These data support the use of the NRS in IBS clinical trials.

Introduction

Although irritable bowel syndrome (IBS) is a multisymptom disorder, abdominal pain is its defining characteristic1 and is a predominant feature of the IBS illness experience.2–6 Unlike other IBS symptoms, such as bloating or abnormalities in stool frequency or form, abdominal pain independently drives health-related quality of life (HRQOL) decrements in IBS5 and is the principal driver of patient-reported symptom severity.4, 6 In a recent study of 755 IBS patients in a university setting, we found that abdominal pain, as measured using a 21-point numeric rating scale (NRS), was the most powerful predictor of patient-perceived severity among 25 tested clinical factors.4 In short, IBS is defined by pain and pain is the cornerstone of the IBS illness experience.

As measuring and monitoring pain is vital in the clinical management of IBS, abdominal pain improvement has traditionally served as a primary or secondary endpoint in IBS clinical trials. Pain has been measured in various ways, including a five-point scale (none to very severe), a NRS, a visual analogue scale and as a binary endpoint (‘In the past 7 days, have you had adequate relief of your IBS pain and discomfort?’ [Yes/No]).7 However, there is no consensus regarding the optimal method for measuring abdominal pain in IBS patients.

There have been active and ongoing efforts to develop a valid and meaningful patient-reported outcome (PRO) in IBS. As investigators continue to develop endpoints for clinical trials, the Food and Drug Administration (FDA) has recently recommended interim co-primary endpoints for IBS – one for pain and the other for altered bowel habits (FDA Communication – 2009 Rome Foundation Endpoints and Outcomes Conference). For both IBS with diarrhoea (IBS-D) and IBS with constipation (IBS-C) trials, an 11-point NRS has been proposed as the co-primary endpoint for pain. Although the 11-point pain NRS has been evaluated in 10 placebo-controlled treatment trials in almost 2800 chronic pain patients and established as valid, responsive, and easy to employ,8 its psychometric properties have not been systematically assessed in IBS patients.

We sought to validate comprehensively an abdominal pain NRS in IBS using a previously described longitudinal multicentre IBS registry.9 We specifically aimed to: (i) measure the baseline and longitudinal correlations between the single-item NRS for pain and concurrent ratings of severity, bowel symptoms, HRQOL and resource utilization; (ii) compare NRS change scores in responders vs. nonresponders across a range of concurrently measured indices; and (iii) calculate a minimal clinically important difference (MCID) benchmark to help define ‘responder’ status on the abdominal pain NRS.

Methods

Study patients

We evaluated consecutive patients aged 18 years or older with Rome III positive IBS (including IBS-C, IBS-D and IBS-M) enrolled in the IBS Patient Reported Observed Outcomes and Function (PROOF) cohort. An overview of the PROOF methodology can be found in a previous publication.9 PROOF is an internet-based, longitudinal, observational registry of IBS patients identified within a network of 8 geographically diverse US centres. These included five university-based academic centres, one community-based primary care clinic, one Health Maintenance Organization general gastroenterology clinic and one community-based private general gastroenterology clinic. PROOF does not mandate any specified treatments or protocols; patients receive the ‘usual care’ of their healthcare providers. In this regard, PROOF is a natural history cohort outside the context of a traditional clinical trial. Each of the PROOF investigators is an experienced gastroenterologist with knowledge regarding the appropriate application of the Rome III criteria.

The cohort is administered centrally through the University of California at Los Angeles/Veterans Administration (UCLA/VA) Center for Outcomes Research and Education (CORE). Patients access the online survey through the UCLA/VA CORE website. Prior to enrolment, patients complete a set of introductory screens that present the Rome III diagnostic items for IBS. Patients who do not meet Rome III criteria at the time of survey are not allowed to enter the full survey and are subsequently removed from the cohort. Thus, there are two lines of security to maximize the likelihood that all patients have Rome III positive IBS: an initial screen by an experienced gastroenterologist and a secondary application of the Rome III criteria at the time of the survey.

Following a baseline survey, all participants receive a follow-up online survey at 3 months. Those failing to complete the online survey receive a paper survey by mail. The baseline PROOF questionnaire collects a wide range of biopsychosocial variables, including disease-specific and generic HRQOL measures, psychological distress measures, resource utilization measures, worker productivity data (including absenteeism and presenteeism), severity indices, intestinal and ‘extra-intestinal’ co-morbidities, concurrent treatments and symptom profiles (Table 1). The study was approved by the University of California at Los Angeles Institutional Review Board and was conducted in accordance with the institutional guidelines regulating human subject research.

Table 2.   Key baseline characteristics of IBS PROOF patient cohort

| Variable | Value (n = 277) |
| --- | --- |
| Age (mean years ± SD) | 42 ± 15 |
| Gender (% female) | 82 |
| Race (%) | |
|   White | 82 |
|   Black | 6 |
|   Other | 12 |
| Education (%) | |
|   Graduated high school | 78 |
|   Graduated college | 65 |
|   Postgraduate education | 27 |
| Income (%) | |
|   <$50 000 annual | 49 |
|   $50 000–$100 000 annual | 29 |
|   >$100 000 annual | 22 |
| Marital Status (% Married) | 48 |
| IBS Subtype (%) | |
|   IBS with Constipation (IBS-C) | 18 |
|   IBS with Diarrhoea (IBS-D) | 29 |
|   Mixed IBS (IBS-M) | 53 |
| IBS Duration (%) | |
|   6 months to 1 year | 4.0 |
|   1–2 years | 7.4 |
|   2–5 years | 18.1 |
|   5–10 years | 23.5 |
|   10–20 years | 30.9 |
|   More than 20 years | 16.1 |
| IBS pain severity (10-point numeric rating scale, mean ± SD) | |
|   In all patients at baseline | 4.5 ± 2.5 |
|   In patients with ≥3 out of 10 points at baseline | 5.6 ± 2.0 |
| Global IBS severity (0–20 rating scale) | 11 ± 5 |
| IBS-SSS trichotomized severity (%) | |
|   Mild (score of 75–175) | 17 |
|   Moderate (score of 175–300) | 46 |
|   Severe (>300) | 37 |
| IBS-QOL overall score (mean ± SD) | 62.7 ± 22 |
| Worker Productivity Activity Index (WPAI:IBS), % | |
|   Work week absent from IBS (absenteeism) | 3.6 |
|   Work week impaired from IBS (presenteeism) | 34.4 |

Abdominal pain NRS

The outcome measure of this study was a 10-point abdominal pain NRS with the following directions: ‘How much abdominal pain have you felt today, on a scale from 1 (none) to 10 (very severe)’. This is a modification of the 11-point NRS supported by the Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT) for the non-IBS pain literature,10 and as further evaluated in a meta-analysis by Farrar and colleagues.8 The FDA has recently proposed use of a similar 11-point NRS for IBS drug trials using a scale from 0 (none) to 10 (worst possible pain) to describe the worst abdominal pain episode from the past 24 h. The FDA suggests collecting daily abdominal pain NRS data for seven consecutive days and then calculating a 7-day mean NRS to establish the end-of-study pain rating. As the PROOF registry is not contacted daily, our cohort does not include daily NRS data. Thus, our study provides data on the 1-day NRS rating. Nonetheless, this is an important step in validating the proposed FDA approach, as the 7-day composite index proposed by the FDA relies entirely on the psychometric properties of the 1-day NRS. The daily abdominal pain NRS serves as the building block for the 7-day composite NRS score – if it is invalid, then the 7-day running average will likely be invalid as well.
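The FDA-proposed composite described above reduces to an average of seven daily ratings. The following Python sketch illustrates that arithmetic only; the function name `weekly_mean_nrs`, its validation rules and the example diary are hypothetical and are not part of the PROOF analysis.

```python
def weekly_mean_nrs(daily_scores):
    """Average seven consecutive daily worst-pain ratings on a 0-10 scale."""
    if len(daily_scores) != 7:
        raise ValueError("expected exactly seven daily ratings")
    if any(not 0 <= score <= 10 for score in daily_scores):
        raise ValueError("ratings must lie between 0 and 10")
    return sum(daily_scores) / 7


# Hypothetical one-week diary for a single patient (not study data).
diary = [6, 5, 7, 4, 5, 6, 2]
end_of_study_rating = weekly_mean_nrs(diary)  # 35 / 7 = 5.0
```

Because the composite is a plain mean of the daily values, any measurement error in the 1-day rating carries directly into the 7-day score, which is the reason the 1-day NRS is examined here.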

Pain severity inclusion criteria

In a clinical trial, patients must report some pain at baseline to justify measuring pain improvement over time – i.e. patients without baseline pain will, by definition, have no opportunity to improve. For that reason, we limited our analysis only to IBS patients reporting any abdominal pain at baseline, which is 90% of the PROOF cohort.9 Moreover, patients with only minimal baseline pain are limited by a ‘floor effect’ defined by the small scale distance between their current pain and no pain at all. The FDA has proposed that IBS clinical trials require an NRS score of at least 3 out of 10 to justify inclusion into a trial measuring pain improvement. The rationale is that patients with NRS scores of 0 have no pain to begin with, and those with scores of 1 or 2 can only improve minimally, at best. In light of these considerations and consistent with inclusion criteria widely applied for pain trials in other areas of medicine,8 we limited our longitudinal analyses only to patients with a baseline abdominal pain score of 3 or greater (73% of PROOF cohort).
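The baseline severity criterion above amounts to a simple filter on the baseline score. The Python sketch below is illustrative only: the function name, the default `threshold` argument and the example scores are assumptions, not PROOF data.

```python
def eligible_for_longitudinal(baseline_nrs, threshold=3):
    """Return True when the baseline abdominal pain NRS meets the threshold."""
    return baseline_nrs >= threshold


# Hypothetical baseline scores for five patients (not study data).
baseline_scores = [1, 2, 3, 5, 8]
eligible = [s for s in baseline_scores if eligible_for_longitudinal(s)]
share_eligible = len(eligible) / len(baseline_scores)
```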

Analyses

Baseline construct validation of the NRS in IBS.  One method of establishing the validity of a PRO is to measure its cross-sectional relationship with other established biopsychosocial domains. Thus, for the NRS to demonstrate baseline construct validity, we hypothesized a priori that its scores must significantly correlate with a wide range of other pre-determined IBS constructs – a list of constructs culled from the literature as key components of the IBS illness experience (Table 1). In other words, if the NRS were unable to correlate cross-sectionally with these key constructs, it would be considered unrelated to the IBS illness experience, thus undermining its relevance in IBS patients. Specifically, we measured ‘IBS severity’ with the IBS Severity Scale (IBS-SSS),11 Functional Bowel Disease Severity Index (FBDSI),12 and BEST score,13 disease-targeted HRQOL with the IBS-QOL instrument,14 generic HRQOL with the EQ5D,9, 15 worker productivity with the IBS version of the Work Productivity Activity Index (WPAI:IBS),16 gastrointestinal-specific anxiety with the Visceral Sensitivity Index (VSI),17, 18 generic psychological function with the Hospital Anxiety & Depression (HAD) scale and the Brief Symptom Inventory Somatization scale, and symptom coping using a five-point Likert scale. In addition, we concurrently measured IBS bowel symptoms, including abdominal distension, stool frequency, stool form, urgency, disease duration, flare duration and IBS subtype. Finally, we measured a range of extraintestinal IBS symptoms,5, 19 including sleep disturbances, sexual dysfunction and fatigue, along with measures of resource utilization, including self-reported diagnostic testing history, physician visits and current number of IBS therapies (from a checklist of 16 potential IBS treatments).
We expressed the cross-sectional relationship between each index and the abdominal pain NRS using Pearson’s correlation coefficient and adopted a P-value of <0.05 as evidence for statistical significance.
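For readers who wish to reproduce this type of analysis, Pearson's correlation coefficient can be computed as in the following Python sketch. It is not the SAS code used in the study; the function names and the example vectors are hypothetical. The sketch returns the coefficient and its t statistic; the P-value would be obtained from a t distribution with n − 2 degrees of freedom, which is omitted here to avoid third-party dependencies.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length (at least 3)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r = 0, with n - 2 degrees of freedom."""
    if abs(r) >= 1:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical pain NRS scores and a second index (not study data).
nrs = [1, 2, 3, 4, 5]
other_index = [2, 1, 4, 3, 5]
r = pearson_r(nrs, other_index)      # 8 / sqrt(10 * 10) = 0.8
t = t_statistic(r, len(nrs))
```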

Table 1.   Patient-reported items and instruments in the PROOF Registry

| Construct | Items/instrument | Assessed at baseline only | Assessed longitudinally | Instrument description |
| --- | --- | --- | --- | --- |
| Demographics | Age, gender, race, marital status, income, education, health care coverage | Yes | No | N/A |
| IBS severity | 21-Point Numeric Rating Scale | Yes | Yes | A 21-point scale anchored by ‘no symptoms’ (0) to ‘most intense symptoms imaginable’ (20)4, 35 |
| | Irritable Bowel Syndrome Severity Scale (IBS-SSS) | Yes | Yes | Physician-derived severity scale for IBS11 |
| | Functional Bowel Disease Severity Index (FBDSI) | Yes | Yes | Physician-derived severity scale for IBS12 |
| | BEST Score | Yes | Yes | A 4-item measure of IBS severity13 |
| Generic HRQOL | EuroQoL | Yes | Yes | A multiattribute generic health utility index36 |
| | Center for Disease Control – 4 (CDC-4) | Yes | Yes | A four-item generic HRQOL instrument developed by the CDC and previously validated in IBS15 |
| Disease-targeted HRQOL | IBS Quality of Life Instrument (IBS-QOL) Overall Score | Yes | Yes | A 34-item valid and reliable disease-targeted HRQOL instrument in IBS14 |
| Generic psychological health | Hospital Anxiety & Depression (HAD) Questionnaire | Yes | No | A 14-item measure of generalized anxiety and depression |
| | Somatization Scale of the Brief Symptom Index (BSI) | Yes | No | An 8-item measure of somatization37 |
| Disease-targeted anxiety | Visceral Sensitivity Index (VSI) | Yes | Yes | A 15-item measure of disease-targeted visceral anxiety17 |
| Work Productivity | Worker Productivity Activity Index for IBS (WPAI:IBS) | Yes | Yes | A disease-targeted measure of absenteeism and presenteeism16 |
| Nonpain IBS symptoms | Bloating, distension, stool frequency, stool form, urgency, IBS subtype, disease duration, flare duration | Yes | Yes | Measured using a combination of Likert scales and numeric rating scales |
| Extraintestinal symptoms | Sleep disturbances, sexual dysfunction, fatigue | Yes | No | |
| Single-item endpoints | Considerable relief | Yes | Yes | A binary measure of global improvement7 |

HRQOL, health-related quality of life.

Calculating the MCID.  The FDA PRO Guidance emphasizes the difference between statistical significance and clinical relevance.20 Patients and their providers are principally concerned with the latter. Therefore, it is important to develop a clinically based rule for how to interpret change scores on a PRO instrument, such as the abdominal pain NRS. The MCID is traditionally defined as the smallest difference in a score that patients perceive as beneficial.21 Documenting an MCID benchmark is vital to allow a PRO to be successfully used in prospective trials, because it establishes an a priori responder definition.20 When using a linear outcome, such as HRQOL, ‘responders’ can be defined as those achieving an MCID on the response scale.21 For example, the MCID on the IBS-QOL is 10 points.22 Thus, any patient meeting or exceeding a 10-point change on the IBS-QOL is considered a ‘responder’. There are various ways to measure the MCID of a PRO, but the optimal approach is to map change scores to patient report of improvement using a balanced 13-point response scale, as described by Guyatt.21 In this technique, the response scale is administered at the follow-up period and asks patients to consider their overall health compared with the last time it was evaluated. The scale includes six levels of improvement, six levels of decrement and one level for ‘almost the same’, which balances the scale in the middle. Patients scoring +1 (‘a little bit better’) or +2 (‘somewhat better’) are considered to have minimally improved. The MCID is then defined as the mean change score in this subgroup of minimal responders. We employed this technique, using a Guyatt scale at 3-months’ follow-up, to calculate the MCID of the abdominal pain NRS in our IBS population. We report the MCID using two values: (i) the mean absolute change score over time in minimal responders and (ii) the mean percentage change score over time in minimal responders.
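The anchor-based calculation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's SAS code: the function name `anchor_based_mcid`, the record layout and the example values are hypothetical, and the percentage figure is taken to be the mean of each patient's own percentage change, which is one reasonable reading of ‘mean percentage change score’.

```python
def anchor_based_mcid(records, minimal_levels=(1, 2)):
    """Mean absolute and mean percentage improvement in minimal responders.

    records: iterable of (baseline_score, follow_up_score, response_rating),
    where response_rating is the patient's value on the balanced response
    scale and +1/+2 mark minimal improvement. Baseline scores must be
    non-zero so that a percentage change is defined.
    """
    minimal = [(b, f) for b, f, rating in records if rating in minimal_levels]
    if not minimal:
        raise ValueError("no minimally improved patients in the sample")
    # Positive values denote a reduction in pain between baseline and follow-up.
    abs_changes = [b - f for b, f in minimal]
    pct_changes = [100 * (b - f) / b for b, f in minimal]
    return sum(abs_changes) / len(minimal), sum(pct_changes) / len(minimal)


# Hypothetical patients: (baseline NRS, 3-month NRS, response scale rating).
sample = [(6, 4, 1), (5, 3, 2), (8, 6, 2), (4, 4, 0), (7, 2, 5)]
mcid_points, mcid_percent = anchor_based_mcid(sample)
```

Only the first three hypothetical patients rated themselves +1 or +2, so only their change scores enter the two averages; patients who were unchanged or markedly improved are excluded by design.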

We also calculated MCIDs for all concurrent indices listed in Table 1, and assigned end-of-study MCID status for each index in each patient. To do this, we first calculated a ‘baseline to study endpoint change score’ for each index in each patient, using subjects with scores at both time points. We evaluated whether the size of the change score exceeded the Guyatt scale-based MCID and then stratified patients as minimal ‘responders’ vs. ‘nonresponders’ for each index. For example, the data in our sample indicate that the MCID on the IBS-SSS is 95 points. Thus, any patient exceeding a change of 95 points on the IBS-SSS over time was classified as a ‘responder’ using this anchor-based benchmark.
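The responder classification can be illustrated with a short Python sketch. The function name and the `inclusive` flag are hypothetical; the flag is included because the text describes responders both as ‘meeting or exceeding’ and as ‘exceeding’ the MCID, and the sketch does not presume which rule was applied. It also assumes an index on which lower scores are better, such as the IBS-SSS or the pain NRS.

```python
def is_responder(baseline, follow_up, mcid, inclusive=True):
    """Classify a patient as a responder on an index where lower is better."""
    improvement = baseline - follow_up
    return improvement >= mcid if inclusive else improvement > mcid


# Hypothetical IBS-SSS scores, using the 95-point MCID reported in the text.
print(is_responder(320, 220, 95))                    # 100-point drop: True
print(is_responder(320, 250, 95))                    # 70-point drop: False
print(is_responder(320, 225, 95, inclusive=False))   # exactly 95 points: False
```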

Longitudinal construct validation.  We performed a series of prospective construct validity analyses to measure the performance of the pain NRS longitudinally. This set of analyses sought to establish whether the NRS could longitudinally track in the same direction as the concurrently measured biopsychosocial constructs listed in Table 1. We assessed construct validity by calculating a series of Pearson correlation coefficients to compare change scores in each concurrent index vs. change scores in the NRS. We calculated the P-value for each longitudinal ‘difference vs. difference’ correlation, adopting <0.05 as evidence for statistical significance.

However, because statistical significance does not address clinical relevancy, we also adopted a measure of clinical relevance by applying the previously calculated anchor-based MCID definitions for each concurrent index. For this set of analyses, we calculated the proportion of patients achieving an MCID over time for each construct stratified by pain NRS response status and conducted a chi-squared test on each resulting 2 × 2 table [i.e. NRS MCID response (yes, no) vs. concurrent index MCID response (yes, no)].
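The 2 × 2 comparison can be sketched in Python as follows. This is an illustration only: the function name and the cell counts are hypothetical, and the statistic is the uncorrected Pearson chi-squared, since the text does not state whether a continuity correction was applied. With one degree of freedom, the upper-tail probability equals erfc(sqrt(chi2 / 2)), which allows the P-value to be computed with the standard library.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Uncorrected Pearson chi-squared test for the table [[a, b], [c, d]].

    a: NRS responders who also achieved the MCID on the concurrent index
    b: NRS responders who did not
    c: NRS nonresponders who achieved the MCID on the concurrent index
    d: NRS nonresponders who did not
    Returns (chi-squared statistic, P-value with 1 degree of freedom).
    """
    n = a + b + c + d
    margins = (a + b, c + d, a + c, b + d)
    if 0 in margins:
        raise ValueError("every row and column needs at least one patient")
    chi2 = n * (a * d - b * c) ** 2 / math.prod(margins)
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts (not study data).
chi2, p_value = chi_squared_2x2(10, 10, 5, 15)
```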

Impact of baseline severity on performance of pain NRS.  It has been argued that the performance of a PRO should not be conditionally tied to baseline severity.23 To measure the relationship between baseline severity and NRS response status, we first divided patients at baseline into three severity levels, as defined by tertiles from baseline IBS-SSS scores – a standard technique in the IBS literature.23–25 We then measured the performance of the NRS stratified across severity tertiles using two analyses: (i) ANOVA to compare mean NRS change scores across tertiles and (ii) chi-squared to compare the proportion achieving an MCID on the NRS across tertiles. We used SAS v9.1 (SAS Institute Inc, Cary, NC, USA) for all analyses.
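The tertile stratification and the one-way ANOVA can be sketched in Python as below. This is not the SAS procedure used in the study. The function names and data are hypothetical, tertiles are formed by rank order (so tied severity scores may fall in different groups), and only the F statistic is returned; its P-value would come from an F distribution with k − 1 and n − k degrees of freedom.

```python
def tertile_groups(severity, outcome):
    """Split outcome values into three groups by rank of baseline severity."""
    order = sorted(range(len(severity)), key=lambda i: severity[i])
    n = len(order)
    cuts = [0, n // 3, 2 * n // 3, n]
    return [[outcome[i] for i in order[cuts[k]:cuts[k + 1]]] for k in range(3)]


def anova_f(groups):
    """One-way ANOVA F statistic for a list of non-empty groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("F is undefined when there is no within-group variance")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical baseline IBS-SSS scores and NRS change scores (not study data).
ibs_sss = [120, 310, 200, 90, 260, 340, 180, 150, 400]
nrs_change = [1, 3, 2, 2, 4, 5, 3, 3, 4]
f_stat = anova_f(tertile_groups(ibs_sss, nrs_change))
```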

Results

Patient characteristics

There were 277 total subjects at baseline and 123 with 3-month follow-up data. Of the group with 3-month data, 90 had a score of ≥3 on the baseline abdominal pain NRS and were therefore eligible for our longitudinal analyses. Table 2 provides an overview of key baseline characteristics of the sample. The patient profiles are consistent with previous studies in IBS. Namely, the patients were primarily young (mean age = 43 ± 15 years) and female (82%). The population was diverse across demographic characteristics, including race, education, and income. Eighteen percent of the cohort had IBS-C, 29% IBS-D and 53% IBS-M using Rome III subclassification criteria.1 Using IBS-SSS criteria for severity, 17%, 46% and 37% of patients had mild, moderate and severe IBS symptoms respectively.

Pain NRS baseline construct validity in IBS cohort

At baseline, the mean abdominal pain NRS was 4.5 ± 2.5 in all comers, and 5.6 ± 2.0 when limited only to patients with ≥3 out of 10 on the NRS. Table 3 demonstrates the baseline correlations of the pain NRS with the a priori IBS constructs. At baseline, the NRS significantly correlated with all measures of IBS severity (FBDSI, IBS-SSS, BEST, 21-point severity NRS), disease-targeted (IBS-QOL) and generic (CDC-4, EQ5D) HRQOL, work presenteeism and absenteeism, HAD depression symptoms, somatization, sexual function, fatigue and symptom coping. The NRS correlated with visceral anxiety (VSI), but not generic anxiety (HAD). Among bowel symptoms, the NRS was highly correlated with abdominal distension and flare duration, but not stool frequency or form, incomplete evacuation or urgency. There was a highly significant difference in NRS scores between patients with vs. without ‘considerable relief’ of their IBS symptoms. Finally, the abdominal pain NRS was highly correlated with the total number of contemporaneous IBS therapies and had a trend (P = 0.06) towards correlating with the number of physician visits for IBS. In short, the abdominal pain NRS significantly correlated with a wide range of disparate indices that jointly capture the illness experience of IBS and revealed evidence of statistically significant baseline construct validity.

Table 3.   Baseline and longitudinal correlations of 10-point abdominal pain Numeric Rating Scale (NRS) with key IBS constructs

| Comparison | Baseline correlation coefficient (P-value) | Longitudinal correlation (P-value) |
| --- | --- | --- |
| Pain NRS vs. IBS-SSS | 0.60 (<0.0001) | 0.52 (<0.0001) |
| Pain NRS vs. FBDSI | 0.49 (<0.0001) | 0.67 (<0.0001) |
| Pain NRS vs. BEST | 0.31 (<0.0001) | 0.41 (0.0002) |
| Pain NRS vs. 21-point severity NRS | 0.63 (<0.0001) | 0.56 (<0.0001) |
| Pain NRS vs. IBS-QOL | 0.43 (<0.0001) | 0.39 (0.003) |
| Pain NRS vs. CDC-4 | 0.32 (<0.0001) | 0.15 (N.S.) |
| Pain NRS vs. EQ5D | 0.48 (<0.0001) | 0.39 (0.02) |
| Pain NRS vs. WPAI absenteeism | 0.17 (0.04) | 0.27 (0.003) |
| Pain NRS vs. WPAI presenteeism | 0.39 (<0.0001) | 0.39 (0.04) |
| Pain NRS vs. HAD depression | 0.37 (<0.0001) | 0.40 (<0.01) |
| Pain NRS vs. HAD anxiety | 0.10 (N.S.) | 0.21 (<0.01) |
| Pain NRS vs. VSI | 0.35 (<0.0001) | 0.35 (0.003) |
| Pain NRS vs. Somatization | 0.2 (0.003) | |
| Pain NRS vs. Symptom coping | 0.39 (<0.0001) | |
| Pain NRS vs. Sexual dysfunction | 0.13 (0.04) | |
| Pain NRS vs. Sleep disturbances | 0.08 (N.S.) | |
| Pain NRS vs. Fatigue | 0.16 (0.01) | |
| Pain NRS vs. Abdominal distension | 0.46 (<0.0001) | 0.3 (0.04) |
| Pain NRS vs. Abdominal bloating | 0.27 (0.06) | |
| Pain NRS vs. Stool frequency | 0.19 (N.S.) | |
| Pain NRS vs. Hard stools | 0.09 (N.S.) | |
| Pain NRS vs. Loose stools | 0.03 (N.S.) | |
| Pain NRS vs. Incomplete evacuation | 0.22 (N.S.) | |
| Pain NRS vs. Bowel urgency | 0.01 (N.S.) | |
| Pain NRS vs. Symptom flare duration | 0.34 (<0.0001) | |
| Pain NRS vs. ‘Considerable relief’ | −0.39 (<0.0001) | −0.46 (0.0005) |
| Pain NRS vs. Physician visits | 0.12 (0.06) | |
| Pain NRS vs. Number IBS therapies | 0.17 (0.006) | |

Refer to Table 1 for a description of all constructs and their abbreviations.

Calculating MCIDs

There were 19 patients who improved minimally during the 3-month period, as defined by a +1 or +2 improvement on the follow-up Guyatt response scale. The mean abdominal pain NRS change score in this subgroup was 2.2 points, corresponding to a 29.5% reduction in NRS over the 3-month period.

Prospective construct validity of NRS

The prospective construct validity of the NRS was tested by calculating a series of Pearson correlation coefficients to compare ‘change vs. change’ for each longitudinal index compared with the NRS. The results of these longitudinal analyses are presented in Table 3. The NRS was able to track statistically with nearly all the concurrently measured indices.

To measure the clinical relevance of these results, we also calculated the proportion of patients in each group achieving an MCID, using the Guyatt-based metric for each index. Figure 1 portrays the results. The NRS significantly discriminated responders from nonresponders for distension, IBS-SSS, FBDSI, BEST, VSI and EQ5D.

Figure 1.   Proportion of patients achieving a minimal clinically important difference (MCID) for key variables stratified by pain numeric rating scale (NRS) responder status. Each set of bars depicts the results of an individual longitudinal 2 × 2 table, and demonstrates the difference in MCID achievement between NRS responders vs. nonresponders over the 3-month study period. For example, 46% of NRS responders achieved an MCID in distension, whereas only 15% of NRS nonresponders achieved an MCID in distension (31% difference in response; P = 0.002). The differences were statistically significant across all comparisons except IBS-QOL, where there was a trend towards significance. The data indicate that the pain NRS was able to discriminate between clinically relevant responses for each of the concurrent indices tested.

Effect of baseline severity on NRS

The mean absolute change score in patients with baseline mild, moderate and severe IBS (using IBS-SSS tertiles) was 0.8, 1.8 and 1.9 respectively (P = 0.18). The proportion of patients achieving an MCID on the NRS across severity groups was 36%, 42% and 23% respectively (P = 0.3). Therefore, in both sets of analyses, the relationship between baseline severity and NRS response was not statistically significant.

Discussion

Controversy exists on how to measure PROs in IBS effectively. This debate is important, because IBS remains a patient-reported condition that cannot yet be reliably diagnosed or monitored with biomarkers alone; patient reports are essential. In the absence of valid and reliable biomarkers to substratify patients accurately within an otherwise heterogeneous condition, clinicians and investigators are left to interpret patient-reported symptoms to determine the diagnosis, gauge overall disease severity, develop rational treatment plans and assess outcomes.

The challenge of interpreting patient-reported outcomes in IBS is now front and centre for clinicians, investigators and regulatory agencies such as the FDA. The charge for all stakeholders is to identify PRO measures that are sufficiently reliable and valid, for both clinical trials and clinical practice. An optimal PRO measure must be easily administered, able to discriminate between important patient subgroups and disease states in a statistically significant and clinically relevant manner, predictable in behaviour when tracked with other indicators of illness severity, not dependent on baseline severity and readily interpretable.20

Numeric rating scales have become the standard for measuring pain in non-IBS chronic pain conditions, such as chronic migraine headache, diabetic neuropathy, osteoarthritis, chronic low back pain and fibromyalgia among others.8, 10, 26 Although IBS is a multisymptom disorder, almost every patient with IBS, including those in our PROOF cohort,9 reports at least some abdominal pain attributable to their IBS. Although the current Rome III criteria for IBS allow either abdominal pain or ‘discomfort’,1 earlier diagnostic criteria, such as the Kruis,27 Manning,28 and Rome I,29 all specified pain – not discomfort – as the hallmark symptom of IBS. Moreover, our data and those of others reveal that abdominal pain is the principal driver of overall illness severity in IBS, and drives HRQOL more than any other bowel symptom.4–6 In short, IBS is very much an abdominal pain syndrome, suggesting that it could potentially be measured in the same manner as other pain conditions – namely, with a pain NRS. Yet, before an NRS could be adopted for IBS clinical trials, it is vital to test first the psychometric properties of the endpoint to establish whether its time-tested validity and reliability in other pain conditions are reproducible in IBS. It is also important to define the level of change in the NRS that represents an MCID, as this is important for defining a ‘responder’ in both clinical trials and everyday clinical practice.

We tested the psychometrics of a 10-point abdominal pain NRS both cross-sectionally and longitudinally in a comprehensively defined natural history cohort of patients with IBS. Our analysis has six key findings: First, we found that the abdominal pain NRS behaved as expected when cross-sectionally mapped against concurrently measured indices. In particular, the single-item PRO strongly correlated with a wide range of multi-item severity instruments, including the IBS-SSS, FBDSI and BEST scores. In addition, it correlated with generic and disease-targeted HRQOL instruments, psychological measures of depression, anxiety and somatization, extraintestinal symptoms of fatigue and sexual dysfunction and bowel symptoms and their attributes, such as abdominal distension, symptom flare duration and overall symptom coping and perceived relief.

Second, we found that the abdominal pain NRS tracked longitudinally with a range of concurrently measured indices and did so in both a statistically significant and clinically relevant manner. Using pre-defined MCID benchmarks for each of the longitudinal indices, we found that the NRS successfully discriminated between minimal responders vs. nonresponders for abdominal distension, IBS-SSS, FBDSI, BEST, VSI and EQ5D. In short, changes in the NRS over time correlated with changes in concurrently measured indices (Table 3) and these relationships were clinically important (Figure 1). This supports the construct validity and clinical responsiveness of the pain NRS in IBS.

Third, we found that the 10-point NRS in our study had an MCID of 2.2 points, which corresponded to a 29.5% reduction in NRS over time. These values are remarkably similar to the ‘clinically important differences’, or CIDs, that Farrar et al. calculated for the NRS in non-IBS chronic pain conditions. Specifically, in a meta-analysis of 10 clinical trials in various pain conditions using an 11-point NRS, the CID was an absolute decrease of 2.0 points and a relative decrease of 30%. Although our scale is not precisely the same as the standard 11-point NRS employed by Farrar et al., there is strong convergent validity between our data and the combined experience in 10 clinical trials across various pain conditions. This provides further evidence that the NRS may behave as expected if used in IBS, and provides an a priori definition for a ‘responder’ should the abdominal pain NRS be used in IBS clinical trials.

Fourth, we found that the abdominal pain NRS correlated with measures of resource utilization in IBS. Specifically, the pain NRS scores strongly predicted self-reported physician visits for IBS, total number of contemporaneous therapies prescribed for IBS and even work productivity, including absenteeism and presenteeism (i.e. both missing work and losing productivity while present for work). This is a notable finding, as resource utilization outcomes are very distal from symptoms themselves. We hypothesize that the symptoms of IBS, including pain, discomfort and alterations in stool frequency and form, lead to decrements in HRQOL. These HRQOL decrements, in turn, can subsequently impact resource utilization. In this study, we have confirmed that abdominal pain in IBS, as measured by an NRS, predicts both HRQOL and a range of resource utilization outcomes. This is important because many symptoms in IBS do not consistently predict either HRQOL or resource utilization, including abnormalities in stool frequency or form.5, 30 Our data further support pain as a fundamental driver of downstream HRQOL and resource utilization events in IBS and also suggest that pain is a ‘master key’ that, when present, unlocks a range of adverse consequences.

Fifth, we found that the pain NRS is not strongly associated with symptoms of abnormal stool frequency or form. Specifically, the NRS did not cross-sectionally correlate with frequent bowel movements, infrequent bowel movements, incomplete evacuation or urgency. This is not surprising, as factor analysis reveals that IBS patients suffer from both ‘painful’ and ‘uncomfortable’ symptoms.31 Abdominal pain, by definition, is ‘painful’. Most other symptoms in IBS are not consistently described as ‘painful’, but are instead classified as ‘uncomfortable’ (PROOF cohort unpublished data). These data jointly emphasize that IBS is defined by more than pain alone (and nonpainful symptoms can be highly distressing to patients) – yet pain appears to be the principal driver of outcomes. Although our study did not directly measure the incremental benefit of measuring nonpainful symptoms over and above abdominal pain, it is possible that measuring the pain NRS is sufficient to capture adequately the illness experience of IBS. However, this must be explicitly studied. In the meantime, the definition of IBS emphasizes that the syndrome is defined by both pain and discomfort.

Sixth, we found that the pain NRS was not conditional on baseline severity. Specifically, patients with baseline mild, moderate and severe symptoms (using IBS-SSS criteria) had mean NRS change scores of 0.8, 1.8 and 1.9 respectively (P = 0.18). In theory, a reliable PRO measure should not be conditional on baseline severity. This criticism has been levelled at traditional binary endpoints, such as ‘adequate relief’, ‘considerable relief’ or ‘satisfactory relief’ of bowel symptoms,23 although it remains unclear whether binary endpoints are themselves conditional on baseline severity.24, 25, 32, 33 In any event, our data indicate that the pain NRS is not conditional on baseline severity, further supporting the use of the NRS in clinical trials.

Our study has limitations. First, we employed a 10-point NRS – not the traditional 11-point scale widely used in the pain literature. In addition, our upper anchor (‘Very Severe Pain’) is different from the traditional pain upper anchor (‘Worst Imaginable Pain’). Although it is possible that an 11-point traditional scale would generate different results, there is little a priori reason to expect that it would behave dramatically differently from our homologous scale. Moreover, our MCID results reveal striking convergent validity with the data from the 11-point scale, indicating that our NRS behaved in a manner similar to the traditional pain NRS. Nevertheless, additional research should directly measure the standard 11-point NRS in patients with IBS. In the meantime, these data provide strong evidence that an abdominal pain NRS may work well in IBS.

Second, this is an observational cohort of patients, not a tightly controlled clinical trial. However, we believe that there are important benefits of monitoring IBS patients outside of a clinical trial. Moreover, an observational cohort is well suited for the purpose of psychometric validation of PROs. It is by no means mandatory to validate PROs in the context of a clinical trial. In fact, it is arguably suboptimal to use clinical trials as a platform for psychometric validation. In addition, our results cannot be generalized to all IBS patients. Nonetheless, our cohort is reflective of other IBS populations, as the patients are primarily young (mean age = 42) and female (82%), are diverse across demographic characteristics and are well distributed across severity strata (17%, 46% and 37% had mild, moderate and severe IBS symptoms respectively).

A third limitation is that we only monitored patients over a 3-month period. As IBS is a chronic condition, it is possible that NRS scores might vary with longer follow-up periods. However, the Rome guidelines recommend a minimum 4–12 week period for purposes of clinical investigation34 and our study complies with this standard. As the purpose of this study was to help validate an endpoint being considered for clinical trials, the 3-month time horizon is appropriate. Nonetheless, we plan to continue following this cohort over an extended 12-month period to evaluate for subsequent changes.

Our study is further limited by the relatively small number of subjects at follow-up versus baseline. This occurred for several reasons including the following: (i) some patients opted to participate only in the baseline survey and requested no further communications thereafter; (ii) patients only received payment for initial enrolment – not for follow-up assessments; (iii) as our recruitment is a rolling process, some patients had not completed the 3-month follow-up at the time of analysis; and (iv) many patients did not respond to our follow-up email or paper mail requests. To check whether the nonresponders were systematically different from the responders, we compared baseline demographics (age, gender) and severity scores (IBS-SSS, FBDSI, severity NRSs) between groups. There were no significant differences between these groups at baseline. A related limitation is that our MCID calculations were based on a small subsample of the 90 longitudinal subjects. In particular, there were only 19 subjects with a +1 or +2 improvement on the follow-up Guyatt response scale. It is possible that a larger sample size would yield a different MCID. Nonetheless, it is notable that our MCID is highly convergent with MCIDs achieved in other non-IBS pain conditions.
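As an illustration of why the size of the anchor subgroup matters, an anchor-based MCID can be sketched as the mean NRS improvement among subjects who report a small improvement (+1 or +2) on the response scale. This is a minimal sketch under that assumption, not necessarily the exact computation used in this study. The function name, the record layout and the sample values are hypothetical and are not study data.

```python
from statistics import mean


def anchor_based_mcid(records, anchor_levels=(1, 2)):
    """Mean NRS improvement among subjects in the anchor group.

    records: iterable of (baseline_nrs, followup_nrs, response_rating).
    anchor_levels: response-scale ratings treated as 'small improvement'
    (assumed here to be +1 and +2).
    """
    # Improvement is baseline minus follow-up, so a positive value means less pain.
    changes = [b - f for b, f, rating in records if rating in anchor_levels]
    if not changes:
        raise ValueError("no subjects in the anchor group")
    return mean(changes)


# Hypothetical illustrative records, NOT data from this cohort.
records = [
    (7, 5, 1),   # small improvement, NRS fell by 2
    (8, 5, 2),   # small improvement, NRS fell by 3
    (6, 6, 0),   # no change, excluded from the anchor group
    (5, 4, 1),   # small improvement, NRS fell by 1
    (9, 9, -1),  # worsened, excluded from the anchor group
]
mcid = anchor_based_mcid(records)
```

Because the estimate is an average over the anchor group alone, a group of only 19 subjects leaves it sensitive to a handful of individual change scores.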

In conclusion, an abdominal pain NRS exhibits excellent construct and discriminant validity in IBS. Scores on the NRS can be readily interpreted with an absolute and relative MCID of 2.2 points and 29.5% reduction respectively. These values could potentially serve as responder definitions for clinical trials. These data provide strong evidence that measuring abdominal pain with an NRS is sensible for IBS clinical trials. Future research should confirm these results in other populations with other NRSs.
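To make the proposed responder definitions concrete, the sketch below classifies a subject as a responder from baseline and follow-up NRS scores, using the 2.2-point and 29.5% thresholds reported here. The function name and the choice of how to combine the two thresholds (the `rule` parameter) are assumptions for illustration only; the appropriate rule for a given trial would need to be specified in advance.

```python
ABS_MCID = 2.2    # absolute MCID in NRS points (from this study)
REL_MCID = 0.295  # relative MCID as a fraction of baseline (29.5%)


def is_responder(baseline, followup, rule="either"):
    """Return True if the drop in NRS meets the chosen MCID threshold.

    rule: 'absolute', 'relative', 'either' or 'both' (hypothetical options).
    """
    drop = baseline - followup
    # Guard against division by zero for a subject with no baseline pain.
    relative_drop = drop / baseline if baseline > 0 else 0.0
    meets_abs = drop >= ABS_MCID
    meets_rel = relative_drop >= REL_MCID
    if rule == "absolute":
        return meets_abs
    if rule == "relative":
        return meets_rel
    if rule == "both":
        return meets_abs and meets_rel
    if rule == "either":
        return meets_abs or meets_rel
    raise ValueError(f"unknown rule: {rule}")
```

For example, a hypothetical subject moving from 6 to 4 falls short of the absolute threshold (a 2-point drop) but meets the relative one (a 33% reduction), which shows that the two definitions can disagree for individual patients.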

Acknowledgements

Declaration of personal interests: Brennan Spiegel is the guarantor of this manuscript. He has served as a consultant for Prometheus, Takeda, AstraZeneca, and Novartis. Drs Spiegel and Chang formulated the hypotheses and aims of the study, wrote the study protocol and prepared the manuscript. Dr Spiegel performed the analyses in concert with Drs Chang and Bolus. Drs Chang, Chey, Derezin, Dulai, Esrailian, Harris, Karsan, Lembo, Lucak, Strickland, and Tillisch assisted with patient recruitment and review of the manuscript. Drs Naliboff and Mayer provided intellectual input to the manuscript. Declaration of funding interests: Dr Spiegel has received grant support from Takeda, AstraZeneca, and Rose Pharmaceuticals. Partial support was provided from the UCLA Center for Neurobiology of Stress (NIH P50 DK64539 and 1 R24 AT002681-NCCAM). Dr Spiegel is supported by a Veterans Affairs Health Services Research and Development (HSR&D) Career Development Award (RCD 03-179-2), and the CURE Digestive Disease Research Center (NIH 2P30 DK 041301-17). Drs Chang, Naliboff and Mayer are supported by NIH Grant No. P50 DK64539, and Drs Spiegel, Mayer and Naliboff are supported by NIH Center Grant 1 R24 AT002681-NCCAM. This study was supported by an investigator-initiated research grant from Takeda Pharmaceuticals.
