Synthesis and summary of patient‐reported outcome measures to inform the development of a core outcome set in colorectal cancer surgery

Abstract
Aim
Patient‐reported outcome (PRO) measures (PROMs) are standard measures in the assessment of colorectal cancer (CRC) treatment, but the range and complexity of available PROMs may be hindering the synthesis of evidence. This systematic review aimed to: (i) summarize PROMs in studies of CRC surgery and (ii) categorize PRO content to inform the future development of an agreed minimum ‘core’ outcome set to be measured in all trials.
Method
All PROMs were identified from a systematic review of prospective CRC surgical studies. The type and frequency of PROMs in each study were summarized, and the number of items documented. All items were extracted and independently categorized by content by two researchers into ‘health domains’, and discrepancies were discussed with a patient and expert. Domain popularity and the distribution of items were summarized.
Results
Fifty‐eight different PROMs were identified from the 104 included studies. There were 23 generic, four cancer‐specific, 11 disease‐specific and 16 symptom‐specific questionnaires, and three ad hoc measures. The most frequently used PROM was the EORTC QLQ‐C30 (50 studies), and most PROMs (n = 40, 69%) were used in only one study. Detailed examination of the 50 available measures identified 917 items, which were categorized into 51 domains. The domains comprising the most items were ‘anxiety’ (n = 85, 9.2%), ‘fatigue’ (n = 67, 7.3%) and ‘physical function’ (n = 63, 6.9%). No domains were included in all PROMs.
Conclusion
There is major heterogeneity of PRO measurement and a wide variation in content assessed in the PROMs available for CRC. A core outcome set will improve PRO outcome measurement and reporting in CRC trials.


Introduction
The measurement of patient-reported outcomes (PROs) has become standard in the assessment of colorectal cancer (CRC) treatments, and their use is recommended by funding and regulatory agencies [1]. Many patient-reported outcome measures (PROMs) have therefore been developed for a variety of purposes [2]. Some are generic, and allow comparisons between patients with other conditions (e.g. SF-36, EQ-5D), others are designed for patients with cancer (e.g. EORTC QLQ-C30, FACT-G), and some are specific for CRC (e.g. EORTC-CR29, FACT-C). To add further complexity, each of these PROMs typically consists of a number of questions (items), which are often grouped together to represent similar concepts (scales). For example, two questions regarding activities of daily living and leisure activities in the EORTC QLQ-C30 measure are grouped into a single 'role function' scale. There is therefore a multitude of ways to measure PROs to evaluate treatment for CRC, and this creates problems that may influence the conduct and clinical impact of trials.
Trials may use different PROMs [3,4], making it impossible to synthesize data across trials or undertake meta-analyses. The multiplicity of results available from trials means that it is difficult to interpret findings in the context of clinical practice because of a lack of familiarity with the number of measures, scales and items [2]. For example, the scale 'physical function' exists in several different PROMs, but individual items in these scales vary considerably between questionnaires. This is confusing for clinicians, who may not be aware of the differences between PROMs, and it is likely to limit the meaningful use of the data in practice. Finally, the opportunities for measuring multiple outcomes may lead to selective reporting of significant findings. This can generate bias and influence clinical interpretation of trials [5].
A proposed solution to these issues is the 'core outcome set'. Core outcomes are the minimum set of outcomes that patients and professionals agree should be measured in all trials of a certain condition [6]. They aim to facilitate comparisons between trials and aid meta-analysis by standardizing outcome measurement, including PROs. The use of core sets may also facilitate the clinical communication of data. Many core outcome sets have now been developed in different clinical areas, including rheumatology [7], paediatrics [8] and obstetrics [9], but not in CRC surgery. This systematic review aims to examine the measurement of PROs in CRC surgical studies, and to use the data to inform the development of a core outcome set.

Method
A systematic review of prospective CRC surgical studies measuring PROs was undertaken to: (i) summarize PRO measurement in CRC surgical studies, and (ii) examine each PROM in detail and categorize analogous concepts into domains to inform the future development of a core outcome set.

Systematic search and data extraction
This systematic review adhered to a predefined protocol (available on request from the authors). Validated terms  [10], methodological challenges in measuring PROs in CRC [11], laparoscopic surgery [12], long-term survivors [13], rectal cancer [3] and CRC before 2009 [14]. The studies identified in these reviews were included. All citations were collated with REFERENCE MANAGER 12 (Thomson Reuters, New York City, New York, USA) and the duplicates removed.
Titles and abstracts of identified publications were screened by one researcher. If there was uncertainty about the eligibility of a publication the full paper was also accessed. Articles were included if they were original research papers reporting PROs of CRC surgery (curative or palliative), with or without neoadjuvant or adjuvant therapies, or systematic reviews of such publications. PROs were defined as end-points provided by patients themselves and not interpreted by observers.
Studies of nonbiomedical interventions (e.g. alternative medicine), palliative treatments that did not include a surgical component (e.g. palliative chemotherapy), screening studies, treatment of colorectal metastases and molecular and genetic prognostic studies were all excluded. Studies of more than one cancer site or of mixed benign and malignant disease were included provided the data for CRC patients were presented separately from those of the other diseases.
Data extraction included: participant demographics (number, age and gender); treatment received (surgery, neoadjuvant radiotherapy/chemoradiation and adjuvant chemotherapy); treatment intent (curative or palliative); the study design (randomized trial, case-control study, cohort study, cross-sectional study, prospective case series or other design); the PRO questionnaire used; and the individual items included in each questionnaire. When the individual questionnaires were not available in publications, internet searches and direct contact with authors were used to obtain the information. All data were entered into a Microsoft Access (Microsoft, Redmond, Washington, USA) database to facilitate data management and analyses. The data extraction was checked by a second reviewer (ROF) for a sample of included articles (n = 25) and any disagreements were discussed and resolved with the senior author (JMB).

Summary of PRO measurement in CRC surgery
The number of publications reporting each PROM was tabulated and descriptive statistics used to summarize PRO measurement. The popularity of PROMs was assessed by comparing their frequency of use in studies. A summary of each PROM is provided in terms of the numbers of items, scales and whether a total score was used. The distribution of items among PROMs was examined by calculating the median number and range of items per PROM. Questionnaires were categorized as: (i) generic (for use in all patients), (ii) cancer specific (for use in all cancer patients), (iii) CRC specific (for use in CRC patients), (iv) symptom specific (to assess a single symptom, e.g. pain), or (v) ad hoc.
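The tabulation described above can be illustrated with a minimal Python sketch. It is not the analysis code used in this review (the data were managed in a Microsoft Access database); the study identifiers and study-to-PROM assignments are hypothetical toy data, and the function names are assumptions introduced for illustration only.

```python
from collections import Counter
from statistics import median

# Hypothetical toy data (NOT extracted from the review): study -> PROMs reported.
studies = {
    "study_a": ["EORTC QLQ-C30", "EORTC QLQ-CR38"],
    "study_b": ["EORTC QLQ-C30"],
    "study_c": ["SF-36", "EORTC QLQ-C30"],
}

# Number of items in each questionnaire, as implied by the instrument names.
item_counts = {"EORTC QLQ-C30": 30, "EORTC QLQ-CR38": 38, "SF-36": 36}


def prom_frequency(studies):
    """Count the number of studies in which each PROM was used."""
    freq = Counter()
    for proms in studies.values():
        # A set ensures a PROM is counted once per study.
        freq.update(set(proms))
    return freq


def items_summary(item_counts):
    """Return the median, minimum and maximum number of items per PROM."""
    counts = list(item_counts.values())
    return median(counts), min(counts), max(counts)


freq = prom_frequency(studies)
used_once = [prom for prom, n in freq.items() if n == 1]
med, lowest, highest = items_summary(item_counts)

print(freq.most_common())
print(f"PROMs used in only one study: {len(used_once)}")
print(f"Items per PROM: median {med}, range {lowest}-{highest}")
```

Counting each PROM once per study keeps the frequency interpretable as "number of studies using this PROM", which is the measure of popularity described in the text.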

Examination of PROs and domain categorization
Individual items from all questionnaires were extracted and formed into a long-list before categorization into health domains by two researchers (RNW and JR). Both were kept masked as to which PRO questionnaire the items were derived from. Two patient representatives (JEJ and GS) and one consultant colorectal surgeon (AMP) subsequently checked this process. Discrepancies were discussed and resolved with the senior author (JMB). Categorization was summarized using descriptive statistics to explore the distribution of items and PROMs between domains. The number of items included in each domain was counted, as was the number of PROMs from which they were sourced. The contribution of each source PROM was demonstrated by calculating the median number and range of items included from the measures.
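The domain-level summary described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the procedure used by the authors: the PROM names, item wordings and domain assignments are invented toy data, and `summarize_domains` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import median

# Hypothetical categorized long-list: (source PROM, item text, assigned domain).
items = [
    ("PROM A", "Did you feel tense?", "anxiety"),
    ("PROM A", "Did you worry?", "anxiety"),
    ("PROM B", "I feel nervous", "anxiety"),
    ("PROM B", "I feel tired", "fatigue"),
    ("PROM C", "Were you tired?", "fatigue"),
    ("PROM C", "Did you need to rest?", "fatigue"),
    ("PROM C", "Have you felt weak?", "fatigue"),
]


def summarize_domains(items):
    """For each domain, count its items and source PROMs, and give the
    median and range of items contributed per source PROM."""
    per_domain = defaultdict(lambda: defaultdict(int))
    for prom, _text, domain in items:
        per_domain[domain][prom] += 1

    summary = {}
    for domain, by_prom in per_domain.items():
        counts = sorted(by_prom.values())
        summary[domain] = {
            "n_items": sum(counts),
            "n_proms": len(counts),
            "median_items_per_prom": median(counts),
            "range": (counts[0], counts[-1]),
        }
    return summary


summary = summarize_domains(items)
for domain, stats in summary.items():
    print(domain, stats)
```

A domain with many items drawn from few PROMs will show a high median number of items per PROM, which is one way to flag specialized measures.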

Results
A total of 5644 titles and abstracts were identified, of which 2127 were duplicates. The remaining 3517 were screened and 29 original research articles included. In addition, six systematic reviews of PROs in CRC surgery identified a further 72 original research articles (Fig. 1). In total, 102 original publications including 25 randomized controlled trials (25%) and 77 nonrandomized studies (75%) reporting the outcome for 66 386 patients with CRC were included. The studies are summarized in Table 2.

Summary of PROM in CRC surgery
Fifty-eight different PRO questionnaires were identified and these were reported 184 times in the included publications (Table 3). There were 23 (39.7%) generic questionnaires, four (6.9%) cancer-specific questionnaires, 11 (19.0%) CRC-specific questionnaires and 17 (29.3%) symptom-specific questionnaires. Three ad hoc questionnaires (those devised specifically for the study) were not categorized.
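As an arithmetic check, the category percentages above can be reproduced from the reported counts. The short Python sketch below uses only the counts stated in this paragraph; the variable names are illustrative.

```python
# Counts of questionnaires by category, as reported in the paragraph above.
category_counts = {
    "generic": 23,
    "cancer-specific": 4,
    "CRC-specific": 11,
    "symptom-specific": 17,
    "ad hoc": 3,
}

total = sum(category_counts.values())

# Percentage of all identified questionnaires, rounded to one decimal place.
percentages = {
    name: round(100 * count / total, 1)
    for name, count in category_counts.items()
}

print(total)
for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```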
There was little evidence of consistency between PROMs. No domains were measured in all the PROMs. The two domains that were best represented were 'anxiety' and 'social function', each measured by 22 (44%) PROMs. Otherwise, most domains (n = 39, 76%) were measured by fewer than a quarter of PROMs, highlighting further heterogeneity. There were two domains with a high median number of items included per PROM: 'stoma problems', which contained 52 items from only five PROMs (median seven items per PROM) and 'satisfaction with care', which featured six items from just one PROM. This may reflect specialization of PROMs, with some measures focusing on very specific concepts.

Discussion
This systematic review aimed to summarize PRO measurement in current CRC surgical studies and categorize PRO items into analogous concepts to inform the development of a core outcome set. There was evidence of significant heterogeneity of PRO measurement in the included studies. Fifty-eight different PROMs were used to assess patient experience of colorectal surgery. Most (n = 40, 69.0%) were only ever used once, and even the most common (EORTC QLQ-C30) was measured in less than half of the studies. PROMs also varied greatly in terms of their content, with some as simple as a single item while others included up to 65. Most (52%) PROMs were not designed to be specific to CRC surgery or symptoms thereof, and although this may bring benefits in terms of comparison between diseases, they may not be sensitive enough to issues that are of specific importance to patients with CRC. Over 900 individual questionnaire items were evident from 50 PROMs, and through a rigorous process these were categorized into 51 'health domains'. This demonstrated a further lack of consistency, with no domains being measured in all the PROMs, and most health domains being measured by fewer than a quarter of PROMs. All of this highlights potentially major questions for evidence synthesis and clinical interpretation of results in studies of CRC surgery, and demonstrates the need for a standardized core outcome set.
Other studies have highlighted the problem with outcome heterogeneity for clinical and PRO data. A recently published systematic review identified 194 studies of CRC surgery that measured 766 different clinical outcomes, with no single outcome reported in all [117]. Even considering a seemingly simple outcome such as mortality, there were over 84 different ways in which this was defined and measured. The same problem has been highlighted in studies of oesophageal cancer surgery [118], where a review of 122 articles reported 210 unique complications and 10 different measures of operative mortality, and breast reconstruction following mastectomy for cancer [119], which identified 134 studies reporting 950 unique complications. Problems with the multiplicity of PRO measures have also been described previously in oesophageal surgery [120], but there is no previously published evidence of this issue in trials of CRC surgery.
This study is the largest systematic review of PROs in CRC and was conducted with rigorous methodology, but there are some limitations. The review covers published CRC studies in English up until 2010. A more exhaustive search over a more recent period, or inclusion of unpublished data or non-English publications, might have yielded further PROs, but all the most commonly used PROs were captured by these inclusion criteria and extending the review would probably have identified only additional rare PROs. The categorization process could be criticized as arbitrary, but efforts were made to objectify the process. First, two researchers categorized the questionnaire items independently, each blinded to the other. Second, categorization was checked for face validity by a patient representative. Finally, there has been full disclosure of the categorization in this article to allow scientific scrutiny of the process.
Having identified all the potential patient-reported health domains measured in CRC surgical studies, the next phase of this research is to gain a consensus on which outcomes it is essential to measure in all trials. Recommended methods to achieve this have been defined by the international Core Outcome Measurement in Effectiveness Trials (COMET) group [6]. Domains will be combined with clinical outcomes generated from a previous systematic review [117] to create an exhaustive long-list of all potential outcomes. Key stakeholders, including patients and professionals, will then consider the importance of these outcomes and undertake a prioritization exercise called the Delphi process. This will allow the outcomes of lesser importance to be discarded from the core set. Finally, when the number of outcomes has been reduced, a face-to-face meeting will be held to allow for debate about their relative merits before the final core set is agreed.
In conclusion, this systematic review of CRC surgery demonstrated significant heterogeneity of PRO measurement that may hinder comparisons between studies, limit meta-analysis and allow outcome reporting bias. A long-list of patient-reported 'health domains' was generated using robust methodology to inform the development of a core outcome set.