Systematic review: outcomes and adverse events from randomised trials in Crohn's disease

Summary
Background: The suitability of disease activity indices has been challenged, with growing interest in objective measures of inflammation.
Aim: To undertake a systematic review of efficacy and safety outcomes in placebo‐controlled randomised controlled trials (RCTs) of patients with Crohn's disease.
Methods: MEDLINE, EMBASE, CINAHL and Cochrane Library were searched until November 2015, for RCTs of adult Crohn's disease patients treated with medical or surgical therapies. Data on efficacy and safety outcomes, end‐point definitions, and measurement instruments were extracted and stratified by publication date (pre‐2009 and 2009 onwards).
Results: One hundred and eighty‐one RCTs (110 induction and 71 maintenance) were identified, including 23 850 patients. About 92.3% reported clinical efficacy endpoints. The Crohn's Disease Activity Index (CDAI) dominated, defining clinical response or remission in 63.5% of trials (35 definitions of response or remission). CDAI < 150 was the commonest endpoint, but reporting reduced between periods (46.4%‐41.1%), whilst use of CDAI100 increased (16.8%‐30.4%). Fistula studies most commonly reported fistula closure (9, 90.0%). Reporting of biomarker, endoscopy and histology endpoints increased overall (33.3%‐40.6%, 14.4%‐30.4% and 3.2%‐12.5%, respectively), but were heterogeneous and rarely reported in fistula trials. Patient‐reported outcome measures were reported in 41.4% of trials and safety endpoints in 35.4%. Many of the common adverse events relate to disease exacerbation or treatment failure.
Conclusions: Trial endpoints vary across studies, over time and are distinct in fistula studies. Despite growth in reporting of objective measures of inflammation and in patient‐reported outcome measures, there is a lack of standardisation. This confirms the need for a core outcome set for comparative effectiveness research in Crohn's disease.


1 | INTRODUCTION
Defining the key outcomes of therapeutic interventions and the best way to measure those outcomes is essential for clinical and regulatory decision-making. Due to the complexity of Crohn's disease and the multitude of treatments, a number of different outcomes and outcome measures have been reported in clinical trials including symptom scores, composite disease activity indices and quality of life questionnaires. 1,2 Decision-making also relies on the availability of good information on the unintended effects (harms) from treatments.
Heterogeneity in reporting of outcomes or measurement instruments within clinical trials may hinder the comparison of results within systematic reviews and inhibit the meaningful interpretation of individual studies. 3 One way to mitigate this problem is the introduction of an agreed minimum set of standardised outcomes, to be measured and reported in all trials for a particular condition, referred to as a core outcome set. 4 There is no core outcome set for Crohn's disease, although a model has been proposed for classifying outcomes for all inflammatory bowel diseases using the World Health Organisation International Classification of Functioning, Disability and Health (ICF). 5 Recently, the International Consortium for Health Outcomes Measurement developed a "Standard Set" for inflammatory bowel disease with recommendations for the pragmatic measurement of outcomes in routine care to support benchmarking. 6 Also recently published is a study protocol for the development of a core outcome set for inflammatory bowel disease 7 and a core outcome set for fistulising Crohn's disease, 8 indicating the importance of this research area. Future trial design and core outcome set development for Crohn's disease would benefit from a systematic synthesis of outcome reporting across published clinical trials, incorporating statistical testing and consideration of adverse events.
In this study, we systematically reviewed the literature to extract data on the outcomes and measurement instruments used, and the safety outcomes reported, in randomised clinical trials (RCTs) of treatments for Crohn's disease. Our aims were to explore the extent of heterogeneity among existing trials, to examine time trends in reporting and to generate insights to support future trial design and core outcome set development. Our results extend beyond the recently published literature in this area by including a broader set of interventions, offering statistical testing of time trends in outcome reporting and bringing new evidence on harms reporting in Crohn's disease. 8,9

2 | METHODS

| Systematic search
We registered review protocols with the International Prospective Register of Systematic Reviews (PROSPERO) [...] term "Crohn's disease" and the key word "outcome" were used. See Tables S1 to S4 for detailed search criteria.

| Eligibility criteria and study selection
Randomised controlled trials of drug therapies (corticosteroids, 5-ASAs, immunosuppressants, biologics and antibiotics), surgery and nondrug therapies (enteral nutrition, complementary and alternative medicine, probiotics and prebiotics) were included, as were RCTs of treatments for complications (strictures, fissures, abscesses and perforations). Eligible trials were conducted in adult patients (aged 18 or over) with Crohn's disease. Studies of inflammatory bowel disease populations were eligible provided outcomes were reported separately for Crohn's disease. Studies had to be published as full text in English.
Duplicates were removed after a complete list of RCTs was generated. Because of time constraints, two reviewers (HC and JK) independently assessed a random sample of 100 studies against the eligibility criteria at the title and abstract screening stages and resolved discrepancies by discussion. The sample was generated by assigning each article a number and using a random number generator. No issues were found when screening the 100 articles, and the primary researcher (HC) screened the remaining papers independently. Full copies of all potentially eligible studies were obtained and reassessed against the eligibility criteria by the primary researcher (HC). The second reviewer (JK) was consulted where needed.

| Data collection
Data were extracted from the studies by the primary researcher. For a randomly generated sample of 10 studies, the primary researcher extracted the data and the secondary researcher (JK) checked the extraction. No inaccuracies were found in the data extraction of the sample of 10 papers, and the primary researcher extracted data from the remaining papers independently. Studies were categorised as induction or maintenance, with subcategories of medical vs surgical induction and maintenance of medically induced vs surgically induced remission. RCTs focusing solely on patients with fistulising disease were flagged to identify differences in reported outcomes. Efficacy and safety outcomes were recorded according to whether they were reported as primary or secondary outcomes, or not specified as either. The efficacy outcomes were categorised in line with the method used by Ma et al 10 as clinical or composite-clinical, endoscopic, histologic, biomarkers and patient-reported outcomes (PROs).
Safety-related outcomes were recorded as primary or secondary outcomes.
Adverse event reporting was recorded in specific categories: adverse events, serious adverse events, treatment-related adverse events, treatment-related serious adverse events, study withdrawal, abnormal laboratory results and adverse events by preferred term according to the Medical Dictionary for Regulatory Activities (MedDRA). 11 Study withdrawals were categorised as due to adverse events, serious adverse events, treatment-related adverse events, treatment-related serious adverse events, treatment failure (insufficient therapeutic effect, exacerbation of Crohn's disease, development of complications or need for additional therapy, surgery or hospitalisation) or other reasons (protocol noncompliance, lost to follow-up, prohibited medicine use or withdrawal of consent).
A critique of the methodological quality of the studies was unnecessary, as this project did not involve synthesis of outcome data.

| Synthesis of results and analysis
A comprehensive record of efficacy and safety outcomes was generated and organised by outcome type. Our main analysis of efficacy outcomes focused on those designated as primary or secondary endpoints. We adopted a similar approach for safety-related outcomes but also analysed all reported data for adverse events and study withdrawals. Adverse event reporting was considered at two levels of the MedDRA hierarchy: system organ class (SOC) and higher level group term (HLGT), the latter of which is considered a clinically relevant grouping of MedDRA preferred terms. 11 Adverse events were grouped by MedDRA higher level group terms and ranked in order of frequency of reporting. The top 10 ranked higher level group term adverse events were compared by trial type and drug class.
A secondary analysis considered the reporting of outcomes that were not specified as primary or secondary endpoints. To mirror the increased focus on the importance of mucosal healing, 12 the number of studies that reported additional endoscopic or histologic outcomes or the faecal calprotectin biomarker was assessed.
The proportion of studies reporting each type of outcome was calculated, by trial type. The results were stratified into pre-2009 and 2009 onwards, and the changes over time in reporting were summarised in matrix form, with outcome categories listed in rows and frequency of outcome reporting plotted in greyscale on a time axis. 10 The statistical significance of any changes between time periods in outcome reporting was tested using the chi-squared test (with 1 df, the critical value of the chi-squared statistic at the 5% significance level is 3.84).
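As an illustration of the test described above, the following minimal sketch computes an uncorrected Pearson chi-squared statistic for a 2 x 2 table (time period by outcome reported or not reported) and compares it with the 1 df critical value of 3.84. The function name and the counts are assumptions made for illustration: 18 of 125 pre-2009 trials and 17 of 56 later trials are back-calculated from the endoscopy percentages given in the Results (14.4% and 30.4%) and the total of 181 trials, and are not stated in the text. Whether a continuity correction was applied is also not stated, so the uncorrected form is assumed here.

```python
def pearson_chi2_2x2(a, b, c, d):
    """Uncorrected Pearson chi-squared statistic (1 df) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Shortcut formula for a 2 x 2 table: n * (ad - bc)^2 / (product of the margins)
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Critical value of the chi-squared distribution with 1 df at the 5% level
CRITICAL_VALUE_1DF_5PCT = 3.84

# Hypothetical counts, back-calculated from reported percentages (an assumption)
pre_reporting, pre_total = 18, 125    # 14.4% of pre-2009 trials
post_reporting, post_total = 17, 56   # 30.4% of trials from 2009 onwards

chi2_endoscopy = pearson_chi2_2x2(
    pre_reporting, pre_total - pre_reporting,
    post_reporting, post_total - post_reporting,
)
significant = chi2_endoscopy > CRITICAL_VALUE_1DF_5PCT
print(round(chi2_endoscopy, 2), significant)
```

Under these assumed counts the statistic exceeds 3.84, so the change between periods would be declared significant at the 5% level.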
The review was reported in line with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement and the harms checklist. 13,14

3 | RESULTS

| Endoscopy
The reporting of endoscopic outcomes doubled between the two time periods, from 14.4% to 30.4% of studies (Figure 2A). This increase was statistically significant at the 95% confidence level (chi-squared statistic of 6.31). Endoscopic outcomes were reported in 31% (22) [...] Endoscopic outcomes were infrequently reported in induction trials (13, 11.8%) and in trials in penetrating disease (1, 10.0%). 120 Reporting of endoscopic outcomes is a more recent phenomenon in induction trials, with their first use in a study reported in 2000, as compared with 1984 in maintenance trials.

| Histology
Histology-based outcomes have shown a statistically significant increase between the two periods (chi-squared test statistic of 5.86) (Figure 2A), but remain uncommonly used (11, 6.1%) (Table S8). The reporting of histologic outcomes as additional outcomes increased between the time periods from 3.2% of studies to 7.1%, but this is not statistically significant at the 95% confidence level.
[...] Other tools for measuring quality of life included the Short-Form 36 40,50,106,120,121,153,160,161,169,173 and its components, 50,121,169 Patient Global Assessments, 48 [...]
Adverse events were the most common primary and secondary outcomes, reported in 39 (35.5%) induction and 22 (31%) maintenance studies. The reporting of adverse events as a primary or secondary endpoint was most frequently the totality of adverse events but some studies looked for specific treatment-related adverse events or reported the stopping of treatment due to adverse events.

| Adverse events
Reporting of any adverse events occurred in 88

| Adverse events by intervention group
Five of the 10 most commonly reported adverse event groups for all therapies were also in the top 10 across all intervention groups (Table 3).

| Adverse events by drug class
Gastrointestinal signs and symptoms, and infections were the only two adverse event groups that were consistently ranked in the 10 most commonly reported across all drug classes (including CAM, dietary and prebiotic/probiotic interventions) (Table 4).
HLGT reported in equal numbers but only in post-operative maintenance trials: hepatobiliary investigations.
TABLE 4 Ten most commonly reported MedDRA higher level group terms in randomised controlled trials in Crohn's disease, by drug class (columns: SOC, HLGT)
[...] (ranked first to third most common), are the fourth most common adverse event for surgical interventions, along with headaches and a number of other adverse event groups (Table 3).
[...] correlate closely with objective signs of inflammation or with mucosal healing at endoscopy. 198,199 The time trends we observed in clini- [...]
Stool biomarkers offer the potential to measure gut-related inflammation reliably, and in recent years faecal calprotectin has become available in routine IBD practice. 203 Uncertainty remains as to its performance properties, particularly for measuring small bowel, rather than colonic, disease activity, 204 and research continues to explore other stool assays to measure the inflammatory process. 205 Faecal calprotectin was reported as an endpoint in only two trials included in this review. 66,101
We found a statistically significant increase in the reporting of endoscopy- and histology-based outcome measures over time, although they remained at a low level and without emergence of a standardised approach. This heterogeneity likely reflects the current suboptimal psychometric properties of individual measurement tools, both for endoscopic and histologic scoring systems. 206,207 In addition to the cost and invasiveness of ileocolonoscopy, endoscopy is not able to fully characterise small bowel disease or quantify the overall extent of intestinal inflammation in Crohn's disease. There is a growing body of research on the potential use of quantitative imaging such as CT and MRI, 208 [...]
Nevertheless, these data demonstrate differences in the adverse event profile of different intervention groups and should support renewed attempts to define disease- and intervention-specific adverse events and to standardise safety outcomes as discrete endpoints. This is an important consideration for future core outcome set developers.
Our results highlight how the reporting of outcomes in trials in fistula patients aligns with overall reporting. The use of PROMs and safety-related endpoints is common across all trials, regardless of disease type. In fistula trials, clinical response was less commonly measured by CDAI, and more frequently measured by fistula closure and the Perianal Disease Activity Index (PDAI). These three outcome measures were the most commonly used in fistula trials identified by this review, which supports the findings of a recently developed core outcome set for fistulising disease. 8 Biomarker, histology and endoscopy outcomes were rarely used in fistula trials and are not included in the core outcome set either, contrary to the general shift in outcomes reporting in Crohn's disease trials. However, patient reports (eg incontinence and drainage) were more common endpoints in trials of fistula patients than in nonfistula trials, and their importance is borne out in the core outcome set, which lists several PROMs to be reported in future trials.
Our review independently supports the key findings of a recently published systematic review of outcomes in Crohn's disease. 9 We confirm heterogeneity in definitions of response and remission and the need for a core outcome set to standardise endpoint definitions.
Both studies identified the use of CDAI as the most popular outcome measurement tool overall and of IBD-Q as the most commonly [...] The use of CDAI as a requirement for trial inclusion in their systematic review reduces the ability of the Ma et al review to assess changes in the use of CDAI. We have been able to include such analysis in our paper, and confirm a statistically significant increase in CDAI100, whereas the use of CDAI overall has remained relatively consistent.
Our study has limitations. Whilst it includes a comprehensive listing of outcomes from available Crohn's disease trials, we cannot account for publication bias. The results would have been strengthened by the consideration of nonrandomised controlled trials and observational studies. In particular, this would help to characterise important longer term harms. We did not assess the validity or reliability of the outcome measures identified in the review, although this would form a part of any core outcome set development process.
Our study confirms the variability that exists in reporting of outcomes in published clinical trials of interventions for Crohn's disease. These data provide a comprehensive resource to support current efforts 7 to redefine optimal outcomes and measurement tools to be included in future studies of comparative effectiveness.

ACKNOWLEDGEMENT
Declaration of personal interests: None.

AUTHORSHIP
Guarantor of the article: K.B. is guarantor of the article.