Surgery for constipation: systematic review and clinical guidance

This manuscript provides the introduction and detailed methodology used in subsequent reviews to assess the outcomes of surgical interventions with the primary intent of treating chronic constipation in adults and to develop recommendations for practice.

Method: PRISMA guidance was adhered to throughout. A literature search of public databases was performed covering January 1960 to February 2016. Studies that fulfilled strictly defined PICOS (patients, interventions, controls, outcome, and study design) criteria were included. The process involved two groups of participants: (i) a 'clinical guidance group' of 18 UK experts (including junior investigators), who performed the systematic reviews and produced summary evidence statements (SES) based strictly on data synthesis in each review, and who then produced prototype graded practice recommendations (GPRs) based on coalescence of SES and expert opinion; and (ii) a European Consensus group of 18 experts from nine European countries, nominated by the ESCP (European Society of Coloproctology), who evaluated the appropriateness of each prototype GPR based on published RAND/UCLA methodology.
Results: An overview of the search results is provided in this manuscript. A total of 156 studies from 307 full-text articles (from 2551 initially screened records) were included, providing data on procedures characterized by: (i) colonic resection (n = 40); (ii) rectal suspension (n = 18); (iii) rectal wall excision (n = 44); (iv) rectovaginal septum reinforcement (n = 47); and (v) sacral nerve stimulation (n = 7). The overall quality of evidence was poor, with 113/156 (72.4%) studies providing only Oxford level IV evidence. The best evidence was found for rectal excisional procedures, where the majority of studies were Oxford level I or II. The five subsequent reviews provide a total of 99 SES (reflecting perioperative variables, efficacy, harms and prognostic variables) that contributed to 100 prototype GPRs covering patient selection, procedural considerations and patient counselling. The final manuscript details the 85/100 GPRs that were deemed appropriate by European Consensus (the remaining 15 were all rated uncertain) and future research recommendations.

Introduction
Constipation is common in adults and children, with up to 20% of the population reporting symptoms depending on the definition used (2-28% of adults; 0.7-30% of children) [1][2][3]. Chronic constipation (CC), usually defined as more than 6 months of symptoms, is less common but results in 0.5 million UK GP consultations per annum. A proportion of the population suffers symptoms that are both chronic and more disabling (probably about 0.4% of the population) [4]. Such patients, who are predominantly female [5], are usually referred to secondary care, with many progressing to tertiary specialist investigation. Patient dissatisfaction is high in this group; nearly 80% feel that laxative therapy is unsatisfactory [6], and symptoms have a significant effect on measured quality of life (QOL) [7]. CC consumes significant healthcare resources. In the US in 2012, a primary complaint of constipation was responsible for 3.2 million physician visits, resulting in (direct and indirect) costs of $1.7 billion [8]. In the UK, it is estimated that 10% of district nursing time is spent on bowel control [9], and the annual spend on laxatives exceeds £117 m, with 18.3 million prescriptions in 2014, of which 91% were for stimulant and osmotic laxatives (Health and Social Care Information Centre) [10].
The act of defaecation depends on the coordinated functions of the colon, rectum and anus. Considering the complexity of the neuromuscular (sensory and motor) functions required to achieve planned, conscious and effective defaecation [11], it is no surprise that disturbances to perceived 'normal' function occur commonly at all stages of life. Clinically, such problems commonly lead to symptoms including those of obstructed defaecation (e.g. straining; incomplete, unsuccessful or painful evacuation), bowel infrequency, abdominal pain and bloating. After exclusion of secondary causes (obstructing colonic lesions; neurological, metabolic and endocrine disorders), the pathophysiology of CC can broadly be divided into problems of colonic contractile activity (and thus stool transit) and problems of rectal emptying (evacuation disorder). A combination of clinical expertise and specialist radio-physiological investigations can determine which patients have slow colonic transit, evacuation disorder, both (in whom transit is usually characterized by a left-sided delay) or neither (no abnormality found with current tests) [12]. Evacuation disorders can be further subdivided into those with a structurally significant pelvic floor abnormality (usually as a consequence of pelvic floor weakness or injury), e.g. rectocoele or internal prolapse (intussusception), and those characterized by a dynamic failure of evacuation without structural abnormality, most commonly termed 'functional defaecation disorder' (FDD) [13] (Fig. 1).
The management of CC is a major problem owing to its high prevalence and the lack of widespread specialist expertise. In general, a step-wise approach is taken, with first-line conservative treatment such as lifestyle advice and laxatives (primary care), followed by nurse-led bowel retraining programmes, sometimes including focused biofeedback and psychosocial support (secondary/tertiary care). Although these treatments may improve symptoms in more than half of patients [14], those with intractable symptoms and impaired QOL may subsequently be offered a range of surgical interventions.
Surgical decision-making is greatly influenced by local expertise, commissioning or reimbursement, and personal enthusiasm for particular interventions. While robust diagnosis of specific pathophysiologies combined with multidisciplinary team discussion may help direct surgery, in the absence of an agreed pathway to stratify patients there is currently a large and difficult-to-justify variation in surgical practice, which continues to risk inadequately informed and potentially harmful interventions being offered. The need to reduce such variation in practice, based on available evidence, has been a recurrent theme of recent national specialty group discussions (e.g. ACPGBI), with various initiatives proposed. As part of the Chronic Constipation Treatment PathwaY (CapaCiTY) programme funded by the National Institute for Health Research (NIHR), a multidisciplinary working group was convened in July 2014 to address this need. This group of medical and nursing experts included members of The Pelvic Floor Society and urogynaecology expertise from the International Continence Society (ICS). As a prelude to developing new evidence from trials within the CapaCiTY programme, it was agreed that the current surgical evidence base would benefit from coalescence in the form of systematic review and graded practice recommendations. This paper and the six accompanying papers address this aim.

Protocol and registration
The authors developed the review protocol, detailing pre-specified methods of analysis and eligibility criteria, in accordance with the 2009 PRISMA guidance [15], also using the new reporting elements from the 2016 harms checklist [16]. While the protocol was not registered, a description of the NIHR CapaCiTY programme is available in the public domain (http://www.isrctn.com/ISRCTN11747152) and has been presented nationally (DDF meeting, London 2015; National Pelvic Floor Meeting, Manchester 2015).

Eligibility criteria
Study characteristics
Study characteristics were defined using the PICOS framework. Search term definitions were inclusive, promoting a sensitive search for studies reporting surgical interventions for chronic constipation.
Population: The review aimed to identify studies of patients undergoing surgical interventions with the primary intent of treating chronic constipation. The definition of chronic constipation is neither straightforward nor uniformly applied [17]. On this basis, all common terms encompassing problematic defaecation were used (see search strategy syntax: Appendix I). However, several pelvic floor procedures are commonly performed for indications other than chronic constipation. Examples include pelvic organ prolapse syndromes, where the physical prolapse or dysfunction of other organs (vagina or bladder) is the main motivation for surgery. While such patients invariably also have some degree of defaecatory problems, and their perioperative data could still inform procedural safety, they may differ phenotypically at baseline and in their response to surgery, even if the intervention itself is identical or similar. Cochrane reviews such as 'surgical repair of pelvic organ prolapse in women' [18] and of surgical management of external rectal prolapse [19] include some RCTs in which defaecatory symptoms are recorded as a secondary outcome or as a complication, but not as the primary presenting complaint of the population studied. These RCTs were therefore ineligible for inclusion. Similarly, for colonic excisional procedures, patients with the very rare diagnoses of adult Hirschsprung disease or idiopathic megacolon/megarectum [20] were considered distinct from those with chronic constipation and were not included. Some studies reported outcomes for two populations, only one of which was eligible (e.g. internal and external rectal prolapse). Where such data could not be separated by population, the study was also deemed ineligible.
A minimum population sample of 20 patients was imposed for eligibility. This threshold was chosen to exclude case reports and small case series, which often reported a single surgeon's personal experience or early experience of experimental procedures.
Intervention: Surgical procedures for chronic constipation are described heterogeneously. An iterative approach was therefore taken, cross-referencing (e.g. with textbook reference lists) to ensure that all terms in common usage were incorporated in the final search strategy. These included some genuine procedural variations but also multiple small differences in syntax for the same procedure, e.g. 'stapled transanal rectal resection' vs 'stapled transanal rectum resection'. The review team decided that results would be grouped by five main approaches to the surgical treatment of chronic constipation: (i) colonic resection; (ii) hitching procedures of the rectum (rectal suspension); (iii) excisional procedures of the rectal wall (rectal excision); (iv) reinforcement of the rectovaginal septum (RV reinforcement); and (v) sacral nerve stimulation (SNS). This approach was taken because an initial review (Oct 2014) determined that other procedures either lacked sufficient evidence for review or were still experimental. The main exclusion on grounds of insufficient evidence was stomas, whether formed to create intestinal discontinuity or to administer bowel irrigation (continence enema). It is acknowledged that in the real world many patients have stomas, either deliberately or as an eventual outcome of other surgery. However, eligible studies were sparse after application of the inclusion criteria and markedly heterogeneous between and within studies (in both patients and techniques). Other procedures were excluded because they were still considered experimental, e.g. colonic exclusion procedures [21].
Comparisons: Studies were eligible regardless of whether they were retrospective or prospective in design, controlled or uncontrolled. Only a minority of studies reported more than one procedure or more than one population.
Outcomes: Studies were broadly eligible if they provided extractable data on benefit (treatment efficacy), risk (harms) or both. For efficacy, inclusion necessitated accepting the large and well-acknowledged disparity in the quality of outcome reporting in the literature [14], with heavy reliance on estimates of global patient satisfaction with the procedure (an indirect measure of the patient's own judgement of their postoperative state compared with their preoperative state). Studies reporting only physiological and anatomical outcomes were excluded, since these are generally regarded as poor surrogates of efficacy in this patient population [22]. Because the outcomes of surgical interventions for chronic constipation are known to exhibit a 'honeymoon period' in the months immediately following surgery, a minimum (mean or median) follow-up of 12 months was required for eligibility. It is acknowledged that enforcing this criterion excluded some level I studies. Several studies reported the outcomes of more than one procedure; where such data could not be separated by procedure, they were not included (often rendering the study ineligible).
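The numeric and data-availability criteria above (at least 20 patients, mean or median follow-up of at least 12 months, extractable efficacy or harms data, data separable by procedure and population, and membership of one of the five procedure groups) can be expressed as a simple screening rule. The sketch below is illustrative only: the `Study` record, its field names and the group labels are hypothetical, not the review's extraction template, and judgements such as whether efficacy data are clinical rather than purely physiological are assumed to have been made by a reviewer beforehand.

```python
from dataclasses import dataclass
from typing import List

# The five procedure groups used to organise the reviews (labels are illustrative).
PROCEDURE_GROUPS = {
    "colonic resection",
    "rectal suspension",
    "rectal excision",
    "RV reinforcement",
    "SNS",
}


@dataclass
class Study:
    """Hypothetical screening record; fields are assumptions, not the review's Table 1."""
    study_id: str
    n_patients: int
    follow_up_months: float   # mean or median follow-up, as reported
    procedure_group: str      # reviewer-assigned group label
    reports_efficacy: bool    # extractable clinical efficacy data (not physiology alone)
    reports_harms: bool       # extractable harms data
    separable: bool           # data separable by procedure and by population


def exclusion_reasons(s: Study) -> List[str]:
    """Return every eligibility rule the study fails (an empty list means eligible)."""
    reasons = []
    if s.n_patients < 20:
        reasons.append("fewer than 20 patients")
    if s.follow_up_months < 12:
        reasons.append("mean/median follow-up under 12 months")
    if s.procedure_group not in PROCEDURE_GROUPS:
        reasons.append("procedure outside the five review groups")
    if not (s.reports_efficacy or s.reports_harms):
        reasons.append("no extractable efficacy or harms data")
    if not s.separable:
        reasons.append("data not separable by procedure or population")
    return reasons


def is_eligible(s: Study) -> bool:
    return not exclusion_reasons(s)


# Usage: a trial with only 6 months of follow-up fails on follow-up alone.
trial = Study("A", 40, 6.0, "rectal excision", True, True, True)
print(exclusion_reasons(trial))  # ['mean/median follow-up under 12 months']
```

Returning every failed rule, rather than a single yes/no, mirrors the practice of recording 'reasons for exclusion' at the full-text stage.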

Report characteristics
Year of publication: Any publication date within the database search period (1960 to the date of the final search, 22nd February 2016) was eligible.
Language: Because of the large number of studies retrieved, it was decided to include only studies with full text in English. Although non-English-language studies were few, they are detailed for the reader under 'reasons for exclusion' at the full-text stage (rather than at the abstract screening stage). There is reasonable evidence to suggest that restricting inclusion to English-language studies does not adversely affect the quality of systematic reviews [23].
Type of study: Only peer-reviewed publications reporting primary data were eligible; reviews, editorials, letters and other forms of secondary expert opinion were therefore excluded at the screening stage. Only full manuscripts were eligible, so conference abstracts and proceedings were also excluded. No constraint was imposed based on level of evidence, in the knowledge that the vast majority of data would be extracted from case series rather than higher-quality study types.

Information sources
The senior author (CK) performed a comprehensive search of the literature on 22nd February 2016 using PubMed and Evidence Based Medicine reviews (including the Cochrane database of systematic reviews and the Cochrane central register of controlled trials). A preliminary search in 2014 had determined that Embase and Web of Science led to almost 2000 duplicate records with no additional yield. Search terms used a sensitive combination of population, intervention and report terms. A keyword and hand search was used within relevant Cochrane systematic reviews. The specific search terms are listed in Appendix I.

Study selection
Screening was performed at the abstract level by the senior author (CK), excluding studies that did not meet the eligibility criteria where this could be readily determined from the abstract alone. Full-text copies of all remaining English language studies were obtained and assessed by reviewers, who were not blinded to the names of studies, authors, institutions or publications. Disagreement regarding inclusion was resolved by the senior author (CK). Where duplicate data sets had been generated from the same cohort of patients, only the report with the larger population and longer follow-up was included, at the expense of earlier reports from the same cohort. In cases of doubt, authors from the relevant institutions were contacted to confirm or refute any repetition of results (this was done on three occasions).
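The duplicate-cohort rule amounts to keeping one report per patient cohort. The following is a minimal sketch under stated assumptions: `cohort_id` is a hypothetical label that reviewers would assign (in doubtful cases only after contacting the authors), and ranking by population size first, then follow-up, then publication year is an assumed ordering, since the text names both size and follow-up without saying which takes precedence.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Report:
    report_id: str
    cohort_id: str          # hypothetical reviewer-assigned label for the patient cohort
    n_patients: int
    follow_up_months: float
    year: int


def select_one_report_per_cohort(reports: List[Report]) -> List[Report]:
    """Keep one report per cohort: largest population, then longest follow-up, then latest."""
    by_cohort = defaultdict(list)
    for r in reports:
        by_cohort[r.cohort_id].append(r)
    kept = [
        max(group, key=lambda r: (r.n_patients, r.follow_up_months, r.year))
        for group in by_cohort.values()
    ]
    return sorted(kept, key=lambda r: r.report_id)


reports = [
    Report("centreA-2005", "A", 45, 14.0, 2005),
    Report("centreA-2010", "A", 80, 36.0, 2010),  # later, larger update of cohort A
    Report("centreB-2012", "B", 30, 24.0, 2012),
]
print([r.report_id for r in select_one_report_per_cohort(reports)])
# ['centreA-2010', 'centreB-2012']
```

The hard part in practice is assigning `cohort_id` correctly; the selection step itself is mechanical once overlapping reports have been identified.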
Search results were cross-referenced against bibliographies from other sources (previous reviews and book chapters). Care was taken to ensure that any studies missed by the original search met the strict inclusion criteria and did not circumvent the carefully defined search strategy, especially in relation to population terms.

Data collection process
Outcome data were extracted by the junior authorship team (UG, EJH, DP, PFV) paired with one senior author for each procedure: colonic resections (CK); rectal hitching procedures (SB); rectovaginal septum reinforcement (ABW); rectal wall excision (MM-J); sacral nerve stimulation (SP). Data were extracted to a standardized template (Microsoft Excel spreadsheet) including study characteristics and outcome data (see below). For each procedure, one reviewer extracted the data and one verified content.

Data items
A full list of data fields is given in Table 1 (with annotation). These followed the PICOS framework, with outcomes broadly divided into those assessing harms (intra- and perioperative complications and long-term adverse outcomes) and those assessing efficacy: global success ratings and functional outcomes (organized into validated symptom and QOL scoring instruments, and individual symptoms). For perioperative complications, classification by established systems (e.g. Clavien-Dindo) was considered, but inconsistencies in reporting made this unfeasible. Data on cost-effectiveness were not collected, as this was deemed to fall outside the remit of the process. To simplify data extraction and presentation, summary statistics for ordinal data were extracted as mean or median (with SD when provided).

Individual study quality and risk of bias
The methodological quality of all included studies was assessed by the senior author (CK) and classified according to the Oxford CEBM levels of evidence definitions for 'therapy or harm' [16]. Accepting that distinguishing study designs can be problematic for observational studies [24], the following rules were applied:
1. A study was deemed prospective if this was categorically stated, or if patients were 'enrolled' or 'recruited' to a study that systematically recorded pre- and postoperative data. All other studies were assumed to be retrospective.
2. A cohort study was defined as one designed to address a clearly stated aim or hypothesis using specified analytical methods. In general, these included a comparison group related either to the relative efficacy of more than one specified procedure, or to patient selection, where a specified baseline 'risk factor' was analysed in relation to the relative success or failure of the intervention.
Further sub-analysis of the quality of observational studies (e.g. compliance with STROBE, Newcastle-Ottawa or MINORS) was not undertaken, as it was felt that this would add little to the overall assessment of quality.
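The two design rules can be written as a small decision function, which makes explicit that 'enrolled' or 'recruited' wording counts as prospective only when pre- and postoperative data were also recorded systematically. This is a sketch: the boolean inputs stand for reviewer judgements about a study, and labelling studies that do not meet the cohort definition as 'case series' is an assumption based on the review's description of most data as coming from case series.

```python
def classify_design(stated_prospective: bool,
                    enrolled_or_recruited: bool,
                    systematic_pre_post_data: bool,
                    clear_aim_and_methods: bool) -> str:
    """Apply design rules 1 and 2 to reviewer judgements about a single study."""
    # Rule 1: prospective if categorically stated, or if patients were enrolled/recruited
    # into a study that systematically recorded pre- and postoperative data.
    prospective = stated_prospective or (enrolled_or_recruited and systematic_pre_post_data)
    timing = "prospective" if prospective else "retrospective"
    # Rule 2: a cohort study addresses a clearly stated aim or hypothesis with specified
    # analytical methods; anything else is treated here (by assumption) as a case series.
    design = "cohort study" if clear_aim_and_methods else "case series"
    return f"{timing} {design}"


print(classify_design(False, True, False, False))  # retrospective case series
print(classify_design(False, True, True, True))    # prospective cohort study
```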

Summary measures
Results were tabulated by outcome and described with appropriate summary statistics (percentages, means and ranges). For very rare events, the aggregate number and denominator were reported. Quantitative data synthesis was performed for key outcomes using meta-analysis in STATA SE v14. Pooled proportions and means were estimated, permitting exploration of heterogeneity and bias. Where continuous measures were reported without a measure of variance, the standard deviation was approximated as range/4. Random-effects meta-analytic models were estimated to characterise event rates and between-study heterogeneity, with sub-grouping by procedure. Where studies did not provide data in a usable summary form, available data were tabulated but not included in the meta-analysis. Results were presented as aggregate means with confidence intervals and displayed graphically in forest plots. For pooled studies, the I² value (reflecting between-study heterogeneity within each group) was reported and interpreted in accordance with published guidance: 0-40%, heterogeneity might not be important; 30-60%, moderate heterogeneity; 50-90%, substantial heterogeneity; and 75-100%, considerable heterogeneity [25]. The magnitude and direction of effects, and the strength of evidence for heterogeneity (P-value from the chi-squared test), were used to interpret the importance of heterogeneity.
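Random-effects pooling of this kind can be illustrated with the DerSimonian-Laird estimator, together with Cochran's Q, I² and the range/4 approximation for a missing standard deviation. This is a minimal Python sketch, not the STATA SE v14 routines the review used: the text does not name the between-study variance estimator, so DerSimonian-Laird is an assumption, and pooling proportions on their raw scale (rather than a transformed scale) would need extra care for rates near 0% or 100%. The example data are hypothetical.

```python
import math
from typing import List, Tuple


def sd_from_range(minimum: float, maximum: float) -> float:
    """Approximate a missing standard deviation as range/4."""
    return (maximum - minimum) / 4.0


def dersimonian_laird(estimates: List[float],
                      variances: List[float]) -> Tuple[float, float, float, float, float, float]:
    """Random-effects pooling of study estimates with known within-study variances.

    Returns (pooled, ci_low, ci_high, tau2, q, i2_percent), using a normal 95% CI.
    """
    k = len(estimates)
    w = [1.0 / v for v in variances]                     # inverse-variance (fixed) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0     # between-study variance
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]         # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se, tau2, q, i2


# Hypothetical example: three studies reporting a mean symptom score.
means = [12.0, 9.5, 14.0]
sds = [4.0, 3.5, sd_from_range(2.0, 26.0)]   # third study reported only a range
ns = [40, 55, 30]
variances = [sd ** 2 / n for sd, n in zip(sds, ns)]  # variance of each study mean
pooled, low, high, tau2, q, i2 = dersimonian_laird(means, variances)
print(f"pooled {pooled:.2f} (95% CI {low:.2f} to {high:.2f}), I2 = {i2:.0f}%")
```

The chi-squared P-value for heterogeneity would compare Q against a chi-squared distribution with k - 1 degrees of freedom; it is omitted here to keep the sketch dependency-free.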
Evidence within reviews was predominantly provided by observational cohort data, with relatively few experimental studies (trials) identified. Consequently, the reviews analyse all studies as individual cohorts, by procedure, to achieve inclusion and consistency; pooled findings are compared with the findings of individual trials. Where several trials were identified within a review (e.g. rectal excision procedures), meta-analyses were performed with sub-grouping by procedure and by evidence grade. Findings by evidence grade were reported only when they deviated qualitatively from the overall pooled summary. Given the nature and reporting of the data, study-level meta-regression was not attempted.

Risk of bias across studies
Publication bias was assessed for outcomes where meta-analysis was performed. Other limited analyses based on study size, design and publication date were performed where these contributed to interpretation. Subgroup analysis was explored for the main procedural variations.

Development of summary evidence statements
Summary evidence statements were produced by the Clinical Guidance Group (CGG). This group was convened in summer 2014. The final list of participants was selected primarily from colorectal surgeons, gastroenterologists, urogynaecologists and specialist nurses with a strong interest in functional colorectal and pelvic floor disorders. This group included all senior authors of the five reviews and associated junior investigators. Methodological expertise was provided by Professor James Mason (University of Warwick), and NHS Specialised Services stakeholder representation by Mr Mark Chapman. A series of meetings followed (Bristol, November 2014; London, June 2015; Manchester, November 2015; and Edinburgh, July 2016), at which the evolving summary evidence statements (from the reviews) were eventually ratified and prototype graded practice recommendations drafted.
The CGG used 'focus group' methodology to reach consensus through electronic (in silico) and face-to-face meetings. The number of participants (> 12) and four rounds of written revisions fulfilled the basic criteria required for a guideline decision group (National Institute for Health and Clinical Excellence, April 2007) and allowed a sufficiently reliable process at an acceptable cost in terms of travel and expenses. The heterogeneity of the group (specialty, nationality, expertise) was deemed desirable so that a range of stakeholders was represented. Agreement was defined without 'weighting' of any participant's views, although some participants contributed more than others to the process.
Using the synthesis of the evidence base, the group drafted statements based on the best evidence available (which varied significantly by procedure). The clinical guidance group discussed, revised and graded the level of evidence of each summary statement using the Oxford 2009 CEBM system (http://www.cebm.net/oxford-centre-evidence-based-medicine-levels-evidence-march-2009) (Table 2). For clarity, roman numerals (I-IV) were used to denote summary levels of evidence, in contrast to Arabic numerals for individual studies (e.g. 1a, 2b). Summary levels could apply either positively or negatively to each procedure. Care was taken to avoid any contamination of the statements by expert opinion, so that they solely reflected summated evidence from the systematic reviews. Some language in the summary evidence statements was deliberately chosen to reflect the use of pooled data: the term 'typical' or 'typically' specifically denotes that data for the event in question were derived from random-effects analysis.

Development of graded practice recommendations (GPRs)
This had two main stages: (i) development of 'prototype' GPRs by the Clinical Guidance Group; and (ii) development of a final GPR list by a European Consensus group. This approach, including the methodology used (RAND/UCLA; see below), is established and has been used previously in the coloproctology field [26].
Development of prototype GPRs: After a common understanding of the evidence was established, group discussion balanced clinical experience and evidence summaries to arrive at shared judgements about recommendations for care, thus deriving recommendations relevant to decision-making in clinical practice. Group processes risk personal bias based on 'eminence' or 'eloquence' if led and supported ineffectively; adequate methodological support in the use of evidence and dialectic was therefore provided to ensure a balance of views and to promote generalizability and impact. This stage drew on: summary evidence statements (from each review); data from some excluded level I studies (e.g. RCTs excluded for short follow-up or published after the review date; a further search including the original terms and 'clinical trial' was run by CK on 03.10.16 for the date range 22.02.16 to 03.10.16); and expert opinion derived from the decision group and selected previously published guidance documents (Oxford level 5) (Fig. 2).
Final grading followed Oxford CEBM recommendations (A-D) [27] (Table 3). As with levels of evidence, grades of recommendation could apply either positively or negatively to the procedure.
Development of final GPRs: The European Consensus group comprised a panel of European experts (colorectal and pelvic floor surgeons) nominated by the European Society of Coloproctology (ESCP). Twenty experts from 10 European countries were invited, of whom 18 from nine countries participated (Appendix II).
Consensus methodology was derived from the RAND/UCLA Appropriateness Method (prepared for Directorate General XII, European Commission, 2001) [28]. Prototype graded practice recommendations (derived from the clinical guidance group) were presented on a spreadsheet for each procedure under three subheadings: 'patient selection', 'procedural considerations' and 'patient counselling'. Under each subheading, a number of GPRs were listed, each with its associated level of evidence and prototype grade of recommendation. For each GPR, panellists were asked: 'Does this recommendation lead to an expected health benefit that exceeds the expected negative consequences of its introduction?' Examples of health benefits in this context could be improved surgical outcome, improved patient experience or improved functional capacity; negative consequences could include increased morbidity, anxiety, pain, time lost from work, or denial of an investigation or treatment. Panellists were asked to base their judgement on clinical grounds only, i.e. exclusive of financial cost [29].
Responses to each listed recommendation used a linear analogue scale of 1-9 to assess views on the benefit-to-harm ratio. On this scale, a score of 1-3 indicated that the panellist expected the harms of introducing the recommendation to greatly outweigh the expected benefits, and a score of 7-9 that the expected benefits greatly outweighed the expected harms. A middle rating of 4-6 could mean either that harms and benefits were considered about equal or that the panellist was unable to make a judgement on the recommendation. Panellists were asked to try to provide a response for all listed recommendations.
Responses were analysed in accordance with the first phase of RAND/UCLA guidance, with each recommendation classified as 'appropriate', 'uncertain' or 'inappropriate' according to the panellists' median score and the level of disagreement. Indications with median scores in the 1-3 range were classified as inappropriate, those in the 4-6 range as uncertain, and those in the 7-9 range as appropriate. All indications rated 'with disagreement', whatever the median, were classified as uncertain. 'Disagreement' here implied a lack of consensus, either because of polarisation or spread over the entire scale (defined, for a panel of 18, as more than five panellists rating the indication outside the 3-point region (1-3, 4-6 or 7-9) containing the median [28]). Further phases of consensus involving discussion to reduce variation were not conducted.
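The first-phase classification rule can be sketched for a single recommendation as follows. This is a sketch under stated assumptions: the disagreement threshold of more than five panellists is the value given for a panel of 18 and should not be reused for other panel sizes without checking the RAND/UCLA definitions, and treating half-integer medians that fall between regions (e.g. 3.5 or 6.5, which can occur with an even-sized panel) as 'uncertain' is an assumption. The ratings in the example are hypothetical.

```python
from statistics import median
from typing import List, Tuple


def classify_recommendation(ratings: List[int],
                            disagreement_threshold: int = 5) -> Tuple[str, float, int]:
    """Classify one recommendation from 1-9 panel ratings.

    Returns (classification, median rating, number of panellists rating outside
    the 3-point region containing the median).
    """
    if not ratings or any(r not in range(1, 10) for r in ratings):
        raise ValueError("ratings must be integers from 1 to 9")
    m = median(ratings)
    if m <= 3:
        region, label = range(1, 4), "inappropriate"
    elif m >= 7:
        region, label = range(7, 10), "appropriate"
    else:
        # Medians of 4-6 (and, by assumption, half-integers such as 3.5 or 6.5).
        region, label = range(4, 7), "uncertain"
    outside = sum(1 for r in ratings if r not in region)
    # Disagreement overrides the median: such recommendations are always uncertain.
    if outside > disagreement_threshold:
        label = "uncertain"
    return label, m, outside


panel = [8] * 12 + [7] * 4 + [5] * 2        # 18 hypothetical ratings
print(classify_recommendation(panel))        # ('appropriate', 8.0, 2)
polarised = [9] * 9 + [1] * 9
print(classify_recommendation(polarised))    # ('uncertain', 5.0, 18)
```

The polarised example shows why disagreement is checked separately: its median sits in the uncertain band anyway, but a panel of twelve 8s and six 2s would have a median of 8 and still be classified as uncertain because six ratings fall outside the 7-9 region.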

Summary research recommendations
One of the initial drivers for this process (NIHR CapaCiTY) was the need to define the main evidence required for future surgical trials in patients with CC. During the development of this guidance, some trials have commenced patient recruitment, such as CapaCiTY study 3 (RCT of laparoscopic ventral rectopexy). There is, however, still a great need to define research questions that could inform future UK and international commissioning of research funding. Research recommendations have been attributed a priority (high, medium or low) based on the expert opinion of the current working group and may help inform discussion about future funding priorities.

Presentation of results
In view of the large scale of the systematic review and prototype guidance process, results have been presented as a series of separate manuscripts:
1. Overview of search results and study characteristics (this manuscript);
2. Systematic review results and summary evidence statements for colonic resection;
3. Systematic review results and summary evidence statements for procedures characterized by rectal suspension;
4. Systematic review results and summary evidence statements for procedures characterized by rectal wall excision;
5. Systematic review results and summary evidence statements for procedures characterized by rectovaginal septum reinforcement;
6. Systematic review results and summary evidence statements for sacral nerve stimulation;
7. Coalescence of systematic review data, summary of graded practice recommendations and research recommendations.
The main conclusions of this process were presented at the Pelvic Floor Society Meeting in Cardiff, January 2017.

Study characteristics
Table 3 gives information on the study characteristics, overall and by procedure. Detailed data on individual reviewed studies are provided by procedure type in the accompanying papers. The overall quality of evidence was poor, with 113/156 (72.4%) studies providing only level IV evidence. The best evidence to date exists for rectal excisional procedures, where the majority of studies were level I or II. This is discussed further in the final paper on graded practice recommendations and research recommendations.