Non-randomized studies should be considered for assessing surgical techniques in rectal prolapse: prospective cohort study

Aim Randomized trials comparing surgical techniques for rectal prolapse are not always feasible. We assessed whether non-randomized comparisons of those who have had surgery with those still waiting would be confounded by baseline health status. Method This was a prospective cohort study in seven UK hospitals. Participants were ≥ 18 years and listed for surgical interventions of equivalent intensity to rectal prolapse surgery. They were defined as short or long waiters (≤ 18 or > 18 weeks, respectively). Time on the waiting list was compared with baseline comorbidity (Charlson comorbidity index) and change from baseline in health status (EQ-5D-5L) at the time of surgery. Results In all, 203 patients were analysed. Median (interquartile range) waiting time was 13.7 weeks (8.1, 20.4), varying across sites. Baseline comorbidity was not an important predictor of waiting time. Median Charlson comorbidity index was 2 (0, 3) for short and 1 (0, 3) for long waiters. A change in waiting time by a week was associated with negligible improvement in the EQ-5D-5L index of 0.001 (95% CI −0.000 to 0.003, P = 0.106). Conclusion Negligible change in patient-reported health status while on the waiting list and the lack of influence of comorbidity on waiting time support the use of non-randomized pre-/post-studies to compare the effects of surgical interventions for rectal prolapse.


Introduction
In randomized controlled trials (RCTs), we aim to prevent bias by using randomization. Accordingly, we hope to avoid systematic differences in important prognostic characteristics between study intervention arms, such that any observed differences in outcomes are more likely to be due to the interventions. Thus, well-conducted RCTs are considered the gold standard in evidence-based medicine. Despite a legitimate case for RCTs, well-known challenges exist when evaluating surgical interventions [1]. Many interventions become routinely available in practice despite little or no evidence base. Surgeons and patients then prefer these interventions and may be unhappy with randomization. Masking or blinding is often difficult, and sham surgery is controversial.
One such case is the surgical treatment of rectal prolapse, a condition associated with a significant negative impact on quality of life. A range of procedures is used, including traditional perineal approaches and more recent innovations such as laparoscopic ventral mesh rectopexy [2]. Advocates of the latter claim increased efficacy and improved quality of life. However, there remain concerns about harms, and it is not clear which groups might benefit from it compared with traditional approaches [3]. Despite the best efforts of the colorectal community, it has not been possible to conduct a definitively powered randomized controlled trial [4]. In the light of the recent mesh controversy, this may become even more difficult.
Therefore, it is clear that an alternative methodology is required to investigate the efficacy of rectal prolapse interventions. Such a methodology should allow robust comparison of interventions whilst limiting potential biases. One such design is an interrupted time series (ITS) [5,6] where outcome data are collected at multiple time points before (i.e. when listed for surgery) and after surgical intervention to establish whether the intervention results in significant effects accounting for potential underlying secular trends.
In the case of rectal prolapse, we know waiting times for surgery vary substantially throughout the National Health Service (NHS). This offers a natural experiment where each individual acts as their own control and allows comparison of outcomes following surgical intervention with concurrent outcomes of patients who wait longer and have not received the intervention at the same time point. Conceptually, this is the observational equivalent of a stepped wedge design for a randomized trial [7]. Such a study design requires consideration of what factors influence the wait for surgery. If the waiting time is due to institutional factors unrelated to the patient they can be considered 'naturally' occurring. This would make the alternative design valid. However, if patients wait longer because they are 'sicker' (chronically unhealthy) or would get 'sicker' than those waiting a short time they represent different groups and the design would be invalid.
Our aim was therefore to investigate systematic differences between those waiting a short time or a long time for operations of similar intensity and urgency with respect to patient fitness and health-related quality of life. A secondary aim was to investigate any detrimental effect of surgery waiting duration on change in health status (from listing for surgery to operation) for clinically non-urgent surgical procedures. Such data would provide an indication of whether an ITS type design would be appropriate for a rectal prolapse treatment study.

Method

Study design
We conducted a prospective cohort study between January 2017 and February 2019 across seven UK NHS hospitals. The paper is reported in line with the STROBE statement [8]. Written informed consent was obtained and ethical approval was granted by the Chelsea Research Ethics Committee (ref: 16/LO/1363).

Inclusion criteria
With rectal prolapse the condition of interest, our target population was adult patients (≥ 18 years) listed for surgery of the same clinical urgency and intensity as rectal prolapse surgery, i.e. procedures graded as 'major' on the British United Provident Association (BUPA) procedure code (Table 1).

Data collection
Following consent, age, sex and comorbidities included in the Charlson comorbidity index (CCI) [9] were recorded by one of the research team (research nurse/assistant, clinicians) along with the date of addition to the surgical waiting list (baseline date). Participants also completed the EQ-5D-5L questionnaire [10] at baseline. On admission for surgery, date of admission was recorded and a second EQ-5D-5L questionnaire was completed.

Outcome measures
The primary outcome was time on the waiting list in weeks. Explanatory variables included baseline health status and comorbidity assessed using EQ-5D-5L and CCI, respectively. The secondary outcome was the change in health status whilst awaiting surgery using EQ-5D-5L.

Definitions of long and short waiters
Current NHS targets require 90% of patients to be operated on within 18 weeks [11,12]. Accordingly, patients were defined as long waiters if the time to surgery exceeded 18 weeks and as short waiters if they waited ≤ 18 weeks. However, an arbitrary classification of short and long waiters can lead to misleading conclusions when exploring the relationship between waiting time and variables of interest. To address this concern, further analysis was performed where the classification was model-based (to create latent classes) and the relationships between waiting time and variables of interest were assessed within and across waiting latent classes (see Appendix S1).
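The threshold rule above can be sketched in a few lines of Python. This is only an illustration: the function names and the example dates are hypothetical and are not study data. It assumes waiting time is the number of days between listing and admission divided by seven.

```python
from datetime import date

TARGET_WEEKS = 18  # NHS target used in the text to split short and long waiters


def waiting_weeks(listed: date, admitted: date) -> float:
    """Weeks between addition to the waiting list and admission for surgery."""
    return (admitted - listed).days / 7


def waiter_group(weeks: float) -> str:
    """Return 'short' for waits of 18 weeks or less, otherwise 'long'."""
    return "short" if weeks <= TARGET_WEEKS else "long"


# Hypothetical dates, for illustration only (not study data)
listed = date(2017, 3, 1)
admitted = date(2017, 7, 5)
weeks = waiting_weeks(listed, admitted)
print(weeks, waiter_group(weeks))
```

The example dates are exactly 126 days apart, so the patient sits on the boundary and is classed as a short waiter. A single extra day would move them into the long-waiter group, which shows why a fixed cut-off is sensitive to small differences near 18 weeks.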

Sample size
There were no preliminary data to inform the sample size calculation. However, with a feasible maximum sample size of 212 participants, the study had > 90% power to test the a priori null hypothesis of no association versus an alternative hypothesis of an association for a correlation coefficient of at least 0.2 (between an outcome and explanatory variable if it exists).
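As a rough check on the order of magnitude, power for detecting a correlation can be approximated with the Fisher z transformation. The sketch below is an assumption-laden illustration: the text does not state the significance level, whether the test was one- or two-sided, or the method used, so the function name and its defaults are hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_power(r: float, n: int, alpha: float = 0.05, two_sided: bool = True) -> float:
    """Approximate power to detect a correlation r with n participants (Fisher z)."""
    std_normal = NormalDist()
    # Expected value of the z statistic under the alternative hypothesis
    effect = atanh(abs(r)) * sqrt(n - 3)
    if two_sided:
        z_crit = std_normal.inv_cdf(1 - alpha / 2)
        # Probability of rejecting in either tail
        return std_normal.cdf(effect - z_crit) + std_normal.cdf(-effect - z_crit)
    z_crit = std_normal.inv_cdf(1 - alpha)
    return std_normal.cdf(effect - z_crit)


print(round(correlation_power(0.2, 212, two_sided=False), 3))
print(round(correlation_power(0.2, 212, two_sided=True), 3))
```

Under this approximation, a one-sided 5% test with 212 participants gives power of about 0.90 for a correlation of 0.2, while a two-sided 5% test gives about 0.83. The stated power therefore depends on assumptions that the text does not spell out.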

Computation of summary scores for analysis
The CCI was derived from the sum of the clinical condition scores for 19 comorbidities and age (total score range 0-37) [9]. Higher scores indicate greater morbidity. We computed the Charlson 10-year survival probability $C_{10}$ using the formula $C_{10} = 0.983^{\exp(0.9 \times \mathrm{CCI})}$. EQ-5D-5L was used to measure health status, and a utility index was derived as previously described [13]. An additional question asks for general health status on a scale of 0-100, with higher values indicating better health. Waiting time (in weeks) was calculated as the interval between the baseline date and the date of admission for surgery.
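The Charlson 10-year survival probability, which raises 0.983 to the power exp(0.9 × CCI), can be evaluated directly. The following is a minimal sketch; the function name is hypothetical and the scores in the loop are arbitrary examples rather than patient data.

```python
from math import exp


def charlson_10yr_survival(cci: int) -> float:
    """Estimated 10-year survival probability: 0.983 ** exp(0.9 * CCI)."""
    if cci < 0:
        raise ValueError("CCI cannot be negative")
    return 0.983 ** exp(0.9 * cci)


# Arbitrary example scores, for illustration only
for score in (0, 1, 2, 3):
    print(score, round(charlson_10yr_survival(score), 3))
```

Because the exponent grows with the CCI and the base is below 1, the estimated survival probability falls as comorbidity increases. It is roughly 0.90 at a score of 2 and roughly 0.77 at a score of 3.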

Preliminary analysis
Patient characteristics and demographics were descriptively summarized (overall and stratified by waiting time group) depending on the type of variable and underlying distributions. Violin plots were used to display the distribution of baseline variables stratified by waiting time group. The Mann-Whitney U test was used to assess differences in medians of continuous baseline variables between the two groups. A non-parametric analysis of variance (Kruskal-Wallis) test was used to explore whether waiting times across hospital sites and type of operations were drawn from the same distribution. Scatter plots were used to explore any relationships between continuous baseline variables and waiting times.
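The two-group comparison described above can be illustrated with a small standard-library implementation of the Mann-Whitney U test. This sketch uses the tie-corrected normal approximation without a continuity correction, so its P values may differ slightly from those of a statistics package. The function names and the example scores are hypothetical and are not study data.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for x, two-sided P) using the tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over each group of tied values
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t**3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / sqrt(variance)
    return u1, 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical comorbidity scores, for illustration only
short_waiters = [2, 0, 3, 1, 2, 4, 0, 2]
long_waiters = [1, 0, 3, 1, 0, 2, 3, 1]
print(mann_whitney_u(short_waiters, long_waiters))
```

The test works on ranks rather than raw values, which is why it suits skewed, discrete measures such as comorbidity scores.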

Finite mixture models
Finite mixture models (FMMs) were developed to model the probability of patients belonging to each latent waiting class and to estimate linear regression parameters in each class, in order to draw inference within and between classes. Model performance was assessed using the Akaike information criterion (AIC). The bimodal distribution of waiting times strongly suggested that patients were most likely to belong to two latent waiting classes (see Fig. 1), and this was supported by comparison of AIC between models with two and three latent classes. FMMs with cluster-robust standard errors were selected because of the observed variation in waiting time across centres. To build a multivariable FMM using a linear regression model, variables were incrementally included based on the magnitude of the AIC in the univariable case. Each change in AIC was noted and the potential predictors that yielded the lowest AIC were selected. Using this model, we estimated the proportion of patients belonging to a particular class (marginal class probabilities) and the mean waiting time in each class with 95% confidence intervals. Contrasts were used to assess relationships between waiting times and potential predictors across latent classes. We used FMMs to address the shortcomings of the preliminary analysis as detailed in Appendix S2 and reflected in the discussion.
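The idea of latent waiting classes can be illustrated with a much simpler model than the one used in the study. The sketch below fits a two-component Gaussian mixture to waiting times by expectation-maximization and compares its AIC with that of a single normal distribution. It is an assumption-laden simplification: it has no covariates and no cluster-robust standard errors, and it is not the Stata FMM reported here. The function names are hypothetical and the waiting times are simulated, not study data.

```python
import math
import random


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def fit_two_class_mixture(waits, iterations=200):
    """EM for a two-component Gaussian mixture; returns (weights, means, sds, aic)."""
    lo, hi = min(waits), max(waits)
    means = [lo + (hi - lo) * 0.25, lo + (hi - lo) * 0.75]  # crude starting values
    spread = (hi - lo) / 4 or 1.0
    sds = [spread, spread]
    weights = [0.5, 0.5]
    n = len(waits)
    for _ in range(iterations):
        # E-step: posterior probability that each patient belongs to each class
        resp = []
        for x in waits:
            dens = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: update class proportions, means and standard deviations
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, waits)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, waits)) / nk
            sds[k] = max(math.sqrt(var), 1e-3)  # floor avoids a collapsing component
    loglik = sum(
        math.log(sum(weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)))
        for x in waits
    )
    aic = 2 * 5 - 2 * loglik  # 5 free parameters: 2 means, 2 SDs, 1 mixing weight
    return weights, means, sds, aic


def single_class_aic(waits):
    """AIC of a single normal distribution (2 parameters) fitted by maximum likelihood."""
    n = len(waits)
    mean = sum(waits) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in waits) / n)
    loglik = sum(math.log(normal_pdf(x, mean, sd)) for x in waits)
    return 2 * 2 - 2 * loglik


# Simulated bimodal waiting times in weeks (illustrative only, not study data)
rng = random.Random(0)
waits = [rng.gauss(12, 3) for _ in range(160)] + [rng.gauss(46, 4) for _ in range(40)]
weights, means, sds, aic_two = fit_two_class_mixture(waits)
print("class proportions:", [round(w, 2) for w in weights])
print("class mean waits:", [round(m, 1) for m in means])
print("AIC, one class vs two:", round(single_class_aic(waits), 1), round(aic_two, 1))
```

The fitted weights play the role of the marginal class probabilities, and a lower AIC for the two-class fit mirrors the model comparison described above. Extending the sketch to class-specific regressions would replace each class mean with a linear predictor.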
In addition to scatter plots, multivariable linear regression models, adjusted for baseline responses and using robust standard errors that account for clustering by study site, were used to assess whether changes in the EQ-5D-5L utility index and general health score were associated with waiting times. Analyses were performed in Stata version 15.1 (StataCorp, College Station, TX, USA).

Patient flow
Of the 219 patients who consented, 16 were excluded (< 18 years old, n = 2; missing critical data, n = 14). Comorbidity data were available for 203 patients. Baseline EQ-5D-5L data were available for 201 patients, of whom 189 had EQ-5D-5L recorded on the day of surgery (Fig. 2).

Variation in wait times across study sites
There was variation in wait times across the population (range 0.4-68.9 weeks; Fig. 3). Figure 1 shows that the majority of patients were operated on at or before 18 weeks, with a bulge of patients recorded just before the 18-week time point. Patients at site B waited much longer for surgery, with a median (IQR) waiting time of 49.7 (45.7-50.7) weeks. The distribution of waiting times at the other sites appeared comparable, except for site G, which contributed only one patient. The lowest median wait (in hospitals with > 10 patients) was 11.3 weeks; the highest was 49.7 weeks (Fig. 2).

Patient factors associated with wait times
Comorbidity (using the CCI) appeared similar between the two groups (Fig. 4) [median score (IQR) 2 (0, 3) and 1 (0, 3), respectively]. The type of operation was strongly associated with waiting times (Kruskal-Wallis test; P = 0.0093) primarily driven by 'operations on small bowel only' [median (IQR) 45.1 (11.3-49.7) weeks]. There was also strong evidence to support differences in waiting times across study sites as described above (Kruskal-Wallis test; P = 0.0001). However, we noted an interaction between type of surgery and study sites as the majority of cases of 'small bowel only' operations were performed at site B which had the longest waiting times.
For predicting waiting times, the best model included the type of operation, age, sex, number of comorbidities and baseline health status as predictors. Using this model, 79.4% (95% CI 78.1%-80%) and 20.6% (95% CI 19.4%-21.9%) were classified into short and long waiter latent classes, respectively. These groups had a mean waiting time of 11.5 and 45.7 weeks, respectively.
After controlling for other factors (Table 3), there was a negligible association between age and waiting time, which was similar among short and long waiters. The waiting times were similar among male and female short waiters. However, men waited slightly longer (average of 3.8 more weeks) than women among long waiters. Waiting times were generally comparable across the type of procedures among short waiters. The association was uncertain among long waiters due to a small number of patients undergoing certain surgical procedures. There was no association between the number of comorbidities and waiting times among short and long waiters.

Baseline health and wait time
On average, long waiters rated their health status slightly higher than short waiters (see Fig. S1). Based on general health total score, short waiters also rated their general health slightly higher than long waiters (see Fig. S2). There was insufficient evidence to suggest marked differences between short and long waiters (Mann-Whitney U test, P = 0.41 and P = 0.40, respectively). The distributions of general health total scores and health status (utility indices) appear comparable between short and long waiters.
After controlling for age, sex, type of operation and number of comorbidities, a 0.1 increase in health utility index was associated with a very small increase in waiting time of 0.323 and 0.087 weeks among short and long waiters, respectively (Table 3). Therefore, the association between baseline health status and waiting time was negligible and not clinically worthwhile.

Does health status deteriorate on a waiting list?
Figure 5 shows the relationship between changes in health utility index and waiting time with a superimposed fitted linear regression model adjusted for baseline health state. There appeared to be a slight trend suggesting that some patients may improve very minimally in their health status while waiting for surgery. On average, a 1-week increase in waiting time was associated with a negligible improvement in health utility index of 0.001 (95% CI −0.000 to 0.003). Similar negligible trends were observed using the general health rating (see Fig. S3), meaning that we did not detect a significant change in health status associated with long wait time. That is, an increase in waiting time by a week was associated with an improvement in general health rating of only 0.20 points (95% CI 0.08-0.32). It should be noted that general health was rated on a scale of 0-100.
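To put the per-week estimates in context, they can be scaled to the length of a typical wait. The sketch below multiplies the reported slopes and confidence limits by 18 weeks. It assumes the association is linear over that period, which is an assumption made for illustration only, and the function name is hypothetical.

```python
def project_change(per_week, ci_low, ci_high, weeks):
    """Scale a per-week slope and its 95% CI limits to a wait of `weeks` weeks."""
    return tuple(round(value * weeks, 3) for value in (per_week, ci_low, ci_high))


# Estimates reported in the text; the 18-week horizon is the NHS target
# The lower utility limit is reported as -0.000 and is entered here as 0.0
utility = project_change(0.001, 0.0, 0.003, 18)
general = project_change(0.20, 0.08, 0.32, 18)
print("EQ-5D-5L utility index:", utility)
print("General health rating (0-100):", general)
```

On these assumptions, an 18-week wait corresponds to a change of about 0.018 in the utility index and about 3.6 points on the 0-100 general health scale, which is consistent with describing the trends as small.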

Discussion
This study shows that UK (NHS) patients undergoing surgical procedures of equivalent intensity and clinical urgency have highly varied wait times both within and across hospitals. Specifically, one outlying study site was the primary driver for long waiting. The study suggests that neither patient fitness nor health status explains this wait. Notably, many patients underwent surgery close to target breach dates, suggesting a system-related explanation for waiting time. These findings mean that the patient characteristics of those who wait longer are comparable to those of short waiters and that health status does not significantly deteriorate or improve while on a waiting list. This could be a legitimate basis for the use of a pre- and post-surgery observational study (e.g. an ITS) as an alternative to a practically and ethically challenging RCT. Such a realistic observational study could produce reliable causal inference of the benefits of surgical interventions by comparing patient outcomes assessed repeatedly before and after the introduction of a surgical procedure of similar intensity from the point of listing for surgery. This is because the health state does not seem to change significantly while on the waiting list and waiting time is unlikely to be systematically influenced by potential confounding factors.
Length of wait from listing to surgery is influenced by various factors. We controlled for one by limiting participants to those who had been investigated completely and appropriately listed for non-urgent non-cancer operations of similar intensity. Current UK standards aim for 90% of these procedures to be completed within 18 weeks of referral. It is interesting to note that in our results the violin plots of sites show a 'bulge' around the 18-week time point, suggesting that all centres involved have strived to meet this target in many cases. The type of operation may also influence the waiting time. Some operations are more complex and require special skills, equipment and longer operating time. In addition, organizational pressures include preoperative assessment availability, theatre and staff availability as well as inpatient bed numbers, which vary across hospitals. Finally, the comorbidity of the patient may influence wait. For example, a patient with multiple comorbidities may take longer to achieve a level of fitness that allows routine surgery to occur safely. Alternatively, they may be deferred due to limited availability of high dependency beds, or even indirectly deferred by the clinical team, thereby becoming long waiters.
One would intuitively expect to see an inverse relationship between health status and the length of wait for surgery but this was not the case in our cohort. In fact, we found a very small improvement in health status while on a waiting list, especially among long waiters. However, one should be cautious about interpretation due to small sample size. An explanation for this might be that the patient was finally admitted for surgery after a prolonged wait, and this may have caused a more positive assessment of their own health state. Either way, this finding should not influence clinical policies.
Other studies have explored the association of waiting time with health-related quality of life [14-17]. Of perhaps most relevance are data from patients waiting for elective hernia repair. In that study, the duration of time on the waiting list was not associated with a change in self-reported health [18], and patients with the poorest health tended to improve whilst waiting. Possible explanations include patients with greater depression, pain and hernia-related symptoms seeking non-surgical interventions. Alternatively, perceptions of health may have been biased by the reassurance of pending surgery. This may explain the static or slightly positive health status trend seen in long waiters in our study.
The definition of long and short waiters was in line with current political targets in the UK. This could be considered too arbitrary. As those waiting more than 18 weeks tended to be a small proportion of the overall cohort, any future study may have to be large in order to capture an adequate number of long waiters. However, modelling suggests that, even in those who wait < 18 weeks, there was no significant change in health status. Therefore, a pre- and post-surgery observational study could be carried out with smaller numbers of patients and even those waiting a shorter time for surgery, provided outcomes were measured frequently.
One strength of the study is the use of FMMs to obviate the problem of arbitrary classification of waiting time and to handle the bimodal distribution, addressing the first aim. This allowed exploration of relationships within and between latent waiting classes. Other strengths include the prospective, multi-centre design including multiple surgical procedures of similar intensity and the use of validated outcome measures. However, there are some weaknesses. First, there were no data to inform the sample size robustly, although the feasible sample size was adequate to explore at least small associations. Second, the sample sizes within each study site and surgical procedure were relatively small with limited patients in four surgical procedures and three sites. This limits inference and subgroup exploration. Third, the population was mainly healthy and it may not be feasible to extrapolate results to less fit populations. However, results do reflect typical rectal prolapse patients. Finally, as outlined earlier in the discussion, there may be factors not measured here which have affected waiting times. For example, bed availability, last minute or on the day cancellations of surgery, and risk assessment or attitudes to risk of the clinical team.
The study has implications for researchers in the field. It strengthens the argument for a pre- and post-surgery observational study (such as an ITS) as a viable alternative design that could produce reliable causal inference to assess surgical interventions in this area where an RCT is not feasible.

Conclusion
This study shows that time on a waiting list is not strongly associated with functional status or quality of life. While we would always advocate an RCT where feasible, it strengthens the argument for using other designs when conditions are unfavourable, e.g. when equipoise is poor, which is the case in elective surgery of intermediate severity such as prolapse surgery.