Presented in part at the 23rd Anaesthesia, Critical Care and Pain Update Conference, Val d’Isere, France, January 2012.
Is ‘starting on time’ useful (or useless) as a surrogate measure for ‘surgical theatre efficiency’?*
Article first published online: 16 APR 2012
Anaesthesia © 2012 The Association of Anaesthetists of Great Britain and Ireland
Volume 67, Issue 8, pages 823–832, August 2012
How to Cite
Pandit, J. J., Abbott, T., Pandit, M., Kapila, A. and Abraham, R. (2012), Is ‘starting on time’ useful (or useless) as a surrogate measure for ‘surgical theatre efficiency’?. Anaesthesia, 67: 823–832. doi: 10.1111/j.1365-2044.2012.07160.x
You can respond to this article at http://www.anaesthesiacorrespondence.com
- Issue published online: 9 JUL 2012
- Accepted: 9 March 2012
We analysed more than 7000 theatre lists from two similar UK hospitals, to assess whether start times and finish times were correlated. We also analysed gap times (the time between patients when no anaesthesia or surgery occurs), to see whether these affected theatre efficiency. Operating list start and finish times were poorly correlated at both hospitals (r² = 0.077 and 0.043), and cancellation rates did not increase with late starts (remaining within 2% and 10% respectively at the two hospitals). Start time did not predict finish time (areas under the receiver operating characteristic curve 0.517 and 0.558, respectively), and did not influence theatre efficiency (∼80–84% at either hospital). Median gap times constituted just 7% of scheduled list time and did not influence theatre efficiency while cumulative gap times remained below 15% of scheduled list time. Lists with no gaps still exhibited extremely variable finish times and efficiency. We conclude that resources expended in trying to achieve prompt start times in isolation, or in reducing gap times to under ∼15% of scheduled list time, will not improve theatre productivity. Instead, the primary focus should be towards quantitative improvements in list scheduling.
Operating theatres represent a huge investment of healthcare resources (∼30% of all hospital costs), and there is increasing focus on delivering an ‘efficient’ anaesthetic-surgical service. Operating efficiency can be described by a calculation incorporating theatre time utilisation and patient cancellations [3–5], but a single surrogate measure of efficiency might preclude the need for this calculation. One such surrogate is adherence to a defined operating list ‘start time’. Koenig et al. have demonstrated that it does not matter whether ‘start time’ refers to the ‘start of anaesthesia’ or the ‘start of surgery’ (‘knife-to-skin’), because if one of these is fixed then the other follows automatically as a probabilistic estimate. All that matters is that a clear, pre-agreed definition exists within an institution.
Prompt starts would seem important. If a list starts late, it should be expected to finish late. With late starts, the scheduled time (which has been budgeted in advance) will be wasted and with late finishes, un-budgeted overtime costs are incurred, with the attendant risk of unplanned patient cancellation. Several studies, however, have found a very high prevalence of non-compliance with pre-agreed start times. In a cohort of gynaecology lists, Collantes et al. and Walsh et al. have reported that 32–93% of lists started ‘late’ [7, 8]. The problem is not confined to the National Health Service (NHS). Vinukhondia et al. reported up to 46% of lists started late in an Indian hospital, and ‘tardiness’ of list starts is also an important issue for hospitals in the USA.
Consequently, start times are monitored routinely within many NHS Trusts, and are used both as local ‘key performance indicators’ to guide planning, and, occasionally, to pressurise individuals or teams to change behaviour, attitudes or policies [11, 12]. The Foundation Trust Network has promoted start times as a key target and the Royal College of Anaesthetists has suggested that fewer than 10% of lists should start more than 10 min late. However, whilst the published evidence has found that late starts are frequent, it does not indicate that late starts affect theatre efficiency. Furthermore, there is no evidence to substantiate the Royal College’s suggestion that 10 min should be a relevant time boundary when defining a ‘late start’. Indeed, Macario has suggested (but without citing data to support the claim) that late starts of up to 45 min remain consistent with efficient performance [15, 16].
The main aim of this study was to test the hypothesis that start time was an important determinant of operating list efficiency. Specifically, we predicted that lists that started late would end late and be less efficient.
As late starts represent a form of wasted time on a list, we also analysed ‘gap’ (or ‘turnover’) times, to determine the time on a list when no anaesthesia or surgery occurs. Although prolonged gap times are common, eliminating gap times does not appear to improve efficiency, possibly because most gap times (in the NHS) constitute < 15% of the total scheduled list time. Intuitively, however, lists accumulating longer gap times should exhibit lower efficiency.
We examined data from two hospitals in different regions of England. Hospital A has ∼5000 staff, 650 inpatient beds, 22 operating theatres and an annual budget of ∼£290 million (€352 million; $459 million). Hospital B is smaller with ∼3000 staff, 500 inpatient beds, 12 theatres and an annual budget of ∼£135 million (€164 million; $214 million). Neither is designated a university or tertiary referral centre.
Data were collected from automated theatre data collection systems. Hospital A uses Bluespier Theatre Manager (Bluespier International, Droitwich, UK) and Hospital B uses TheatreMan (Trisoft, Chesterfield, UK), and both had done so for at least two years before the start of this study. The accuracy of timing data had previously been verified by the respective Trusts in comparison with written data entry in theatre log books, by intermittent verification by respective theatre managers, and by contemporaneous comparison when the authors were providing direct clinical care. Outlier and cancellation data were cross-checked against secondary sources, for example, using admissions data.
We collected an equivalent amount of data for theatre timing and cancellation rates at each hospital, over six months in 2010 at Hospital A and 12 months (2010) at Hospital B. We included all elective, general surgical lists, but excluded all emergency work, additional service (‘waiting’) lists, transplant, cardiac and neurosurgery. At the time of data collection, neither hospital was subject to any special initiative related to theatre management that might have influenced the mode of operating or team behaviour (for example, ‘lean’ or ‘six sigma’). No individual patient identifiable data were recorded. Each operating list at Hospital A is scheduled for 4 h (8 h for a full-day list); each operating list at Hospital B is scheduled for 3.5 h (7 h for a full-day list).
We recorded the following data: scheduled start time; actual start time (defined as the start of anaesthesia); scheduled list duration; actual list duration (defined as the list finish time when the last patient on the list left theatre minus the start time of anaesthesia for the first patient on the list); under-run and over-run times (defined as the difference in minutes between the actual and scheduled finish times); the gap time (Fig. 1); the number of cases scheduled on the list; and the number of cases completed (and hence the cancellation rate).
Efficiency was calculated using the formula described previously [3–5] (Eqn 1):

Efficiency = (fraction of scheduled time utilised − fraction of scheduled time over-running) × fraction of scheduled operations completed

where the ‘fraction of scheduled time utilised’ means that for a list scheduled for 10 h, that finishes in 5 h, this term = 0.50 and the ‘fraction of scheduled time over-running’ = 0. The ‘fraction of scheduled time over-running’ means that for a list scheduled for 10 h that over-runs by 2 h, this quantity = 0.20, and the fraction of scheduled time utilised = 1. Thus, the first two terms operate mutually exclusively; a single list cannot be at once both under- and over-utilised. The ‘fraction of scheduled operations completed’ means that if three out of four of the patients booked on to the list have their operations (i.e. one patient is cancelled), this quantity = 0.75.
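The three terms defined above can be illustrated with a short sketch. It assumes that they combine as (utilised − over-running) × completed, as in the efficiency calculation cited as [3–5]; the function name, its parameters and the floor at zero for extreme over-runs are illustrative assumptions, not part of the study's software.

```python
def list_efficiency(scheduled_min, actual_min, cases_booked, cases_completed):
    """Efficiency of a single list as a fraction between 0 and 1.

    Hypothetical helper: assumes efficiency =
    (fraction utilised - fraction over-running) * fraction of cases completed.
    """
    if scheduled_min <= 0 or cases_booked <= 0:
        raise ValueError("scheduled time and cases booked must be positive")
    # Fraction of scheduled time utilised, capped at 1 for lists that over-run.
    utilised = min(actual_min / scheduled_min, 1.0)
    # Fraction of scheduled time over-running; 0 for lists finishing on time or early.
    over_run = max(actual_min - scheduled_min, 0) / scheduled_min
    # Fraction of scheduled operations completed.
    completed = cases_completed / cases_booked
    # Floor at 0 (an assumption) so over-runs beyond 100% do not go negative.
    return max(utilised - over_run, 0.0) * completed


# The worked examples from the text, with times expressed in minutes.
print(list_efficiency(600, 300, 4, 4))  # 10 h list finishing in 5 h
print(list_efficiency(600, 720, 4, 4))  # 10 h list over-running by 2 h
print(list_efficiency(600, 600, 4, 3))  # one of four patients cancelled
```

Under these assumptions the under-run and over-run terms can never both be non-zero for the same list, which mirrors the mutual exclusivity described in the text.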
Our hypothesis did not require any comparisons to be made between the two hospitals. Instead, start and finish times were correlated using Pearson’s correlation coefficient, and for any curve fitting, we used linear or non-linear least squares regression (SPSS Inc, Chicago, IL, USA). Where comparisons of interest were to be made, we considered a p value of < 0.05 as statistically significant.
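As a minimal sketch of the correlation step, the following computes Pearson's coefficient and r² for paired start and finish offsets. The analysis itself was done in SPSS; this plain-Python version, the function name and the sample numbers are purely illustrative assumptions.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: minutes late at start, minutes past scheduled finish.
start_delay = [0, 5, 10, 20, 30, 45]
finish_offset = [-30, 40, -10, 60, -20, 15]
r = pearson_r(start_delay, finish_offset)
print(f"r = {r:.3f}, r squared = {r * r:.3f}")
```

A value of r² close to zero, as reported for both hospitals, means that start time explains almost none of the variation in finish time.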
Table 1 shows the summary data for the two hospitals. A similar number of lists were analysed at each hospital, but there was more theatre time scheduled at Hospital A because each list was scheduled for a 4-h rather than 3.5-h duration. Compared with Hospital B, Hospital A had a lower cancellation rate, more late finishes and fewer early finishes, contributing to a median utilisation of 100%.
|  | No. of lists | Theatre time scheduled; min | Theatre time utilised; min | Lists finishing early | Lists finishing late | Utilisation; median (IQR [range]) | Patients booked; n | Cases cancelled; n | Cancellation rate | Efficiency; median (IQR [range]) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Hospital A | 3512 | 945 525 | 941 904 | 1.3% | 31.6% | 100 (83–115 [0–381])% | 14 441 | 165 | 1.1% | 84 (71–93 [0–100])% |
| Hospital B | 3632 | 874 860 | 759 719 | 38.9% | 4.5% | 86 (70–103 [0–239])% | 12 089 | 1126 | 9.3% | 80 (59–89 [0–100])% |
The calculated efficiencies of the two hospitals were similar. Hospital A cancelled fewer patients, but over-runs were frequent. Over-runs at Hospital B, in contrast, were infrequent, but at the expense of a higher cancellation rate. These data illustrate the conflicting effects of patient cancellation and theatre utilisation on theatre efficiency, and the different choices made by managers and/or teams at different hospitals.
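The cancellation rates in Table 1 can be recomputed from the raw counts, as in the sketch below. It also derives an aggregate utilisation (total minutes used divided by total minutes scheduled), which is an illustrative quantity and not the same as the per-list median utilisation reported in the table. The dictionary layout and function name are assumptions.

```python
# Raw totals copied from Table 1.
TABLE_1 = {
    "Hospital A": {"scheduled_min": 945_525, "utilised_min": 941_904,
                   "booked": 14_441, "cancelled": 165},
    "Hospital B": {"scheduled_min": 874_860, "utilised_min": 759_719,
                   "booked": 12_089, "cancelled": 1_126},
}


def summarise(row):
    """Return percentage cancellation rate and aggregate utilisation for one hospital."""
    return {
        "cancellation_rate_pct": round(100 * row["cancelled"] / row["booked"], 1),
        "aggregate_utilisation_pct": round(100 * row["utilised_min"] / row["scheduled_min"], 1),
    }


for hospital, row in TABLE_1.items():
    print(hospital, summarise(row))
# Hospital A {'cancellation_rate_pct': 1.1, 'aggregate_utilisation_pct': 99.6}
# Hospital B {'cancellation_rate_pct': 9.3, 'aggregate_utilisation_pct': 86.8}
```

The contrast between the two rows shows the trade-off described above: near-complete use of scheduled time with few cancellations at Hospital A, against spare time and more cancellations at Hospital B.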
Figure 2 summarises these data by way of efficiency plots. Superimposition of data points masks the full extent of over-running in Hospital A, but it is clear that cancellations are rare, while it is also clear that under-running and cancellation are more prevalent in Hospital B.
Figure 3 shows the Gaussian distribution of utilisations in the respective hospitals, with Hospital A exhibiting a right-shifted curve reflecting its higher proportion of list over-runs. It also shows the skewed distribution for efficiency, with the majority of lists in both hospitals showing efficiencies of > 80%.
Start and finish times
Figure 4 shows the distribution of list start and finish times at each institution. Absolute start times at Hospital A appear delayed compared with those at B. For both hospitals, finish time distribution is much wider than that for start times, suggesting a lack of correlation between start time and finish time, which is confirmed in Fig. 5.
It is possible to test the extent to which the start time is predictive of the finish time by moving the y-axis in Fig. 5 from left to right, assigning the data points into quadrants that yield differing numbers of true and false positives and negatives for the prediction. This can be used to construct receiver operating characteristic (ROC) curves (Fig. 6) for 10-min movements of the y-axis. Using this method, start time data from both hospitals are poorly predictive of finish times (area under ROC curve 0.517 and 0.558 for Hospitals A and B, respectively).
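The threshold-sweeping procedure described above can be sketched as follows. The sketch assumes that a 'positive' prediction is a start delay at or beyond the current threshold and that the outcome is a binary late finish; the exact quadrant definitions used for Fig. 6 are not given in the text, and all names here are hypothetical.

```python
def roc_points(start_delays, finished_late, step=10):
    """(false positive rate, true positive rate) pairs for thresholds moved in `step`-minute increments."""
    positives = sum(finished_late)
    negatives = len(finished_late) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both late and on-time finishes")
    points = []
    threshold = min(start_delays)
    # Sweep until one threshold lies beyond the largest delay, giving the (0, 0) point.
    while threshold <= max(start_delays) + step:
        tp = sum(1 for d, late in zip(start_delays, finished_late) if d >= threshold and late)
        fp = sum(1 for d, late in zip(start_delays, finished_late) if d >= threshold and not late)
        points.append((fp / negatives, tp / positives))
        threshold += step
    return sorted(points)


def area_under_curve(points):
    """Trapezoidal area under a curve given as sorted (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical lists in which start delay carries no information about late finishes.
delays = [0, 30, 0, 30]
late = [True, True, False, False]
print(area_under_curve(roc_points(delays, late)))  # 0.5
```

An area of 0.5 corresponds to a prediction no better than chance, which is why the reported areas of 0.517 and 0.558 indicate that start time is a poor predictor of finish time.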
Consequently, start time did not influence efficiency (plots not shown; best-fit regression polynomials for efficiency, where x is the start time, were −0.006x² + 0.13x + 80 (r² = 0.014) at Hospital A and −0.005x² + 0.3x + 76 (r² = 0.3) at Hospital B). Therefore, efficiency does not decline markedly unless a late start accounts for more than 20% of scheduled list time (equivalent to more than 48 min for a half-day operating list), a percentage similar to that previously reported [15, 16]. Our modelling suggests that very early starts are also detrimental to overall efficiency.
Figure 7 shows the distribution of gap times, which were a median (IQR [range]) of 7 (3–13 [0–71])% of scheduled list time at Hospital A and 7 (1–14 [0–100])% at Hospital B. Our data do not support the suggestion that gap times are generally long, or reduce efficiency (Fig. 8a,c) or result in later finishes (Fig. 8b,d) [17, 18]. Indeed, even when gap time is eliminated, there remains a wide variation in efficiency and finish times (Fig. 9).
Our data show that start times do not predict either when an operating list will finish, or whether it will run efficiently (that is, complete all the scheduled cases without under- or over-run). This is contrary to our stated hypothesis, and at odds with performance indicators suggested by several authoritative bodies. Our evidence does not support the Royal College of Anaesthetists’ recommended upper limit for late starts of 10 min, but is more consistent with Macario’s suggestion that late starts of up to ∼45 min do not preclude efficient performance [15, 16]. This is not to suggest that a fixed start time is necessarily wrong. Professionals are obliged to arrive at work in a timely manner, and an agreed start time is necessary for the interpretation of theatre performance data. However, as an indicator of theatre productivity, the ‘start time’ would appear to be useless.
If a late-departing train will clearly arrive late, reducing service efficiency, then why does this not appear to be the case for late-starting operating lists? We did not examine this question directly, but there may be several logical reasons why late starts may not correlate with finish times. If a list is under-booked, then it does not matter that there is a very late start; the list will always finish within time. Similarly, if a list is over-booked by 2 h of surgery, it does not matter that it starts even an hour late, as it will always finish late. If a list is over-booked but starts early, then it will finish promptly or ‘less late’. In each of these scenarios, there is no correlation between start and finish time. Unlike operating lists, trains have little opportunity to shorten the gaps between successive steps of a process or to compensate by increasing their speed.
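The booking scenarios above can be made concrete with simple arithmetic. The sketch assumes a 4-h (240 min) half-day list with no gaps, so that the finish offset is just the start delay plus the booked work minus the scheduled time; the function name and the example durations are hypothetical.

```python
def finish_offset_min(start_delay_min, booked_work_min, scheduled_min=240):
    """Minutes past (+) or before (-) the scheduled finish, assuming no gaps."""
    return start_delay_min + booked_work_min - scheduled_min


# (description, start delay in minutes, booked work in minutes)
scenarios = [
    ("under-booked, starts 60 min late", 60, 150),
    ("over-booked by 2 h, starts on time", 0, 360),
    ("over-booked by 2 h, starts 60 min late", 60, 360),
    ("over-booked by 30 min, starts 30 min early", -30, 270),
]

for label, delay, work in scenarios:
    print(f"{label}: finish offset {finish_offset_min(delay, work):+d} min")
```

In this toy model the amount of work booked contributes as much to the finish offset as the start delay does, so when booking varies widely between lists the start time alone says little about when a list ends.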
Furthermore (and as has been previously suggested), our results indicate that the sum total of the gap times rarely exceeds ∼15% of the list time. However, even when late starts and gaps were eliminated, list performance did not improve (Fig. 9), suggesting that some other factor(s) were more important in influencing theatre efficiency and finish time. The poor correlation between gap times and list efficiency has been previously reported. Sokolovic et al. found that adding an extra anaesthetist and anaesthetic nurse successfully reduced mid-list gap times by 20%, but increased overtime by 118% because of later list finishes, such that reducing gap times did not impact upon the underlying problem of over-booking and late finishes. Proper focus, effort and investment should therefore be aimed at these other factors, rather than simply upon start or gap times.
We have reported previously that accurate list scheduling is likely to be an important factor in theatre management, suggesting that start or gap times will only influence the efficiency of a list if the list is properly scheduled in the first place – that is, the list is calculated to finish on time using operator-specific and organisational modelling tools. When a list is scheduled poorly (i.e. under- or over-booked with cases), late starts or long gap times will have little effect on reducing efficiency further. Although we did not examine specifically how accurately lists were scheduled, neither Hospital A nor B uses an objective scheduling algorithm, instead planning lists in an ad-hoc manner. It is self-evident that lists are poorly scheduled (i.e. either under- or over-booked; Table 1) and therefore, it follows that start times might predictably be expected to be independent of finish times. We are not aware of any hospital in the UK that has published the results of theatre timings using objective scheduling programs (such as GE Healthcare’s Opera software, http://www.gehealthcare.com/euen/iis/pdf/op-4-br.pdf, or the ORMIS operating theatre system, http://www.isoftplc.com/corporate/media_files/ORMIS.pdf). Also, no hospital has yet adopted the scheduling algorithm of Pandit and Tavare.
We analysed > 7000 surgical lists, the largest set of theatre time data published from the UK to date. Large datasets more commonly emanate from healthcare systems such as those in the USA, and it is only comparatively recently that NHS hospitals have had access to user-friendly interfaces for downloading data direct to an electronic database [24, 25]. Nevertheless, further subgroup analysis of the data (for example, by team or individual) remains difficult, and it may be the case that there is consistent variation in performance by team or specialty at both hospitals. This is not a limitation in this study per se because our primary analysis involved the institutional correlation of start times with finish times, but great caution would need to be exercised if we were to analyse performance by subgroup in the absence of any correlation.
In such a large dataset, it is inevitable that recording errors will have been made by staff inputting the time data, but we propose that such errors are rare and inconsequential. We validated data entry through contemporaneous observation, and outlier data through confirmation with the theatre manager. For example, lists with zero efficiency were, indeed, those which were cancelled in their entirety. Lists that started very early were found to be over-booked afternoon sessions starting mid-morning, using time made available by an early finishing morning list. Lists that over-ran by more than 100% were found to be very over-booked half-day lists that proceeded into the afternoon and/or evening. The recording of a zero gap time may have meant staff interpreted very small gaps of just a few minutes as equating to zero (i.e. rounding errors in original data input), but we found these were also lists with two or more anaesthetists who could ‘parallel process’ such that the next patient was anaesthetised and ready for theatre while the previous patient was being awoken.
We did not analyse the causes of late starts or mid-list gaps. Other authors have identified a plethora of organisational and personnel issues, including staffing shortages, portering delays, non-availability of equipment and congestion in the recovery area, and these were undoubtedly contributory at Hospitals A and B. However, the main purpose of this investigation was to assess the relationship between finish time and start time, regardless of the cause of the latter.
We suggest that the efficiency graphs shown in Fig. 2 are indicative of the different approaches (ethos) to operating theatre productivity practised at each hospital, and possibly throughout the NHS. We consider that Hospital A runs a pressured, full-capacity system: lists are over-booked for the time available and commonly over-run, but cancellations are rare (in part, because there is a robust, anaesthesia-led pre-assessment system that helps forward planning). In contrast, we consider that Hospital B runs a less pressured, lower capacity system: lists start more promptly than at Hospital A, and rarely over-run, possibly at the expense of a higher cancellation rate.
Both hospitals perform sub-optimally in terms of theatre productivity, but appear to do so in different ways. Current NHS financial constraints require each hospital to make a ∼10% saving in theatres (http://www.guardian.co.uk/news/datablog/2011/feb/23/nhs-cuts-list#data) and paradoxically perhaps, this might be more easily achieved at Hospital B, by modestly reducing early finishes and cancellations (which might be very easy through, for example, better pre-assessment) [28, 29]. In comparison, the opportunities for cost savings are limited at Hospital A, which is already fully utilising theatre time (albeit artificially through regular over-runs) and has a cancellation rate as low as it is likely to be able to achieve. Indeed, the concern should be that personnel begin to claim overtime payments, which they are entitled to do, or withdraw their unpaid labour, either or both of which will increase the rate of cancellation and theatre costs at Hospital A.
In summary, just as ‘theatre utilisation’ is a poor indicator of ‘theatre efficiency’ (as it is easy to achieve full utilisation by over-running), our data show that ‘start time’ is equally valueless. Furthermore, because gap times account for less than 15% of scheduled list time, major investment to reduce these times further is unlikely to result in improved theatre performance. We hypothesise that sustained performance improvements are only possible through better list scheduling (such as that described by Pandit and Tavare) and the adoption of balanced measures of theatre efficiency (such as those described by Eqn 1).
No external funding or competing interests declared. JJP is an Editor of Anaesthesia and therefore this manuscript has undergone additional external review.
- 2. House of Commons Public Accounts Committee. The use of operating theatres in Northern Ireland. Health and Personal Social Services, Seventh Report of Session 2005–6, London, HMSO, 2006.
- 7. Theatre sending: how long does it take and what is the cost of late starts? Gynecological Surgery 2010; 7: 307–10.
- 11. Theatre utilisation. Report to Oxford Radcliffe Hospitals NHS Trust Board, September 2008. At: http://www.oxfordradcliffe.nhs.uk/aboutus/trustboard/tbdocs08/september/theatreutilisation080918.pdf (accessed 15/09/2011).
- 12. Anaesthesia News 2011; 286: 3.
- 13. Foundation Trust Network. FTN Benchmarking: Driving Performance Improvement in Operating Theatres. London: NHS Confederation, 2010.
- 14. Section 13.4: Efficient Use of Planned Operating Lists. In Colvin JR (ed.). Raising the Standard: a Compendium of Audit Recipes for Continuous Quality Improvement in Anaesthesia, 2nd edn. London: RCOA, 2006.