Predicting time to asystole following withdrawal of life‐sustaining treatment: a systematic review

The planned withdrawal of life‐sustaining treatment is a common practice in the intensive care unit for patients where ongoing organ support is recognised to be futile. Predicting the time to asystole following withdrawal of life‐sustaining treatment is crucial for setting expectations, resource utilisation and identifying patients suitable for organ donation after circulatory death. This systematic review evaluates the literature for variables associated with, and predictive models for, time to asystole in patients managed on intensive care units. We conducted a comprehensive structured search of the MEDLINE and Embase databases. Studies evaluating patients managed on adult intensive care units undergoing withdrawal of life‐sustaining treatment with recorded time to asystole were included. Data extraction and PROBAST quality assessment were performed and a narrative summary of the literature was provided. Twenty‐three studies (7387 patients) met the inclusion criteria. Variables associated with imminent asystole (<60 min) included: deteriorating oxygenation; absence of corneal reflexes; absence of a cough reflex; blood pressure; use of vasopressors; and use of comfort medications. We identified a total of 20 unique predictive models using a wide range of variables and techniques. Many of these models also underwent secondary validation in further studies or were adapted to develop new models. This review identifies variables associated with time to asystole following withdrawal of life‐sustaining treatment and summarises existing predictive models. Although several predictive models have been developed, their generalisability and performance varied. Further research and validation are needed to improve the accuracy and widespread adoption of predictive models for patients managed in intensive care units who may be eligible to donate organs following their diagnosis of death by circulatory criteria.


Introduction
A common mode of death in the intensive care unit (ICU) involves the planned withdrawal of life-sustaining treatment (WLST) after recognising the futility of ongoing organ support. This may involve terminating invasive ventilation or vasopressors while end-of-life comfort care is administered [1]. While there is some international variation, death can be confirmed after a minimum of 5 min observation following the onset of mechanical asystole (hereafter asystole) [2,3].
The prediction of time to asystole following WLST is important for setting expectations for families, for ICU resource utilisation and for guiding the identification of patients suitable for organ donation after circulatory death (DCD). The heterogeneity in underlying conditions, in combination with variations in levels of organ support, makes it challenging to accurately predict this time. Current practice to identify imminent death relies on clinical judgement; however, the abilities of physicians to make reliable predictions in this area are limited [4,5].
The DCD donation process is often complex, resource intensive and can be emotionally distressing for families [6], particularly if the donation is unable to proceed [7]. Successful donation is often prevented due to logistical challenges or the occurrence of prolonged time to asystole where the organs are damaged due to excessive warm ischaemic time. In the UK, 45% of unsuccessful DCD donations are attributed to this prolonged time period [8].
A variety of predictive tools and models have been developed [4,5,9–19] which can provide support for decision-making in this area. Whilst some prediction models initially appear to perform well, they are often not appropriately validated and when external validation has been undertaken, their performance typically does not generalise well [20–22]. A lack of standardisation of the variables recorded and the specifics of the withdrawal process makes the transfer and shared use of developed tools challenging. These problems underline the unmet clinical need for the development of clinical decision support tools capitalising on data, which in this context could support widespread adoption and deployment of time to asystole prediction models in DCD. The aim of this systematic review was to evaluate the literature for variables associated with, and predictive models for, time to asystole in patients managed on ICUs and who are eligible for DCD.

Methods
Following registration with PROSPERO [23], we searched MEDLINE (inception to 11 May 2022) and Embase (inception to 11 May 2022). The searches combined Medical Subject Headings (MeSH), appropriate controlled vocabulary and keywords for time, death and withdrawal (online Supporting Information Appendices S1 and S2). We explored the reference lists of all included studies and prior review studies for further inclusions. Clinical experts were consulted to evaluate the list of included studies for omissions identified through their knowledge of the field.
Conference abstracts, poster abstracts, letter responses and letters to editors were excluded. To meet inclusion criteria, we required studies to evaluate an adult population in an ICU environment who underwent WLST and had an associated time to asystole recorded. Life-sustaining treatment was defined as ventilation (invasive or noninvasive) or haemodynamic support. Measurement of time from WLST to death or asystole was also necessary for inclusion. Studies that did not evaluate potential predictive factors or models in relation to this measurement were excluded. Only English language studies were included.
Two reviewers (CN and AB) independently reviewed all titles and abstracts identified from the literature searches. Potentially eligible studies underwent duplicate full-text review. During both processes, we resolved disagreements through a third reviewer (KP).
We extracted data from studies using customised spreadsheets, with key population characteristics such as age, ICU type and methods of withdrawal recorded. The focus of each study was categorised as one of: variable evaluation; predictive model development; or a mixture of both. Other study design aspects such as the methods of withdrawal and outcomes assessed were also recorded. The performance metrics of any evaluated predictive factors or models were recorded.
We undertook quality assessment of predictive model development and validation using the PROBAST (Prediction model Risk Of Bias ASsessment Tool) [24], which was designed to assess the risk of bias and applicability of diagnostic and prognostic prediction model studies. Given the lack of standardisation of the withdrawal process, heterogeneous populations and variables measured at time of WLST, and variation in the outcome measure, we did not undertake data pooling or meta-analysis. Therefore, the analysis consisted of tabulation of study characteristics and performance metrics with narrative summarisation of the literature.

Results
The initial search returned a total of 2418 studies to be further screened following the removal of duplicates (Fig. 1). This produced 71 studies for full-text review with an additional paper from reference screening and from expert input. Full-text review identified 23 studies (7387 patients) for inclusion in data extraction and analysis.
Practices for WLST varied across the studies. Mechanical ventilation of the patient's lungs was stopped at the point of WLST in all studies, with the cessation of vasoactive drugs in the majority. The specific process of withdrawal, including details surrounding the administration of comfort care medicines, was not typically detailed, with only five studies [11,14,19,20,25] specifying that withdrawal of all active treatments was simultaneous.
Seven studies [13,25–27,29–31] focused on the evaluation of variables associated with their primary outcome and did not derive or validate any predictive models. These studies typically used p value cut-off regression techniques. The variables found to be statistically associated with asystole within 60 min using multivariable analysis are detailed in Table 1, including odds and/or hazard ratios with confidence intervals.
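For readers less familiar with how an odds ratio and its 95%CI are formed, the following is a minimal sketch of the unadjusted calculation from a 2×2 table using the Wald interval on the log scale. It is an illustration only: the counts are hypothetical and not drawn from any included study, the function name is our own, and the values in Table 1 come from multivariable models that adjust for other variables, which this sketch does not do.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    a: predictor present, asystole within 60 min
    b: predictor present, asystole after 60 min
    c: predictor absent,  asystole within 60 min
    d: predictor absent,  asystole after 60 min
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) is the root of the summed reciprocal cell counts
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts, for illustration only
or_, lo, hi = odds_ratio_ci(a=30, b=10, c=20, d=40)
print(f"OR {or_:.2f} (95%CI {lo:.2f}-{hi:.2f})")
```

An interval that excludes 1 corresponds to the conventional threshold for statistical association; adjusted estimates from regression models are read in the same way.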
The absence of corneal reflexes and a cough reflex were found to be associated with imminent asystole in four analyses [13,17,20,28]. Additionally, an absent or extensor motor response was associated with imminent asystole in three of these analyses [13,17,28].
Blood pressure measurements and the use of vasopressors were associated with time to asystole. However, their methods of evaluation varied between the studies: lower diastolic blood pressure [10]; lower systolic blood pressure [15]; lower mean arterial pressure [31]; higher vasopressor dose (adrenaline, noradrenaline or phenylephrine > 0.2 µg.kg⁻¹.min⁻¹) [10]; vasopressor use prior to withdrawal [27]; and vasopressor use within 12 h of withdrawal [30] were associated with time to asystole.
Two analyses identified the use of comfort medications following WLST as being inversely associated, where the use of comfort medications reduced the odds of asystole within 60 min. In the study by DeVita et al. [10] the use of comfort medications during the first hour after WLST reduced the odds of imminent death during that time. In the study by Kotsopoulos et al. [20], the use of midazolam and dose of morphine administered after WLST were also associated with reduced odds of asystole within 60 min. We agree with DeVita et al. [10] that this paradoxical finding warrants a prospective trial to evaluate causality. The final associated variable identified in multiple analyses was positive end-expiratory pressure [15,31].
The derivation or modification of predictive models was reported in eight studies [4,11,13–17], external validation of an existing model was undertaken in three [20–22] and both derivation and external validation were undertaken in five [5,9,10,12,19] (Table 2). Most of the predictive model studies (15/16) included the evaluation of models for prediction of asystole within 60 min whilst some included evaluation of asystole within 120 min (9/16) or other time ranges (2/16). In total 15 original models were reported with a further five models derived by augmenting or adjusting these.
Two of the original models evaluated were developed using clinical experience and expert consensus without the reported use of statistical techniques [9,10]. Of the remaining original models, two used a classification and regression tree (CART) [10,16], two used Cox regression analysis [17,25] and seven used other forms of multivariable regression analysis [5,11–15,19]. The final two original models used random survival forests [4] and a light gradient boosting machine [19] respectively.
At the point of derivation or modification, validation procedures varied: five studies evaluated model performance against the same cohort used for model fitting [5,9,10,14,19], with only one of these using cross-validation to attempt to mitigate the impact of overfitting [19]. Four models were validated at the point of derivation by randomly splitting the cohort into a training and a testing set [4,12,15,16], with a further four validated using an external cohort. One model was validated using a prospective cohort and one was validated using both external and prospective cohorts.
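The difference between evaluating a model on its own derivation cohort and evaluating it on held-out data can be illustrated with a deliberately extreme sketch. The example below assumes a synthetic cohort with a single uninformative predictor and a coin-flip outcome, and uses a 1-nearest-neighbour classifier chosen only because it memorises its training data; none of the included studies used this method, and all names are hypothetical.

```python
import random

def nearest_label(x, train):
    """Label of the training point closest to x (1-nearest-neighbour)."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def accuracy(test, train):
    """Proportion of test points whose label matches the nearest training point."""
    hits = sum(nearest_label(x, train) == y for x, y in test)
    return hits / len(test)

def k_fold_accuracy(data, k=10):
    """Mean accuracy when each fold is predicted from the remaining folds."""
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i, held_out in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        scores.append(accuracy(held_out, train))
    return sum(scores) / k

random.seed(0)
# Synthetic cohort: the predictor carries no information about the outcome
# (True = asystole within 60 min, assigned by coin flip)
cohort = [(random.random(), random.random() < 0.5) for _ in range(200)]

apparent = accuracy(cohort, cohort)         # same cohort used for fitting and testing
cross_validated = k_fold_accuracy(cohort)   # each patient predicted by a model that never saw them
print(f"apparent accuracy: {apparent:.2f}, 10-fold accuracy: {cross_validated:.2f}")
```

Because every patient is their own nearest neighbour, the apparent accuracy is perfect even though the predictor is pure noise, whereas the cross-validated estimate falls back towards chance. Real models overfit less dramatically, but the direction of the bias is the same.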
Sixteen instances of secondary validation (validation of a distinct model by another group) were observed across six studies [5,12,19–22] with three of these studies solely attempting to validate previously derived models without any new model derivation or modification of existing models.
The University of Wisconsin DCD tool (UWDCD) [9] is a scoring system developed using clinical experience to identify relevant clinical and demographic patient characteristics. As part of the tool, spontaneous respiratory efforts, tidal volume, negative inspiratory force and oxygen saturation are measured during a 10-min period of ventilator disconnection. The tool thresholds these measurements alongside the number of drugs used for blood pressure support, patient age and airway type. Initial validation using 43 patients showed a sensitivity of 0.87 and specificity of 0.80 for asystole within 60 min of WLST [9]. During this validation, the authors also explored the inclusion of BMI, yielding a sensitivity of 0.84 and specificity of 0.85. Two studies externally validated the tool that included BMI in larger populations (81 and 219 patients) where the performance of predicting asystole within 60 min did not generalise sufficiently well, reporting sensitivities of 0.42 and 0.45 and specificities of 0.61 and 0.49 [5,21].
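Sensitivity and specificity, as reported for the tools above, can be computed directly from paired predictions and observed outcomes. The sketch below assumes a hypothetical cohort of ten patients invented for illustration; the function name and data are not taken from any included study.

```python
def sensitivity_specificity(predicted, observed):
    """Sensitivity and specificity for boolean sequences.

    True means asystole within 60 min (predicted or observed).
    Assumes at least one observed positive and one observed negative.
    """
    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, o in pairs if p and o)          # correctly predicted within 60 min
    fn = sum(1 for p, o in pairs if not p and o)      # missed
    tn = sum(1 for p, o in pairs if not p and not o)  # correctly predicted beyond 60 min
    fp = sum(1 for p, o in pairs if p and not o)      # false alarm
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical cohort of ten patients
predicted = [True, True, True, False, True, False, False, True, False, False]
observed = [True, True, False, False, True, True, False, True, False, False]

sens, spec = sensitivity_specificity(predicted, observed)
print(f"sensitivity {sens:.2f}, specificity {spec:.2f}")
```

In a DCD setting, a false positive corresponds to a donation attempt that may not proceed, and a false negative to a potential donor who is not offered donation, so the two measures carry different practical costs.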
The United Network for Organ Sharing tool (UNOS) [10] was developed using committee consensus and was evaluated in three studies. It is a criteria-based scoring tool with higher scores corresponding to a higher probability of asystole within 60 min. As with the UWDCD tool, the original criteria use four measurements during a period of ventilator disconnection in addition to seven further clinical parameters. Two evaluations of the full criteria aiming to assess prediction of asystole within 60 min produced an area under the receiver operating characteristic curve (AUROC) of 0.53 [21] and a positive predictive value (PPV) of 0.63 [10]. The evaluation by Coleman et al. used a modified UNOS tool where the period of ventilator disconnection was omitted and replaced with definitions of ventilator dependence and oxygen disruption; this was found to give a sensitivity of 0.61 and specificity of 0.84 [5].
DeVita et al. [10] developed a predictive tool using CART model analysis which took GCS ≥ 4, SaO2/FIO2 ≥ 230 and positive inspiratory pressure ≥ 35 cmH2O as inputs. The performance was evaluated directly on the derivation cohort, giving a sensitivity of 0.75 and specificity of 0.73 but no external validation has been undertaken.
The Hunter New England Area Composite score [5] was developed using logistic regression and included ventilatory dependence with a measure of oxygenation disruption. It achieved sensitivity of 0.56 and specificity of 0.13 on the derivation cohort. The inclusion of a systolic blood pressure threshold (<100 mmHg) led to sensitivity of 0.39 and specificity of 0.96. No external validation of this score has been undertaken.
[...] The original authors validated the tool using a prospective cohort to give an AUROC of 0.83. External validation across three cohorts yielded an AUROC of 0.80, 0.70 and 0.80 [19,20].
The circulatory death in patients in neurocritical state (DCD-N) tool is based on corneal reflex, cough reflex, motor response and oxygenation index [28]. The authors validated the model through fitting with a prospective cohort [13], finding a sensitivity of 0.81 and 0.73. Three studies externally validated this model giving an AUROC of 0.75 [12], 0.69 [21] and 0.77 [22]. In addition to externally validating this model, de Groot et al. [12] also proposed a modified model using a continuous oxygen index. This modified model gave an AUROC of 0.74 using an internal validation cohort and was externally validated across four cohorts giving an AUROC of 0.75, 0.86, 0.74 and 0.86 [19,20,22].
Wind et al. [14] used logistic regression to develop a model that used the dichotomised presence of: controlled mode ventilation; noradrenaline use; cardiovascular comorbidity; brainstem reflexes; and neurologic deficit. This achieved an AUROC of 0.73 within the derivation cohort which fell to 0.62 in a subsequent external validation cohort [20].
Brieva et al. developed models using logistic regression in a general population [15] and later, using CART model analysis in a DCD subset of this patient population [16]. All of these models used ranges of positive [...]
The C-DCD model is a nomogram that was developed using Cox regression analysis to identify 10 variables for inclusion (Table S4) [17]. The authors used two validation cohorts (external and prospective) to evaluate the model and report an AUROC of 0.94 and 0.99. A subsequent external validation demonstrated slightly worse performance with an AUROC of 0.88 [21].
The first model to incorporate time series rather than instantaneous data into a prediction model was developed by Scales et al. [4]. Here random survival forests used physician predictions alongside a series of variability features of blood pressure and heart rate time series. An internal validation cohort demonstrated an AUROC of 0.79 using this model. There has been no external validation of this model.
Finally, alongside the validation of two previous models, Kotsopoulos et al. evaluated two new models, one using the least absolute shrinkage and selection operator (LASSO) to select variables in a standard linear least squares setting, and another using a light gradient boosting machine (LightGBM) which accounts for more convoluted nonlinear statistical relationships between the variables and the outcome [19]. The regression model which was built using LASSO achieved an AUROC of 0.80 in their external validation cohort and the model that was built using LightGBM achieved an AUC of 0.79 in 10-fold cross-validation.
We assessed the quality of the studies reporting model development or validation using PROBAST. The full results are available in online Supporting Information Table S2 with tabular results summarised in Table 3.
When considering all domains using the PROBAST assessment criteria, 10 studies were judged to be at high risk of bias, five were considered at low risk of bias and one was at an unclear risk of bias. Participant selection (domain 1) was considered to be low risk of bias in 13 studies, unclear in two studies due to inclusion/exclusion criteria, and high risk in one study due to participant selection. Predictors (domain 2) were judged to be low risk in all 16 studies, with consistent predictive variable definition and assessment. Outcome determination (domain 3) was found to be at low risk of bias in 14 studies and unclear in the remaining two due to lack of clarity of the withdrawal process. Analysis (domain 4) was found to be the most common source of risk of bias with a high result for 10 studies and a low result for the remaining six studies.
Data imbalance, and in particular lacking a sufficiently large number of participants who had died within the specified time span (60 min or 120 min), was identified as an issue in 13 of 16 studies. Potentially inappropriate dichotomisation of continuous variables was identified in 3 of 16 studies. The frequency of missing data was not reported in 10 studies and participants with missing data were excluded in two studies. Multiple imputation was used to mitigate missing data in three studies where missing data were < 5% and the methods used to manage missing data were not reported in the final study.
Variable selection based on univariable analysis was identified in eight studies, leading to the possible exclusion of variables that may have been important after adjustment for other variables or failing to account for information overlap (i.e. redundancy). Univariable analysis to determine a subset of predictive variables is known to be flawed and there is considerable research work in the machine learning community investigating advanced principled methods towards determining parsimonious variable subsets.
Appropriate model performance measures were used in 10 studies with this becoming more common in more recent publications. As defined within PROBAST, the consideration of overfitting and optimism were also more frequent in more recent publications, although only four model development studies accounted for these effects appropriately. Four studies published between 2003 and 2012 undertook model evaluation using the full derivation cohort which inevitably leads to overfitting and reported performances that would be unlikely to be verified in an external dataset.
When considering all domains, 12 studies were high concern for applicability and four studies were low concern for applicability (Table 3). Participant selection (domain 1) was found to be high concern for applicability in 10 of 16 studies, with this typically being due to either a non-DCD eligible population or restriction to specific or limited diagnoses. Predictors (domain 2) were considered to be of low concern for applicability in 11 studies and high concern in the remaining five. High concern for applicability was suggested by the use of a period of ventilator disconnection to generate predictive variables in two studies and the requirement for brain imaging in three studies. Outcome applicability (domain 3) was generally good, with all but one study reporting the generally accepted 60 min threshold for asystole with many also reporting additional thresholds.
Table 1 Variables associated with asystole within 60 min of withdrawal of life-sustaining treatment using multivariable analysis. Alternative tabulation of risk ratios for the ten most frequently identified predictors available in online Supporting Information Table S3. Values are OR (95%CI) or hazard ratio (95%CI).

Discussion
This systematic review shows the ongoing challenges of developing reliable predictive tools for time to asystole following WLST. The heterogeneity of patient populations and variations in clinical practice continue to represent core challenges that impede validation and potential deployment.
We found that studies focussing on identifying variables associated with time to asystole have further strengthened the body of evidence behind the key predictors, with neurological and respiratory-related variables found repeatedly to be associated with time to asystole. In particular, the absence of a cough reflex or corneal reflexes was associated with a shorter time to asystole. It is possible that these are surrogate markers for the loss of other brainstem reflexes and may represent a population of patients who are apnoeic and will progress to cardiorespiratory arrest shortly after cessation of mandatory ventilation.
We identified several tools that have been developed to predict time to asystole in <60 min although, in general, these did not perform well on external validation. The most promising tool to date was the C-DCD tool, which had an AUROC of 0.88 on external validation in a neurosurgical ICU setting. It has yet to undergo evaluation in a general ICU setting beyond the original paper.
We were able to systematically evaluate the quality of included studies using the PROBAST tool, with a focus on their applicability to the review question and their risk of bias. We found that over time studies have substantially improved in both these areas. Robust examples of primary and external validation have become more prevalent within the literature, which helps move developed predictive tools closer to clinical use. Despite this, there are still no reports within the literature of the clinical use of existing or developed predictive tools. The reasons for this are not explicitly discussed or elaborated on.
This systematic review builds on a prior review by Munshi et al. [33] who evaluated the literature in this area up to 2014 with similar inclusion criteria. Since publication of this review, several new time-to-asystole prediction tools have been developed. There has also been a greater focus on external validation of existing prediction tools, often evaluating several at the same time. The focus on time to asystole within 60 min as the primary outcome has persisted, although many studies have expanded this scope to include evaluations of models targeting time to asystole within 30 min, 120 min or 240 min. The ongoing exploration of predicting asystole within periods >60 min is important, given the evolving landscape of DCD [34]. The modelling methods used for predictions have become more diverse with the first use of random survival forests, variable selection using LASSO methods and gradient boosting machines explored. Additionally, one study incorporated time series data through the use of measures of vital sign variability as predictive variables [4] in contrast to the use of a single instantaneous set of variables as seen in all other models.
The authors of the previous systematic review concluded that differences in practice could have an important impact on time to asystole as many studies did not specify the simultaneous withdrawal of ICU treatments (e.g. inotropic support and tracheal extubation) at time of WLST [34]. This weakness remains in many of the more recent publications and likely reflects the variation in clinical practice in ICUs and the problems in modifying or standardising clinical protocols for observational studies.
The use of more rigorous prediction tool development processes, particularly with regards to variable selection, validation measures and model calibration, has successfully reduced the bias observed in recent studies. This helps to provide support to the results as well as the performance measures they report in the external validation of existing models. The increased focus on this external validation of existing models allows us to compare their results with the reported model performance at derivation. Three models [11,12,25] were externally validated in more than one external cohort, with predictive models typically performing slightly worse on external validation. This may suggest that the heterogeneity of patient populations is limiting generalisability or that the initial model development process was susceptible to optimism or overfitting. However, the broad reproducibility of model performance through validation using several large populations is promising and provides evidence that supports the underlying models.
The suitability of the described models for use in clinical practice is dependent on their specific use, patients and environments. We observed minimal discussion of the importance of model performance metrics in real-world clinical situations. The trade-offs of acceptable specificity and sensitivity will be strongly impacted by the clinical environment, organ donation resources and number of potential donors evaluated. For example, in a context where there are many more potential donor candidates than the organ donation resources could manage, it may be important that the predictive model has a high specificity to maximise the effectiveness of the organ donation resources such as organ retrieval teams.
Several models showed promising performance on external validation; however, none could reliably predict imminent asystole with a level of accuracy that precludes error. Consideration for use in DCD protocols will depend on their current performance at predicting imminent asystole. The case for this could be further strengthened by organisational level validation of the model's reliability and applicability before deployment.
There are several limitations to this review which should be noted. First, the inclusion of mixed intensive care and neurointensive care populations, and the inclusion of DCD eligible and non-DCD eligible populations lead to wide heterogeneity in the study participants. This limits the potential for both meta-analysis and conclusions about specific populations as there may be wide differences between these populations. Second, the models remain heavily focused on an outcome of time to asystole within 60 min, which limits the applicability in situations where longer times to asystole would be clinically acceptable, such as DCD programmes focused on specific organs.
The ongoing progress in predictive variable identification and predictive modelling suggests that we may be getting closer to developing and validating models that could be deployed more broadly in clinical practice. Current exploration and potential use of models appears to be fractured and often on a local scale or limited to a specific organ retrieval programme. The development of more generalisable tools will rely on further large multicentre or international studies using robust analysis procedures. It would be prudent for future efforts to develop predictive tools to consider all predictive variables identified within this review to aid in planning prospective data collection and variable selection in their analyses. The use of time series data, as partially demonstrated with measures of variability [4], may offer several advantages over typical instantaneous variables where the negative impact of spurious or outlying measurements has the potential to be minimised. Further exploration into the use of vital sign variability and full time series data for predictive modelling is warranted.
In conclusion, reliable prediction of time to asystole following WLST is an important area for clinical practice for DCD and vital for its expansion within, and in addition to, existing programmes. Accurate prediction would ensure donation is offered as an end-of-life decision where it is feasible, for the benefit of potential donors, their families, organ donation systems and recipients. This review provides an overview of the current progress in this area, highlighting key challenges and identifying areas of potential interest for further exploration. We emphasise the need to move towards standardisation of WLST practices through consensus, and we single out the potential of extracting clinically useful information from time series data (e.g. mining blood pressure and heart rate series), which has been hitherto largely neglected.
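As a concrete illustration of what using vital sign time series rather than a single instantaneous value might involve, the sketch below derives a few summary features from a short series of mean arterial pressure readings. The feature set, function name and readings are assumptions made for illustration; they are not the variability measures used by Scales et al. [4] or by any other included study.

```python
import statistics

def variability_features(series):
    """Summary features of a vital-sign series (e.g. one reading per minute)."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    mean = statistics.fmean(series)
    sd = statistics.stdev(series)
    return {
        "mean": mean,
        "median": statistics.median(series),  # robust to isolated spikes
        "sd": sd,
        "cv": sd / mean,                      # coefficient of variation
        # root mean square of successive differences (beat-to-beat style variability)
        "rmssd": (sum(d * d for d in diffs) / len(diffs)) ** 0.5,
    }

# Hypothetical mean arterial pressure readings (mmHg) with one spurious spike
map_series = [72, 70, 71, 69, 140, 68, 67, 66]
features = variability_features(map_series)

for name, value in features.items():
    print(f"{name}: {value:.2f}")
```

A single instantaneous reading taken at the moment of the spike would be badly misleading, whereas the median of the series is barely affected by it; the variability features themselves may also carry predictive information, which is the possibility the review suggests warrants further study.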

Table 2
Developed and validated predictive models and performance metrics. The full list of variables used in each model and their corresponding ranges and/or thresholds are reported in online Supporting Information Table S4.

Table 2 (continued)
**First published validation of a tool without initial published derivation; ***Validation of the variable combination identified by Yee et al. [28] before evaluation; # Plus expert opinion.

Table 3
Tabular Prediction model Risk Of Bias ASsessment Tool (PROBAST) results for risk of bias and applicability across domains.