A clinical prediction model to assess the risk of operative delivery


Errata

This article is corrected by:

  1. Errata: Corrigendum Volume 124, Issue 8, 1290, Article first published online: 20 June 2017

E. Schuit, Department of Epidemiology, Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Stratenum 6.131, PO Box 85500, 3508 GA, Utrecht, the Netherlands. Email e.schuit@umcutrecht.nl

Abstract

Please cite this paper as: Schuit E, Kwee A, Westerhuis M, Van Dessel H, Graziosi G, Van Lith J, Nijhuis J, Oei S, Oosterbaan H, Schuitemaker N, Wouters M, Visser G, Mol B, Moons K, Groenwold R. A clinical prediction model to assess the risk of operative delivery. BJOG 2012;119:915–923.

Objective  To predict instrumental vaginal delivery or caesarean section for suspected fetal distress or failure to progress.

Design  Secondary analysis of a randomised trial.

Setting  Three academic and six non-academic teaching hospitals in the Netherlands.

Population  5667 labouring women with a singleton term pregnancy in cephalic presentation.

Methods  We developed multinomial prediction models to assess the risk of operative delivery using antepartum characteristics only (model 1) and antepartum plus intrapartum characteristics (model 2). The models were internally validated using bootstrapping techniques and adjusted for overfitting. Predictive performance was assessed by calibration and discrimination (area under the receiver operating characteristic curve), and easy-to-use nomograms were developed.

Main outcome measures  Incidence of instrumental vaginal delivery or caesarean section for fetal distress or failure to progress with respect to a spontaneous vaginal delivery (reference).

Results  375 (6.6%) and 212 (3.6%) women had an instrumental vaginal delivery or caesarean section, respectively, due to fetal distress, and 433 (7.6%) and 571 (10.1%) due to failure to progress. Predictors were maternal age, parity, previous caesarean section, diabetes, gestational age, fetal gender and estimated birthweight (model 1) and, additionally, induction of labour, oxytocin augmentation, intrapartum fever, prolonged rupture of membranes, meconium-stained amniotic fluid, epidural anaesthesia and use of ST-analysis (model 2). Both models showed excellent calibration, and the areas under the receiver operating characteristic curve were 0.70–0.78 and 0.73–0.81, respectively.

Conclusion  In Dutch women with a singleton term pregnancy in cephalic presentation, antepartum and intrapartum characteristics can assist in the prediction of the need for an instrumental vaginal delivery or caesarean section for fetal distress or failure to progress.

Introduction

In the 1970s caesarean delivery rates began to rise in most Western countries,1 and they continue to rise in most of these countries.2 In 2008 the US caesarean section rate increased for the 12th consecutive year, to 32.3%.2 In contrast, instrumental vaginal delivery rates have remained stable over the years.2,3 Most caesarean sections and instrumental vaginal deliveries are performed because of suspected fetal distress or failure to progress. Furthermore, in absolute numbers, the large majority of instrumental vaginal and caesarean deliveries are performed in women with a term pregnancy and a fetus in cephalic presentation.4 Despite these numbers, it remains difficult to predict by which mode these women will actually deliver.

The mode of delivery often depends on several maternal and fetal characteristics, e.g. maternal age,5 parity,6 body mass index,7 maternal height,8 gestational age at delivery,9 fetal head position,10 amniotic fluid volume,11,12 ultrasound-estimated fetal weight12 and cervical length,13 but also on problems that might arise during labour (e.g. meconium-stained amniotic fluid). Several predictive models using these characteristics have been developed to predict the occurrence of caesarean section.13–18 However, because a caesarean section and an instrumental vaginal delivery, performed for either fetal distress or failure to progress, have different consequences for neonatal and maternal outcomes (including future pregnancies), it would be helpful to predict each of these interventions together with its indication: instrumental vaginal delivery due to suspected fetal distress (IVD-FD), caesarean section due to suspected fetal distress (CS-FD), instrumental vaginal delivery due to failure to progress (IVD-FTP) and caesarean section due to failure to progress (CS-FTP).

Such a prediction rule, based on easily and readily available characteristics, could help clinicians because it would allow for timely prognostication, which may lead to more effective decision making during labour. It could serve as an alert (e.g. that the child is likely to develop fetal distress during labour), it could inform the decision for a primary caesarean section, or it could aid in organisational aspects of the delivery (e.g. availability of doctors who can perform an instrumental delivery and availability of an operating theatre and personnel). Finally, it allows for more individualised counselling of the pregnant woman.

Therefore, the aim of the present study was to identify which factors, obtainable either before labour or during early labour, independently contribute to predicting the risk of instrumental vaginal delivery and caesarean section for suspected fetal distress or failure to progress. For this purpose we used data from a large multicentre trial that studied labouring women with a high-risk singleton pregnancy in vertex presentation beyond 36 weeks of gestation.19

Methods

Setting

We used data from a recently published randomised clinical trial conducted in the Netherlands.19 In this trial, labouring women with high-risk vertex singleton pregnancies beyond 36 weeks of gestation were randomly allocated to either intrapartum monitoring by cardiotocography plus ST-analysis of the fetal electrocardiogram or cardiotocography only. The performance of fetal blood sampling was guided by a strict protocol. Both the design and the main results of the study are presented elsewhere.19

In the Netherlands, low-risk pregnant women are monitored by midwives or general practitioners at home or in hospital (primary care), whereas high-risk pregnant women are monitored by gynaecologists in hospital (secondary care). High-risk pregnancies included pregnancies that were complicated by hypertensive disorders, pre-existing maternal disease, ruptured membranes for more than 24 hours, complicated obstetric history, intrauterine growth restriction, a post-date gestational age (≥42 weeks of gestation), need for pain relief, failure to progress, meconium-stained amniotic fluid or non-reassuring fetal heart rate at intermittent auscultation by a midwife.19

Outcome

Based on the combination of the intervention (IVD or CS) and the indication for the intervention (FD or FTP), women were assigned to one of five mutually exclusive outcome categories: spontaneous vaginal delivery (reference category); instrumental vaginal delivery due to suspected fetal distress (IVD-FD); caesarean section due to suspected fetal distress (CS-FD); instrumental vaginal delivery due to failure to progress (IVD-FTP); or caesarean section due to failure to progress (CS-FTP). Hence, the outcome of this study was multinomial or polytomous (i.e. more than two unordered outcome categories). Instrumental vaginal delivery was defined as vacuum extraction, forceps extraction or both. Suspected fetal distress was defined as a preterminal or rapidly deteriorating abnormal cardiotocographic pattern, a pH below 7.20 obtained by fetal blood sampling, or a significant ST-event. Failure to progress in the first stage was defined as an arrest of labour of at least 2 hours despite adequate contractions. Failure to progress in the second stage was defined as a period of active pushing of more than 60 minutes. Accordingly, women in the second stage of labour with both an FD and an FTP indication were classified as having had an intervention for FTP if the duration of active pushing exceeded 60 minutes, and for FD if it did not.
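The assignment rule above, including the second-stage tie-break, can be sketched as a small function. This is an illustration only: the string coding of the delivery mode and the argument names are hypothetical and do not reflect the trial database, and cases the text does not define (operative delivery without an FD or FTP indication, or both indications outside the second stage) raise an error rather than being guessed.

```python
from typing import Optional

# Hypothetical mapping from the intervention to the prefix used in the labels.
PREFIX = {"instrumental": "IVD", "caesarean": "CS"}


def classify_outcome(mode: str,
                     fetal_distress: bool,
                     failure_to_progress: bool,
                     second_stage: bool = False,
                     pushing_minutes: Optional[float] = None) -> str:
    """Return one of 'spontaneous', 'IVD-FD', 'CS-FD', 'IVD-FTP', 'CS-FTP'.

    `mode` is 'spontaneous', 'instrumental' (vacuum and/or forceps) or
    'caesarean'; this coding is an assumption made for the sketch.
    """
    if mode == "spontaneous":
        return "spontaneous"
    prefix = PREFIX[mode]  # KeyError for an unknown mode
    if fetal_distress and failure_to_progress:
        # Tie-break described for the second stage only: FTP if active
        # pushing lasted more than 60 minutes, otherwise FD.
        if not second_stage or pushing_minutes is None:
            raise ValueError("tie-break only defined in the second stage")
        indication = "FTP" if pushing_minutes > 60 else "FD"
    elif fetal_distress:
        indication = "FD"
    elif failure_to_progress:
        indication = "FTP"
    else:
        # Operative deliveries for other indications fall outside the five categories.
        raise ValueError("no FD or FTP indication")
    return f"{prefix}-{indication}"
```

For example, `classify_outcome("caesarean", True, True, second_stage=True, pushing_minutes=75)` returns `'CS-FTP'`, while exactly 60 minutes of pushing would give `'CS-FD'`, because the rule requires pushing to exceed 60 minutes.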

Candidate predictors

Based on the literature and clinical reasoning, we selected candidate predictors for the outcome categories defined above.13,20 Candidate predictors were categorised into antepartum and intrapartum variables. The antepartum variables included maternal age, parity, gestational age, maternal diabetes mellitus, previous caesarean delivery, fetal gender, maternal hypertensive disorder, suspected intrauterine growth restriction and antepartum estimated fetal weight. As the latter was not registered, the actual birthweight (in 100-g increments) was used instead as a potential predictor in the development of the models. Maternal diabetes mellitus included pregestational type 1 and type 2 diabetes as well as gestational diabetes mellitus. An antepartum prediction model (model 1) was developed using this first set of variables.

The second set of candidate predictors contained variables obtained early during labour, i.e. induced onset of labour, oxytocin augmentation, intrapartum fever (≥37.8°C), rupture of membranes >24 hours, epidural anaesthesia and meconium-stained amniotic fluid. These intrapartum predictors were added to model 1, to determine their added predictive value (model 2).

The allocated intervention of the original trial was taken into account by including this intervention variable in the multivariable analyses of models 1 and 2. Maternal age, gestational age and birthweight were analysed as continuous variables, and restricted cubic spline analyses were used to assess the linearity of their association with the outcome.21 Furthermore, three interactions were investigated: epidural anaesthesia with induced onset of labour, epidural anaesthesia with oxytocin augmentation, and epidural anaesthesia with intrapartum fever.
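To make the spline check concrete, the sketch below computes a restricted cubic spline basis in the truncated-power parameterisation described by Harrell. The knot locations are hypothetical (the paper does not report them), and the division by the squared spread of the outer knots is only a scaling for numerical stability. Linearity is typically assessed by comparing a model containing the non-linear basis terms with one containing only the linear term, for example with a likelihood ratio test; that comparison is not shown here.

```python
def rcs_basis(x: float, knots: list[float]) -> list[float]:
    """Restricted cubic spline basis for a single value x.

    Returns [x, s_1(x), ..., s_{k-2}(x)] for k knots. The resulting curve is
    cubic between knots and constrained to be linear outside the outer knots.
    """
    if len(knots) < 3:
        raise ValueError("need at least three knots")
    t = sorted(knots)
    k = len(t)
    t_last, t_penult = t[-1], t[-2]
    scale = (t_last - t[0]) ** 2  # normalisation for numerical stability

    def cube_pos(u: float) -> float:
        # Truncated power function (u)_+^3.
        return max(u, 0.0) ** 3

    basis = [x]  # the linear term
    for j in range(k - 2):
        tj = t[j]
        s = (cube_pos(x - tj)
             - cube_pos(x - t_penult) * (t_last - tj) / (t_last - t_penult)
             + cube_pos(x - t_last) * (t_penult - tj) / (t_last - t_penult))
        basis.append(s / scale)
    return basis


# Hypothetical knots for gestational age in weeks.
gest_age_terms = rcs_basis(40.5, [38.0, 40.0, 41.0, 42.0])
```

With four knots each value yields one linear and two non-linear columns; if the coefficients of the non-linear columns are jointly negligible, the variable can be kept as a single linear term, as was done for the continuous predictors in Tables 2 and 3.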

Data analysis

Univariable associations between candidate predictors and the different outcome categories were estimated with multinomial logistic regression analysis. Multinomial logistic regression allows for simultaneous estimation of the probability of the different outcomes (IVD-FD, CS-FD, IVD-FTP, CS-FTP, and spontaneous delivery [the reference category]).22–25 Essentially, the multinomial logistic regression model includes several logistic regression models simultaneously, to estimate the associations between the predictors and each of the outcomes compared with the reference category. Hence, estimated regression coefficients for the predictors may differ per outcome.22–25
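For a woman with covariates x, each non-reference category k has its own linear predictor η_k = α_k + β_k·x, and the multinomial model gives P(k) = exp(η_k) / (1 + Σ_j exp(η_j)), with P(spontaneous) = 1 / (1 + Σ_j exp(η_j)). A minimal sketch of that conversion follows; the linear predictor values are hypothetical and not derived from the tables.

```python
import math

# Non-reference categories; spontaneous vaginal delivery is the reference.
CATEGORIES = ["IVD-FD", "CS-FD", "IVD-FTP", "CS-FTP"]


def multinomial_probs(lp: dict[str, float]) -> dict[str, float]:
    """Turn the four linear predictors (log-odds of each category versus
    spontaneous delivery) into probabilities for all five categories."""
    odds = {c: math.exp(lp[c]) for c in CATEGORIES}
    denom = 1.0 + sum(odds.values())  # the reference contributes exp(0) = 1
    probs = {c: o / denom for c, o in odds.items()}
    probs["spontaneous"] = 1.0 / denom
    return probs


# Hypothetical linear predictors for one woman (not taken from the paper).
example = {"IVD-FD": -1.8, "CS-FD": -3.2, "IVD-FTP": -3.0, "CS-FTP": -3.3}
probs = multinomial_probs(example)
```

Because all five probabilities share one denominator, they always sum to 1, and the ratio P(k)/P(spontaneous) recovers exp(η_k), which is why each coefficient is interpreted as a log odds ratio against the reference category.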

Selection based on univariable statistics might result in unstable prediction models, so we chose not to perform any preselection and to include all candidate predictors in the multivariable analyses.21,26 In the model including antepartum predictors only (model 1), as well as the model including both antepartum and intrapartum predictors (model 2), the final predictors were identified by a backward stepwise selection in the multinomial logistic regression model using Akaike’s Information Criterion.27
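Why backward selection by AIC is stricter for a multinomial outcome can be seen from the parameter count. Assuming K = 5 outcome categories and q predictor terms, each entering with one coefficient per non-reference category (i.e. ignoring spline terms), dropping one predictor removes K − 1 = 4 parameters at once:

```latex
\begin{align*}
\mathrm{AIC} &= -2\,\ell(\hat{\beta}) + 2p, \qquad p = (K-1)(q+1),\\
\Delta\mathrm{AIC} &= \mathrm{AIC}_{\text{reduced}} - \mathrm{AIC}_{\text{full}}
  = 2\bigl(\ell_{\text{full}} - \ell_{\text{reduced}}\bigr) - 2(K-1)
  = 2\bigl(\ell_{\text{full}} - \ell_{\text{reduced}}\bigr) - 8 .
\end{align*}
```

Under these assumptions a predictor is removed (ΔAIC < 0) when doing so lowers the log-likelihood by less than 4, summed over all four outcome comparisons.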

Some women had missing values for some of the potential predictors. These values were to some extent selectively missing (as published in the main trial report; Appendix 3 of Westerhuis et al.,19 available online at http://links.lww.com/AOG/A178). Hence, as widely acknowledged, a complete case analysis would yield biased results.28–30 We therefore used multiple imputation (ten imputed datasets), following the original trial analyses.19 As imputed datasets differ from each other, predictors were selected in each imputed dataset separately. For inclusion in the final prediction models, we used the majority method, i.e. predictors were included if selected in at least five of the ten imputed datasets.31 The regression coefficients and standard errors of these final predictors were combined across the ten datasets using Rubin’s rules to obtain the two final prediction models.32
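The majority method and Rubin's rules described above reduce to simple bookkeeping once each imputed dataset has been analysed. The sketch below assumes that per-dataset selections and coefficient estimates are already available; the function names are illustrative, and the pooling shown is the standard Rubin formula for a single coefficient (pooled estimate = mean; total variance = mean within-imputation variance + (1 + 1/m) × between-imputation variance).

```python
import math
from statistics import mean, variance


def majority_selected(selections: list[set[str]], threshold: int = 5) -> set[str]:
    """Predictors selected in at least `threshold` of the imputed datasets."""
    counts: dict[str, int] = {}
    for chosen in selections:
        for name in chosen:
            counts[name] = counts.get(name, 0) + 1
    return {name for name, n in counts.items() if n >= threshold}


def rubin_pool(estimates: list[float], std_errors: list[float]) -> tuple[float, float]:
    """Pool one coefficient across m imputed datasets with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)                        # pooled point estimate
    within = mean(se ** 2 for se in std_errors)    # mean within-imputation variance
    between = variance(estimates)                  # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)
```

With ten imputed datasets, the default `threshold=5` corresponds to "selected in at least five out of ten". The between-imputation term inflates the pooled standard error to reflect uncertainty about the missing values.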

The models were internally validated using bootstrapping techniques. One hundred bootstrap samples of the same size as the original data (n = 5667) were drawn from the original dataset with replacement, so that the same individual could be sampled more than once. Within each bootstrap sample the entire modelling process described above was repeated. This yielded a shrinkage factor, which was used to adjust the regression coefficients, and hence the final model, for optimism and overfitting.21 The area under the receiver operating characteristic curve (AUC) was used to assess the ability of the two models to discriminate between women undergoing one of the interventions and those having a spontaneous vaginal delivery. Hence, we calculated four AUCs, each comparing one outcome with the reference category. Calibration of the two models was assessed with calibration plots comparing the predicted probabilities with the observed frequencies of the different outcome categories.21,33
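Each of the four AUCs compares one outcome category with the reference only, so women in the other three operative categories are set aside. A minimal sketch using the Mann–Whitney formulation follows; it assumes that the predicted probability of the category in question is used as the score, which the text does not state explicitly. The pairwise loop is quadratic and meant for illustration; rank-based implementations are preferable for large samples.

```python
def pairwise_auc(outcomes: list[str], risk: list[float], category: str,
                 reference: str = "spontaneous") -> float:
    """AUC for `category` versus `reference`, ignoring all other categories.

    Probability that a randomly chosen woman with the outcome has a higher
    predicted risk than a randomly chosen woman with the reference outcome;
    ties count one half (Mann-Whitney formulation).
    """
    cases = [r for o, r in zip(outcomes, risk) if o == category]
    controls = [r for o, r in zip(outcomes, risk) if o == reference]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    score = 0.0
    for c in cases:
        for s in controls:
            score += 1.0 if c > s else 0.5 if c == s else 0.0
    return score / (len(cases) * len(controls))
```

Calling this once per operative category (IVD-FD, CS-FD, IVD-FTP, CS-FTP) yields the four AUCs reported per model.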

Finally, to improve clinical applicability, nomograms were developed to easily calculate the probabilities of IVD-FD, CS-FD, IVD-FTP, CS-FTP and spontaneous delivery.

All analyses were performed in R version 2.10.0 (The R Foundation for Statistical Computing, 2009).

Results

Between January 2006 and July 2008, 5667 women met the inclusion criteria of the randomised clinical trial.19 Among these women, IVD-FD occurred in 375 (6.6%), CS-FD in 212 (3.6%), IVD-FTP in 433 (7.6%) and CS-FTP in 571 (10.1%), and 4077 (71.9%) had a spontaneous delivery. Characteristics of these women are presented in the second column of Table 1.

Table 1.   Characteristics of the study population and the univariable associations between potential predictors and the mode of delivery and its indications
Characteristic | Overall (n = 5667) | Spont. (n = 4077) | IVD-FD (n = 375; 7%) | CS-FD (n = 212; 4%) | IVD-FTP (n = 433; 8%) | CS-FTP (n = 571; 10%) | OR (95% CI) IVD-FD vs spont. | OR (95% CI) CS-FD vs spont. | OR (95% CI) IVD-FTP vs spont. | OR (95% CI) CS-FTP vs spont.
Antepartum
Maternal age, years | 32.0 ± 4.8 | 31.9 ± 4.8 | 31.8 ± 4.7 | 32.6 ± 4.8 | 32.1 ± 4.2 | 32.5 ± 5.0 | 1.00 (0.97–1.02) | 1.03 (1.00–1.06)* | 1.01 (0.99–1.03) | 1.02 (1.00–1.04)
Gestational age, weeks | 40.2 ± 1.4 | 40.1 ± 1.5 | 40.4 ± 1.4 | 40.4 ± 1.4 | 40.4 ± 1.4 | 40.6 ± 1.3 | 1.18 (1.09–1.28)* | 1.20 (1.08–1.34)* | 1.21 (1.12–1.30)* | 1.35 (1.26–1.45)*
Nulliparous | 3236 (57) | 1990 (49) | 291 (78) | 141 (66) | 374 (87) | 440 (77) | 3.64 (2.84–4.68)* | 2.08 (1.53–2.83)* | 6.73 (5.07–8.94)* | 3.53 (2.87–4.35)*
Previous caesarean delivery** | 716 (13) | 491 (12) | 54 (14) | 33 (16) | 44 (10) | 94 (16) | 1.23 (0.91–1.67) | 1.35 (0.90–2.03) | 0.83 (0.60–1.15) | 1.44 (1.12–1.84)*
Maternal diabetes mellitus | 169 (3) | 120 (3) | 8 (2) | 9 (4) | 7 (2) | 25 (4) | 0.72 (0.35–1.48) | 1.51 (0.75–3.04) | 0.54 (0.25–1.17) | 1.49 (0.95–2.33)
Pre-eclampsia or pregnancy-induced hypertension | 678 (12) | 483 (12) | 49 (13) | 35 (17) | 46 (11) | 65 (11) | 1.12 (0.82–1.54) | 1.49 (1.02–2.17)* | 0.89 (0.64–1.22) | 0.96 (0.73–1.26)
Intrauterine growth restriction | 133 (2) | 106 (3) | 11 (3) | 15 (7) | 1 (0) | 0 (0) | 1.13 (0.60–2.13) | 2.85 (1.62–5.01)* | 0.09 (0.01–0.62)* | NA
Neonatal female gender | 2668 (47) | 1977 (48) | 175 (47) | 84 (40) | 176 (41) | 256 (45) | 0.93 (0.75–1.15) | 0.70 (0.52–0.94)* | 0.73 (0.60–0.89)* | 0.86 (0.72–1.03)
Birthweight, 100 g increments | 35.4 ± 5.2 | 35.1 ± 5.1 | 34.2 ± 5.2 | 34.2 ± 5.8 | 36.8 ± 4.5 | 38.0 ± 5.1 | 0.96 (0.94–0.98)* | 0.96 (0.94–0.99)* | 1.06 (1.04–1.09)* | 1.11 (1.09–1.13)*
Intrapartum
Induced onset of labour | 2341 (41) | 1714 (42) | 126 (34) | 105 (49) | 147 (34) | 249 (44) | 0.70 (0.56–0.87)* | 1.35 (1.02–1.79)* | 0.71 (0.58–0.88)* | 1.07 (0.89–1.27)
Oxytocin augmentation | 2044 (36) | 1294 (32) | 163 (43) | 78 (37) | 234 (54) | 275 (48) | 1.65 (1.33–2.05)* | 1.26 (0.94–1.69) | 2.54 (2.08–3.10)* | 2.00 (1.67–2.39)*
Intrapartum fever ≥37.8°C | 470 (8) | 236 (6) | 45 (12) | 18 (8) | 78 (18) | 93 (16) | 2.22 (1.58–3.12)* | 1.47 (0.85–2.55) | 3.58 (2.71–4.73)* | 3.18 (2.45–4.13)*
Rupture of membranes > 24 hours | 692 (12) | 502 (12) | 38 (10) | 15 (7) | 72 (17) | 65 (11) | 0.80 (0.57–1.14) | 0.53 (0.29–0.95)* | 1.42 (1.09–1.86)* | 0.92 (0.70–1.21)
Meconium-stained amniotic fluid | 1471 (26) | 990 (24) | 114 (30) | 67 (32) | 115 (27) | 185 (32) | 1.36 (1.08–1.72)* | 1.44 (1.06–1.95)* | 1.13 (0.90–1.41) | 1.49 (1.23–1.81)*
Epidural anaesthesia | 2389 (42) | 1438 (35) | 190 (51) | 122 (57) | 243 (56) | 397 (69) | 1.89 (1.52–2.33)* | 2.46 (1.82–3.33)* | 2.36 (1.93–2.88)* | 4.18 (3.44–5.07)*
Use of ST-analysis | 2827 (50) | 2052 (50) | 185 (49) | 114 (54) | 193 (45) | 283 (50) | 0.96 (0.78–1.19) | 1.15 (0.86–1.52) | 0.80 (0.65–0.97)* | 0.97 (0.81–1.16)

  1. Spont., spontaneous delivery; data are presented as mean ± standard deviation or n (%).

  2. *Significant.

  3. **When only multiparae were selected, the OR (95% CI) was 5.91 (3.73–9.37)* for IVD-FD; 2.82 (1.72–4.64)* for CS-FD; 10.0 (5.44–18.4)* for IVD-FTP; and 8.28 (5.52–12.4)* for CS-FTP.

Antepartum predictors related to any of the four outcomes in univariable analysis were maternal age, gestational age, nulliparity, previous caesarean delivery, pre-eclampsia or pregnancy-induced hypertension, intrauterine growth restriction, neonatal female gender and birthweight (Table 1). Intrapartum predictors included induced onset of labour, oxytocin augmentation, intrapartum fever, rupture of membranes >24 hours, meconium-stained amniotic fluid, epidural anaesthesia, and the performance of ST-analysis. None of the women with CS-FTP had intrauterine growth restriction. As a result, the effect of this variable on the outcome could not be estimated reliably.34 Therefore, although intrauterine growth restriction was shown to be related to IVD-FTP and CS-FD, the variable was not considered in the multivariable analyses.

In model 1, seven antepartum variables were identified as predictors of at least one of the four outcomes: maternal age, nulliparity, previous caesarean delivery, maternal diabetes mellitus, gestational age, neonatal female gender and birthweight (Table 2). The model’s AUC was 0.72 (95% CI 0.69–0.74) for IVD-FD, 0.70 (95% CI 0.66–0.73) for CS-FD, 0.78 (95% CI 0.76–0.80) for IVD-FTP and 0.78 (95% CI 0.76–0.80) for CS-FTP.

Table 2.   Multivariable associations for model 1 (antepartum characteristics only)
Characteristic | IVD-FD vs spont.: Beta* | OR (95% CI)* | CS-FD vs spont.: Beta* | OR (95% CI)* | IVD-FTP vs spont.: Beta* | OR (95% CI)* | CS-FTP vs spont.: Beta* | OR (95% CI)*
Intercept | −13.1 |  | −15.6 |  | −11.1 |  | −15.4 | 
Antepartum
Maternal age, years | 0.029 | 1.03 (1.01–1.05) | 0.052 | 1.05 (1.02–1.09) | 0.054 | 1.06 (1.03–1.08) | 0.056 | 1.06 (1.04–1.08)
Gestational age, weeks | 0.26 | 1.29 (1.18–1.41) | 0.32 | 1.38 (1.22–1.56) | 0.038 | 1.04 (0.95–1.13) | 0.13 | 1.14 (1.05–1.24)
Nulliparous | 2.05 | 7.79 (5.26–11.5) | 1.13 | 3.09 (2.09–4.55) | 3.39 | 29.7 (17.2–51.1) | 2.65 | 14.1 (9.78–20.3)
Previous caesarean delivery | 1.77 | 5.87 (3.70–9.32) | 1.06 | 2.88 (1.74–4.76) | 2.39 | 10.9 (5.92–20.1) | 2.23 | 9.34 (6.17–14.1)
Neonatal female gender | −0.19 | 0.83 (0.67–1.03) | −0.50 | 0.61 (0.45–0.83) | −0.25 | 0.78 (0.63–0.96) | −0.013 | 0.99 (0.81–1.20)
Birthweight, 100 g increments | −0.059 | 0.94 (0.92–0.97) | −0.079 | 0.92 (0.89–0.96) | 0.083 | 1.09 (1.06–1.11) | 0.12 | 1.12 (1.10–1.15)
Maternal diabetes mellitus | 0.32 | 1.37 (0.65–2.91) | 0.99 | 2.69 (1.29–5.60) | −0.24 | 0.79 (0.35–1.76) | 0.87 | 2.38 (1.44–3.95)

  1. Spont., spontaneous delivery.

  2. *Shrunken; shrinkage factor 0.99–1.02.

Adding intrapartum characteristics, including interaction terms, to model 1 showed that induced onset of labour, oxytocin augmentation, intrapartum fever, rupture of membranes >24 hours, meconium-stained amniotic fluid, epidural anaesthesia, use of ST-analysis and an interaction between epidural anaesthesia and oxytocin augmentation were additional predictors of at least one of the four outcomes (Table 3). The AUCs of this model were slightly higher than those of model 1: 0.73 (95% CI 0.70–0.75) for IVD-FD, 0.73 (95% CI 0.70–0.76) for CS-FD, 0.80 (95% CI 0.78–0.82) for IVD-FTP and 0.81 (95% CI 0.80–0.83) for CS-FTP. The higher discriminative ability of model 2 is reflected in a wider range between the lowest and highest predicted probabilities (Figures 1 and 2). Both models showed excellent agreement between predicted probabilities and observed proportions for all four outcomes (Figures 1 and 2).

Table 3.   Multivariable associations for model 2 (antepartum and intrapartum characteristics)
Characteristic | IVD-FD vs spont.: Beta* | OR (95% CI)* | CS-FD vs spont.: Beta* | OR (95% CI)* | IVD-FTP vs spont.: Beta* | OR (95% CI)* | CS-FTP vs spont.: Beta* | OR (95% CI)*
Intercept | −12.6 |  | −11.6 |  | −13.6 |  | −15.2 | 
Antepartum
Maternal age, years | 0.030 | 1.03 (1.01–1.05) | 0.053 | 1.05 (1.03–1.08) | 0.052 | 1.05 (1.02–1.09) | 0.057 | 1.06 (1.04–1.08)
Gestational age, weeks | 0.25 | 1.28 (1.17–1.41) | 0.048 | 1.05 (0.96–1.15) | 0.26 | 1.29 (1.14–1.46) | 0.10 | 1.11 (1.02–1.21)
Nulliparous | 1.90 | 6.69 (4.47–10.0) | 3.08 | 21.7 (12.5–37.8) | 0.94 | 2.57 (1.70–3.89) | 2.28 | 9.78 (6.67–14.3)
Previous caesarean delivery | 1.63 | 5.10 (3.18–8.17) | 2.32 | 10.1 (5.46–18.8) | 1.03 | 2.81 (1.65–4.79) | 2.19 | 8.91 (5.78–13.7)
Neonatal female gender | −0.19 | 0.83 (0.66–1.03) | −0.26 | 0.77 (0.62–0.96) | −0.48 | 0.62 (0.46–0.84) | −0.025 | 0.98 (0.80–1.19)
Birthweight, 100 g increments | −0.061 | 0.94 (0.92–0.97) | 0.077 | 1.08 (1.05–1.11) | −0.077 | 0.93 (0.90–0.96) | 0.11 | 1.12 (1.10–1.15)
Maternal diabetes mellitus | 0.35 | 1.41 (0.66–3.01) | −0.12 | 0.89 (0.39–2.00) | 0.80 | 2.23 (1.06–4.70) | 0.92 | 2.51 (1.49–4.22)
Intrapartum
Induced onset of labour | −0.16 | 0.85 (0.62–1.16) | 0.31 | 1.36 (0.95–1.95) | 0.72 | 2.06 (1.31–3.25) | 0.95 | 2.59 (1.81–3.70)
Oxytocin augmentation | 0.047 | 1.05 (0.72–1.53) | 1.10 | 3.00 (2.04–4.41) | 0.44 | 1.56 (0.84–2.89) | 0.95 | 2.58 (1.67–3.98)
Intrapartum fever ≥37.8°C | 0.40 | 1.49 (1.04–2.14) | 0.64 | 1.89 (1.40–2.56) | −0.064 | 0.94 (0.53–1.65) | 0.33 | 1.39 (1.04–1.85)
Ruptured membranes >24 hours | −0.24 | 0.78 (0.54–1.13) | 0.23 | 1.25 (0.93–1.69) | −0.72 | 0.49 (0.26–0.89) | −0.084 | 0.92 (0.67–1.25)
Epidural anaesthesia | 0.21 | 1.24 (0.91–1.70) | 0.49 | 1.63 (1.19–2.25) | 0.70 | 2.01 (1.35–2.99) | 0.88 | 2.41 (1.81–3.20)
Use of ST-analysis | −0.023 | 0.98 (0.79–1.21) | −0.28 | 0.76 (0.61–0.93) | 0.15 | 1.16 (0.87–1.55) | −0.039 | 0.96 (0.79–1.16)
Interaction terms
Epidural anaesthesia × oxytocin augmentation | −0.063 | 0.94 (0.59–1.50) | −0.74 | 0.48 (0.31–0.73) | −0.11 | 0.89 (0.46–1.74) | −0.077 | 0.93 (0.60–1.42)

  1. Spont., spontaneous delivery.

  2. *Shrunken; shrinkage factor 0.98–1.02.
Figure 1.

 Calibration plots of model 1 with the observed risk of IVD-FD (A), CS-FD (B), IVD-FTP (C) and CS-FTP (D) by predicted probabilities of the IVD-FD, CS-FD, IVD-FTP and CS-FTP. The dots indicate deciles of women grouped by similar predicted risk of the different interventions and their indications. The vertical bars through the dots indicate the 95% confidence interval of the observed risks for the grouped women. To enhance interpretation, the axes were adjusted to a scale from 0.0 to 0.50, based on the low observed and predicted outcome incidences.

Figure 2.

 Calibration plots of model 2 with the observed risk of IVD-FD (A), CS-FD (B), IVD-FTP (C) and CS-FTP (D) by predicted probabilities of the IVD-FD, CS-FD, IVD-FTP and CS-FTP. The dots indicate deciles of women grouped by similar predicted risk of the different interventions and their indications. The vertical bars through the dots indicate the 95% confidence interval of the observed risks for the grouped women. To enhance interpretation, the axes were adjusted to a scale from 0.0 to 0.50, based on the low observed and predicted outcome incidences.
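The decile grouping used in these calibration plots can be sketched as follows. The sketch assumes that all women are included, with `observed` equal to 1 for the category being plotted and 0 otherwise, and it uses the Wilson score interval for the 95% confidence interval of each observed proportion; the paper does not state which interval method was used, so this choice is an assumption.

```python
import math


def calibration_by_decile(pred: list[float], observed: list[int], groups: int = 10):
    """Group women into `groups` equal-sized bins of predicted risk and return,
    per bin, (mean predicted risk, observed proportion, 95% Wilson interval)."""
    order = sorted(range(len(pred)), key=lambda i: pred[i])
    n = len(order)
    z = 1.959964  # two-sided 95% standard normal quantile
    rows = []
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue
        m = len(idx)
        p_hat = sum(observed[i] for i in idx) / m
        mean_pred = sum(pred[i] for i in idx) / m
        # Wilson score interval for the observed proportion in this bin.
        centre = (p_hat + z ** 2 / (2 * m)) / (1 + z ** 2 / m)
        half = z * math.sqrt(p_hat * (1 - p_hat) / m + z ** 2 / (4 * m * m)) / (1 + z ** 2 / m)
        rows.append((mean_pred, p_hat, (centre - half, centre + half)))
    return rows
```

Plotting each row's observed proportion (with its interval) against its mean predicted risk gives one dot and vertical bar per decile; points close to the diagonal indicate good calibration.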

The nomograms of both models with an illustrative example are presented in the Supporting information, Appendices S1 and S2, respectively.

Discussion

In this study we developed models to simultaneously predict the probability of instrumental vaginal or caesarean delivery for suspected fetal distress or failure to progress, and of spontaneous delivery, based on antepartum characteristics only (model 1) and on both antepartum and intrapartum characteristics (model 2). Both models showed excellent calibration and good ability to discriminate between women undergoing the different interventions and those who had a spontaneous vaginal delivery. Model 2 showed slightly better discriminative performance than model 1. As this model included eight additional intrapartum characteristics compared with the antepartum model, this difference was to be expected.21

All characteristics used in the models are readily available and easy to measure. Hence, the models should be easy and inexpensive to apply in daily clinical practice, and the nomograms further improve their clinical applicability. The discriminative ability showed only minimal improvement after the addition of intrapartum characteristics, whereas the complexity of the model increased. The advantage of the second model is that it includes characteristics known to be associated with the different outcomes, which improves its face validity. This is important in view of the future application of the models in clinical practice.35 Obviously, the second model does not allow prognostication before labour, because it includes characteristics that only become available during labour. The first model, however, does allow for timely prognostication, because it includes antepartum characteristics only. It is important to emphasise that these prognostic models are meant to complement, not replace, clinical judgement, by combining risk factors to assess the probability of the outcome in an objective, more formal way.

The study population contained large numbers of women in each outcome category, which allowed reliable estimation of the predictor effects. Consequently, the optimism found for the two models was small (Tables 2 and 3). Notably, the caesarean section rate in the Netherlands is lower than in some other developed countries.2,3 Applying the models in these countries could therefore result in an overall underestimation of the caesarean section rate. A simple adjustment of the intercept could solve this problem.36 However, when the difference in caesarean delivery rate is explained by a different approach to a subgroup of women (e.g. women with diabetes mellitus), adjusting the intercept will not suffice and (some of) the regression coefficients will need to be updated. In any case, external validation is needed to determine the performance of both models in other populations.
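The text does not specify how the intercepts should be adjusted. One common option, sketched here as an assumption rather than the authors' method, is a prior-shift correction: each non-reference intercept α_k is moved by log(π′_k/π′_ref) − log(π_k/π_ref), where π are the outcome incidences in the development data and π′ those in the new setting, while all other coefficients are left unchanged. This correction is exact only if the distribution of the predictors within each outcome category is the same in both populations. The incidences in any call are placeholders to be replaced with local data.

```python
import math


def shift_intercepts(intercepts: dict[str, float],
                     old_rates: dict[str, float],
                     new_rates: dict[str, float],
                     reference: str = "spontaneous") -> dict[str, float]:
    """Update each non-reference intercept for a population with different
    outcome incidences, keeping all other coefficients unchanged.

    Adds log(new_k / new_ref) - log(old_k / old_ref) to intercept k
    (prior-shift correction; assumes unchanged within-category predictor
    distributions).
    """
    return {
        k: a
        + math.log(new_rates[k] / new_rates[reference])
        - math.log(old_rates[k] / old_rates[reference])
        for k, a in intercepts.items()
    }
```

If the odds of CS-FTP versus spontaneous delivery are twice as high locally as in the development data, the CS-FTP intercept increases by log 2 ≈ 0.69. When the difference is instead driven by how a specific subgroup is managed, as noted above, this kind of uniform shift is not enough.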

We used data from a well-described, large, nationwide cohort of women who participated in a randomised clinical trial in the Netherlands.19 Because the data were collected within a randomised trial, data collection was standardised and of high quality. The cohort included labouring women with a singleton term pregnancy in cephalic presentation. As most labouring women in the general population have a singleton term pregnancy with the child in vertex presentation,9 we expect the models to generalise to the general population.

Our study aimed to predict both instrumental vaginal delivery and caesarean section using a single model. This differs from other studies, which focused mainly on caesarean deliveries.13–18 Furthermore, most of these studies aimed to predict only one outcome, whereas a multinomial regression model can estimate the probabilities of several outcomes simultaneously (e.g. vaginal delivery, elective caesarean section and caesarean section during labour17) and is therefore more informative. A variable is selected as a predictor in the model if it predicts at least one of the outcomes. As a result, the predictors are not necessarily predictive of all the different endpoints, e.g. neonatal female gender for CS-FTP (Tables 1–3). Another advantage is that the probability of combined outcomes can be calculated directly. For example, the probability of an IVD-FD for a nulliparous, non-diabetic woman of 25 years in the 39th week of pregnancy with a girl with an estimated weight of 2500 g is 13% (see Supporting information, Appendix S1, for a more extensive description of how to calculate this probability). For the same woman we predict probabilities of 4%, 5% and 5% for CS-FD, IVD-FTP and CS-FTP, respectively. Consequently, the probability of a spontaneous vaginal delivery is 100 − P(IVD-FD) − P(CS-FD) − P(IVD-FTP) − P(CS-FTP) = 100 − 13 − 4 − 5 − 5 = 73%, whereas the probability of a caesarean section is P(CS-FD) + P(CS-FTP) = 4 + 5 = 9%. Furthermore, a multinomial model is fitted once on all women, whereas four separate logistic regression models would each be fitted on only a subset of the women, reusing the same reference women several times.
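The combined-outcome arithmetic in the worked example can be written as a small helper. The inputs are the rounded percentages reported in the text, so the derived values inherit that rounding; the helper name and the added "instrumental" total (IVD-FD + IVD-FTP) are illustrative additions following the same logic.

```python
def combine(probs_percent: dict[str, float]) -> dict[str, float]:
    """Derive combined outcomes from the four predicted operative-delivery
    probabilities (in %), following the arithmetic in the text."""
    operative = ["IVD-FD", "CS-FD", "IVD-FTP", "CS-FTP"]
    return {
        # Spontaneous delivery is whatever probability remains.
        "spontaneous": 100 - sum(probs_percent[k] for k in operative),
        # Caesarean section for either indication.
        "caesarean": probs_percent["CS-FD"] + probs_percent["CS-FTP"],
        # Instrumental vaginal delivery for either indication.
        "instrumental": probs_percent["IVD-FD"] + probs_percent["IVD-FTP"],
    }


# Predicted percentages for the example woman, as reported in the text.
example = {"IVD-FD": 13, "CS-FD": 4, "IVD-FTP": 5, "CS-FTP": 5}
print(combine(example))  # {'spontaneous': 73, 'caesarean': 9, 'instrumental': 18}
```

Because the five categories are mutually exclusive, any combination of them can be obtained by simple addition of their predicted probabilities.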

This study has some limitations. Potential predictors such as body mass index,7,14,17 maternal height,13–15,18 cervical length13 and amniotic fluid volume11,12 were not taken into account because this information was not available. Although the latter two are easy to measure, cervical length changes during pregnancy, which makes it difficult to include in the model, and measurement of amniotic fluid volume is subject to large inter-observer variability.

It is important to note that it is difficult to distinguish between suspected fetal distress and failure to progress in daily clinical practice because they can occur simultaneously. Nevertheless, this is of less importance because interest lies in whether a woman will end up having a problematic delivery or not. Consequently, emphasis may be more on the discriminative ability of the models than on the accuracy of the predictions.

The study population is a mix of two obstetric populations: women who were referred to secondary care before labour, and women who were referred from primary care to secondary care during labour. As the latter group had a lower risk of an intervention before labour (i.e. there was no reason to start labour in secondary care), they could potentially dilute the effect of some of the variables in the antepartum model. To investigate the extent to which this mixing influenced model 1, we performed an additional analysis in which we developed the model only in women with risk factors that would lead to the start of labour in secondary care. This analysis showed results similar to those presented in the Results section (results not shown).

The ultrasound-to-delivery interval37 and the large intra-observer and inter-observer variability38 can compromise the accuracy of the estimated fetal weight used in the model, which might lead to an incorrect estimate of the probability of an intervention. Despite this limitation, it remains appropriate for healthcare providers and mothers to consider past and predicted birthweights when weighing the probability of the interventions, but birthweight alone should not preclude the possibility of an intervention. Furthermore, overestimating an actual birthweight of, for example, 2500 g by 100 g leads to an erroneous decrease in the predicted probability of IVD-FD of only 0.9%, indicating that an inaccurate estimate has little impact on the predicted probabilities.

We are aware that caesarean section rates, instrumental vaginal delivery rates, labour induction and the use and dosage of oxytocin augmentation differ between centres and between providers, e.g. by age, experience and years since completing training. An obvious way to account for these differences would be to include centre in a multilevel regression model. However, we are unaware of any statistical method that combines multilevel and multinomial modelling. Instead, centre effects were investigated by fitting model 2 as a logistic multilevel model with a random intercept per centre, with CS-FD as the outcome and spontaneous vaginal delivery as the reference. This analysis showed that the random intercepts of the majority of centres differed significantly from zero. To account for these between-centre differences, we then included the centres in the model as dummy variables (centre 1 as the reference). In line with the logistic multilevel analysis, several centres were significant predictors of the mode of delivery. Compared with our presented model (without centres), the model with centres showed slightly better calibration and discrimination, with AUCs of 0.75, 0.75, 0.82 and 0.82 for IVD-FD, CS-FD, IVD-FTP and CS-FTP, compared with 0.73, 0.73, 0.80 and 0.81, respectively. However, this improvement was marginal given the increased complexity of the model and the loss of generalisability. Therefore we present only the results of the models without adjustment for centres.

Conclusion

In summary, in women with a singleton term pregnancy in cephalic presentation, both antepartum and intrapartum characteristics influence the probability of an instrumental vaginal delivery or caesarean section for suspected fetal distress or failure to progress. Information on this risk can be of great value in counselling women and guiding labour management. It may allow clinicians to avoid unnecessary interventions in low-risk women and may influence decisions during labour regarding the interpretation of fetal heart rate patterns and the use of additional techniques for fetal monitoring, such as ST-analysis of the fetal electrocardiogram or fetal blood sampling. The nomograms will allow for fast and easy implementation in clinical practice. After external validation and proof of generalisability, these models could be used in obstetric clinical management.

Disclosure of interests

None of the authors had financial, personal, political, academic or other relations that could lead to a conflict of interest relevant to this manuscript.

Contribution to authorship

ES, AK, BWJM, KGGM and RHHG were involved in the conception and the design of the study. ES performed the analyses and wrote the manuscript with significant contributions from AK, HJHMD, JGN, HPO, MGAJW, BWJM, KGGM and RHHG. All authors critically reviewed the subsequent versions of the manuscript and approved the final manuscript.

Details of ethics approval

The original randomised trial (CLINICAL TRIAL REGISTRATION: ISRCTN, http://www.isrctn.org, ISRCTN95732366) from which data were used was approved by the Institutional Review Board of the University Medical Centre Utrecht on 17 November 2005 (http://www.studies-obsgyn.nl/upload/STAN-goedkeuringMETCUMCU-.pdf [Dutch]) and had local approval from all other participating hospitals.

Funding

KGM Moons was funded by The Netherlands Organization for Scientific Research (grants 918.10.615 and 9120.8004). The randomised trial was supported by a grant from the Dutch Organization for Health Research and Development (grant 945-06-557).

Acknowledgements

We thank all research nurses and midwives of the Dutch Obstetric Consortium, as well as the staff of the labour wards of the centres that participated in the Dutch ST-analysis trial for their invaluable contributions to the study.
