Predicting successful intended vaginal delivery after previous caesarean section: external validation of two predictive models in a Dutch nationwide registration-based cohort with a high intended vaginal delivery rate




To externally validate two models from the USA (entry-to-care [ETC] and close-to-delivery [CTD]) that predict successful intended vaginal birth after caesarean (VBAC) for the Dutch population.


A nationwide registration-based cohort study.


Seventeen hospitals in the Netherlands.


Seven hundred and sixty-three pregnant women, each with one previous caesarean section and a viable singleton cephalic pregnancy without a contraindication for an intended VBAC.


The ETC model comprises the variables maternal age, prepregnancy body mass index (BMI), ethnicity, previous vaginal delivery, previous VBAC and previous nonprogressive labour. The CTD model replaces prepregnancy BMI with third-trimester BMI and adds estimated gestational age at delivery, hypertensive disease of pregnancy, cervical examination and induction of labour. We included consecutive medical records of eligible women who delivered in 2010. For validation, individual probabilities of women who had an intended VBAC were calculated.

Main outcome measures

Discriminative performance was assessed with the area under the curve (AUC) of the receiver operating characteristic and predictive performance was assessed with calibration plots and the Hosmer–Lemeshow (H-L) statistic.


Five hundred and fifteen (67%) of the 763 women had an intended VBAC; 72% of these (371) had an actual VBAC. The AUCs of the ETC and CTD models were 68% (95% CI 63–72%) and 72% (95% CI 67–76%), respectively. The H-L statistic showed a P-value of 0.167 for the ETC model and 0.356 for the CTD model, indicating no lack of fit.


External validation of two predictive models developed in the USA revealed an adequate performance within the Dutch population.


After a first caesarean section (CS), a pregnant woman can opt for an elective repeat CS (ERCS) or an intended vaginal birth after caesarean (VBAC) (i.e. a trial of labour), which will result in an actual (successful) VBAC or an emergency CS (unsuccessful VBAC). Discussing the risks of both options is a substantial part of counselling on mode of delivery, and obviously the probability of having an actual VBAC is a key component.[1, 2] Published success rates for VBAC worldwide vary between 60 and 80%.[3] However, these rates are not necessarily applicable for counselling, as individual probabilities may vary due to factors relating to demography, obstetric history and current pregnancy of the woman.[3, 4] Hence, a personalised prediction of VBAC may lead to a more refined counselling. Furthermore, with regard to clinical outcomes, personalised prediction could contribute to risk estimation because actual incidences of major maternal morbidity are lowest in women who have a VBAC (0.2%), followed by women having an ERCS (0.8%), and are highest in women having unsuccessful VBAC (3.8%).[5] In addition, several studies have shown that low probabilities of successful VBAC are related to relatively high risks of major fetal and maternal morbidity.[6, 7] Several scoring models that aim for a personalised prediction of successful intended VBAC have been published.[3, 8] In this work, the predictive models of Grobman et al.[9, 10] are evaluated. These models can be used early in pregnancy[9] and at the onset of labour[10] to estimate the probability of successful intended VBAC during an at-term delivery. 
Both models have previously been successfully validated in an independent cohort in the USA and were called the ‘entry-to-care model’ (ETC) and the ‘close-to-delivery model’ (CTD).[11, 12] Additionally, the ETC model has been successfully validated for a Japanese population.[13] However, differences in, for example, population characteristics and setting may affect the validity of the predictive models in European countries like the Netherlands. For instance, in the USA, the VBAC rate when the predictive models were derived was 12–22%;[14] this declined to approximately 8.3% in 2007.[15] In most European countries reported VBAC rates are higher, for example 54% in the Netherlands[16] and 30–37% in the UK.[2, 17] Hence, in this study we aim to externally validate the prediction models of Grobman et al.[9, 10] for the Dutch population.



This nationwide registration-based cohort study was performed in 17 hospitals in the Netherlands, with a good representation of all geographic regions and hospital types. Hospital types included university teaching hospitals (n = 5), nonuniversity teaching hospitals (n = 7) and nonuniversity nonteaching hospitals (n = 5). Approval for this study was obtained from the Medical Ethical Committee of the Maastricht University Medical Centre+ (MEC number 09-4-047-13).


The two predictive models were designed for women who have a vertex singleton pregnancy and a history of one low-transverse CS and who delivered at term (≥37 weeks of gestation), therefore only women who met these criteria were included in the present study. As in the studies of Grobman et al.,[9, 10] women with an unknown indication for previous CS, an antepartum intrauterine fetal demise or a contraindication for vaginal delivery were excluded. Contraindications for vaginal delivery were defined as placenta praevia and a uterine scar with extension into the fundus.

We expected a large difference between the VBAC rate in the USA and the current VBAC rate in the Netherlands, so we collected data from women who had an intended VBAC and from women who had an ERCS. The main consideration was to estimate the current VBAC rate in the Netherlands and subsequently to fully evaluate the applicability of the models in the Dutch population by comparing the baseline characteristics of the intended VBAC group and the ERCS group.

Sample size

The sample size was calculated according to the ‘rule of thumb’ of at least ten events per variable in the predictive model.[18] An event was defined as an unsuccessful intended VBAC. Based on previously published data, it was assumed that the percentage of unsuccessful intended VBACs in the Netherlands would be 24%.[16] As the predictive models contained as many as 12 variables per model, the calculated minimum sample size was 500 women (12 × 10/0.24).
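The rule-of-thumb calculation above can be reproduced in a few lines. This is only an illustrative sketch; the function name `min_sample_size` and its parameters are hypothetical and do not come from the study. The event rate is passed as a whole percentage to keep the arithmetic exact.

```python
import math


def min_sample_size(n_variables, events_per_variable, event_rate_percent):
    """Smallest cohort expected to yield the required number of events.

    An "event" here is an unsuccessful intended VBAC, as defined in the text.
    """
    required_events = n_variables * events_per_variable
    return math.ceil(required_events * 100 / event_rate_percent)


# 12 variables, 10 events per variable, 24% expected unsuccessful intended VBACs
print(min_sample_size(12, 10, 24))
```

With the values from the text (12 variables, ten events per variable, 24% event rate) this gives the 500 women quoted above.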

Data collection

At all participating sites, data were extracted from consecutive birth records according to a standardised operating procedure by using customised case report forms. Information was obtained on all predictive indicators included in the predictive models. Data were obtained by trained research nurses, medical doctors or senior medical students. To achieve the required sample size, each participating hospital was asked to include 30 consecutive cases of intended VBAC and all ERCSs in the same time interval, starting from 1 January 2010.


Variables were defined as described in the original articles of Grobman et al.[9, 10] The outcome variable used for validating the predictive models was the outcome of the intended VBAC, i.e. either successful intended VBAC (vaginal birth) or unsuccessful intended VBAC (emergency CS).

To be able to incorporate all variables despite the different units and definitions used in the two countries, some of the collected data had to be converted or redefined. All decisions on the conversion and redefinition of variables were approved by both a Dutch and an American obstetrician (HS and WG). The variables that had to be adapted were ‘ethnicity’ (in the USA the categories were African-American/Hispanic/White and others), ‘fetal station’ and ‘cervical effacement’. In the Netherlands seven categories of ethnicity are used (Dutch, other European, Mediterranean, African, Indo-Surinamese, Asian and ‘other’). To correspond to the categories in the original prediction models, the variable ‘African-American’ was set equal to the Dutch variable ‘African’. The variable ‘Hispanic’ did not match any of the Dutch categories and was therefore omitted. Consequently, the variable ‘White and others’ comprised all Dutch ethnicity categories except for ‘African’. The variable ‘fetal station’ was collected according to the ‘Hodge classification system’, which ranges from Hodge 0 (H0) to Hodge 4 (H4). Fetal station was converted into the American classification system, which ranges from ballottable (B) to +5. It was redefined as follows: H0 = −5; H1 = −3; H2 = −1; H3 = 0; H4 = +3. The variable ‘cervical effacement’ was collected in three categories, namely (1) ≤25%, (2) 25–50% and (3) ≥50%, as these are the categories in which these data are registered in the Netherlands. For analysis we set category 1 equal to 20%, category 2 equal to 40% and category 3 equal to 75%. Furthermore, the variable ‘third trimester body mass index (BMI)’ was approximated by adding 15 kg to ‘pre-pregnancy weight’; this was considered appropriate for the Dutch population based on expert opinion.
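The conversions described above amount to a handful of lookup tables and one formula. The sketch below is illustrative only: the dictionary and function names are hypothetical, and the BMI helper assumes that height in metres is available alongside prepregnancy weight.

```python
# Hodge plane (Dutch registration) -> American fetal station, as listed in the text
HODGE_TO_STATION = {0: -5, 1: -3, 2: -1, 3: 0, 4: 3}

# Dutch effacement category (1: <=25%, 2: 25-50%, 3: >=50%) -> value used in analysis
EFFACEMENT_CATEGORY_TO_PERCENT = {1: 20, 2: 40, 3: 75}


def is_african_american(dutch_ethnicity):
    """Return 1 for the Dutch category 'African', 0 for every other category."""
    return 1 if dutch_ethnicity == "African" else 0


def approx_third_trimester_bmi(prepregnancy_weight_kg, height_m):
    """Approximate third-trimester BMI by adding 15 kg to prepregnancy weight."""
    return (prepregnancy_weight_kg + 15.0) / height_m ** 2


print(HODGE_TO_STATION[2], EFFACEMENT_CATEGORY_TO_PERCENT[3])
print(is_african_american("Mediterranean"))
print(round(approx_third_trimester_bmi(65.0, 1.70), 1))
```

Because no Dutch category maps to ‘Hispanic’, that indicator would be 0 for every woman under this scheme.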

Data quality and missing data

Data were entered and checked for inconsistencies. Inconsistent and incomplete data were double-checked directly with the hospital concerned. As shown in Table 1, for most variables there was only a small quantity of missing data. However, prepregnancy BMI was missing in 24% of women. A multiple imputation strategy was used for data analysis, because complete case analysis alone can result in a large loss of power and might yield biased parameter estimates.

Table 1. Baseline characteristics of study cohort of women with a previous caesarean section.
| Variable | Missing data, intended VBAC/ERCS (n/n) | Intended VBAC (n = 515) | ERCS (n = 248) | P-value (a) |
| Maternal age (years), mean ± SD | 2/4 | 32 ± 5 | 33 ± 4 | 0.15 |
| Ethnicity, n (%) | 15/12 | | | |
| Dutch | | 388 (75) | 192 (77) | 0.53 |
| Mediterranean | | 37 (7) | 11 (5) | 0.14 |
| Other European | | 17 (3) | 11 (4) | 0.44 |
| African | | 24 (5) | 6 (3) | 0.14 |
| Indo-Surinamese | | 7 (1) | 1 (0) | 0.45 |
| Asian | | 12 (2) | 7 (3) | 0.68 |
| Other | | 15 (3) | 8 (3) | 0.81 |
| Prepregnancy BMI (kg/m²), mean ± SD | 124/79 | 25 ± 6 | 27 ± 7 | <0.00 |
| Previous CS due to failure to progress, n (%) | 0/0 | 201 (39) | 135 (54) | <0.00 |
| Any previous vaginal delivery, n (%) | 0/0 | 127 (25) | 25 (10) | <0.00 |
| Previous VBAC, n (%) | 0/0 | 99 (19) | 8 (3) | <0.00 |
| PE/HELLP, n (%) | 0/2 | 9 (2) | 6 (2) | 0.58 |
| Gestational age at delivery (days), mean ± SD | 0/0 | 279 ± 8 | 273 ± 9 | <0.00 |
| Cervical dilatation (cm), mean ± SD | 11/(b) | 3 ± 2 | (b) | (b) |
| Cervical effacement (%), mean ± SD | 56/(b) | 64 ± 19 | (b) | (b) |
| Fetal station (B, −5 to +5), mean ± SD | 57/(b) | −2 ± 2 | (b) | (b) |
| Induction of labour, n (%) | 0/(b) | 132 (26) | (b) | (b) |

B, ballottement; HELLP, HELLP syndrome (haemolysis, elevated liver enzymes, low platelets); PE, pre-eclampsia; SD, standard deviation.
(a) Results of chi-square tests/Fisher's exact test/t tests.
(b) Not applicable.

Data analysis

Study cohort characteristics

Characteristics of the women who had an ERCS and women who had an intended VBAC were compared. To compare proportions, the chi-square test, or when appropriate Fisher's exact test, was used. For continuous variables, an independent sample t-test was used for all samples as data were normally distributed. A P value <0.05 was used to indicate statistical significance.

Predicted probabilities

To validate the prediction models, for each woman who had an intended VBAC an individual probability of achieving VBAC was calculated with the following prediction equations obtained from the research articles of Grobman et al.:[9, 10]

  1. The ETC model:[9] predicted probability of successful intended VBAC = exp(w)/[1 + exp(w)], where w = 3.766 − 0.039 (age, years) − 0.060 (prepregnancy BMI) − 0.671 (African-American ethnicity) − 0.680 (Hispanic ethnicity) + 0.888 (previous vaginal delivery) + 1.003 (vaginal delivery after previous CS) − 0.632 (previous CS due to nonprogressive labour).
  2. The CTD model:[10] predicted probability of successful intended VBAC = exp(w)/[1 + exp(w)], where w = 7.059 − 0.037 (age, years) − 0.044 (third-trimester BMI) − 0.460 (African-American ethnicity) − 0.761 (Hispanic ethnicity) + 0.955 (previous vaginal delivery) + 0.851 (vaginal delivery after previous CS) − 0.655 (previous CS due to nonprogressive labour) − 0.109 (estimated gestational age at delivery) − 0.499 (hypertensive disease of pregnancy) + 0.044 (cervical effacement, deciles) + 0.109 (cervical dilation, 0–6 cm) + 0.082 (fetal station, B to +5, entered as 0–11) − 0.452 (labour induction).
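As an illustration, the two prediction equations can be written as small functions. This is a sketch under stated assumptions rather than the authors' code: the function and parameter names are hypothetical, binary predictors are coded 0/1, gestational age is assumed to be entered in weeks, effacement deciles are taken as percentage divided by 10, and fetal station is assumed to be coded as station + 6 (so that ballottable, passed here as −6, becomes 0 and +5 becomes 11).

```python
import math


def logistic(w):
    """Turn the linear predictor w into a probability, exp(w) / (1 + exp(w))."""
    return 1.0 / (1.0 + math.exp(-w))


def etc_probability(age, prepregnancy_bmi, african_american=0, hispanic=0,
                    previous_vaginal_delivery=0, previous_vbac=0,
                    previous_cs_nonprogressive=0):
    """Entry-to-care model, coefficients as quoted in the text."""
    w = (3.766 - 0.039 * age - 0.060 * prepregnancy_bmi
         - 0.671 * african_american - 0.680 * hispanic
         + 0.888 * previous_vaginal_delivery + 1.003 * previous_vbac
         - 0.632 * previous_cs_nonprogressive)
    return logistic(w)


def ctd_probability(age, third_trimester_bmi, gestational_age_weeks,
                    effacement_percent, dilation_cm, station,
                    african_american=0, hispanic=0,
                    previous_vaginal_delivery=0, previous_vbac=0,
                    previous_cs_nonprogressive=0, hypertensive_disease=0,
                    induction=0):
    """Close-to-delivery model, coefficients as quoted in the text."""
    effacement_deciles = effacement_percent / 10.0  # assumption: deciles = % / 10
    station_code = station + 6                      # assumption: B = -6 -> 0, +5 -> 11
    w = (7.059 - 0.037 * age - 0.044 * third_trimester_bmi
         - 0.460 * african_american - 0.761 * hispanic
         + 0.955 * previous_vaginal_delivery + 0.851 * previous_vbac
         - 0.655 * previous_cs_nonprogressive
         - 0.109 * gestational_age_weeks   # assumption: weeks
         - 0.499 * hypertensive_disease
         + 0.044 * effacement_deciles + 0.109 * dilation_cm
         + 0.082 * station_code - 0.452 * induction)
    return logistic(w)


# Hypothetical woman: 30 years old, prepregnancy BMI 25, no other risk factors
print(round(etc_probability(age=30, prepregnancy_bmi=25), 2))
```

Each predictor shifts w by its coefficient, so, for example, a previous VBAC raises the ETC linear predictor by 1.003 and a previous CS for nonprogressive labour lowers it by 0.632.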

Additionally, using the ETC model we calculated the mean predicted probability of achieving VBAC for women in the ERCS group and compared it with the mean predicted probability in the intended VBAC group. The purpose of this comparison was to evaluate whether the variables of the predictive models are already being taken into account during counselling. As data were not normally distributed, the mean predicted probabilities of achieving VBAC were compared using a Mann–Whitney U-test. We only performed this analysis with the ETC model because the CTD model includes intrapartum factors and was therefore not applicable to the ERCS group.

Discriminative and predictive performance

The discriminative performance of the predictive models was assessed using a receiver operating characteristic (ROC) curve. The ROC curve was obtained by plotting sensitivity against 1−specificity. Sensitivity was defined as the fraction of VBACs that were correctly predicted for a particular cut-off point, whereas specificity was defined as the fraction of unsuccessful intended VBACs that were correctly predicted. The ability of the models to discriminate between women with a high and low probability of achieving a VBAC was assessed using the area under the curve (AUC) of the ROC. The AUC can vary between 0.5 and 1.0, in which a value of 0.5 represents no discriminative capacity and 1.0 represents perfect discriminative capacity.
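The AUC has a convenient interpretation: it equals the proportion of (successful, unsuccessful) pairs in which the woman with the successful VBAC received the higher predicted probability, with ties counted as one half. The sketch below computes it that way on made-up data; the function name `roc_auc` is hypothetical and this is not the SPSS or R routine used in the study.

```python
def roc_auc(probabilities, outcomes):
    """Area under the ROC curve via pairwise comparison.

    outcomes: 1 = successful intended VBAC, 0 = unsuccessful (emergency CS).
    """
    successes = [p for p, y in zip(probabilities, outcomes) if y == 1]
    failures = [p for p, y in zip(probabilities, outcomes) if y == 0]
    if not successes or not failures:
        raise ValueError("both outcome classes are needed to compute an AUC")
    score = 0.0
    for p_success in successes:
        for p_failure in failures:
            if p_success > p_failure:
                score += 1.0      # concordant pair
            elif p_success == p_failure:
                score += 0.5      # tied pair
    return score / (len(successes) * len(failures))


# Toy example with invented predictions, not study data
toy_probabilities = [0.9, 0.8, 0.7, 0.6, 0.4]
toy_outcomes = [1, 1, 0, 1, 0]
print(roc_auc(toy_probabilities, toy_outcomes))
```

The pairwise loop is quadratic in cohort size, which is unproblematic for a few hundred women; a rank-based formula would be preferable for large datasets.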

The predictive performance of the models was assessed using a calibration curve. The calibration curve was computed to show the relation between predicted probability of achieving VBAC and the observed VBAC rate. To obtain these values, the predicted probability was categorised into quantiles. In each quantile, the mean predicted VBAC rate was calculated and plotted against the observed VBAC rate in the corresponding quantile. In addition, to assess goodness-of-fit, we computed the Hosmer–Lemeshow (H-L) statistic. The H-L statistic measures the fit of the calibration curve with the assumption (null hypothesis) that observed and predicted values are equal. A P-value <0.05 was considered to show lack of fit of the tested prediction models.
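The grouping behind a calibration plot and the H-L statistic can be sketched as follows. This is an illustrative implementation with hypothetical function names; the number of groups (ten here) and the degrees of freedom used to obtain a P-value from the chi-square distribution are not stated in the text and are left to the reader.

```python
def calibration_groups(probabilities, outcomes, n_groups=10):
    """Sort by predicted probability and split into roughly equal-sized groups.

    Returns a list of (group size, expected successes, observed successes).
    """
    pairs = sorted(zip(probabilities, outcomes))
    n = len(pairs)
    groups = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        groups.append((len(chunk), expected, observed))
    return groups


def hosmer_lemeshow_statistic(groups):
    """Sum over groups of (observed - expected)^2 / (n * mean_p * (1 - mean_p)).

    Assumes the mean predicted probability in each group is strictly
    between 0 and 1.
    """
    statistic = 0.0
    for size, expected, observed in groups:
        mean_p = expected / size
        statistic += (observed - expected) ** 2 / (size * mean_p * (1.0 - mean_p))
    return statistic


# Toy data, invented for illustration: two groups, perfectly calibrated
toy_p = [0.2] * 5 + [0.8] * 5
toy_y = [0, 0, 0, 0, 1] + [1, 1, 1, 1, 0]
toy_groups = calibration_groups(toy_p, toy_y, n_groups=2)
print(round(hosmer_lemeshow_statistic(toy_groups), 6))
```

Plotting expected/size against observed/size for each group gives the points of the calibration curve; a statistic near zero corresponds to points lying on the diagonal.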

Distribution of probabilities

To determine the clinical utility of the models, we evaluated whether the model could classify a notable portion of women away from the VBAC population mean. Hence, we evaluated the distribution of probabilities among the cohort. The distributions were plotted in bar charts on the x-axis in the calibration plot. Additionally, we computed the percentage of the cohort that can be classified away from the VBAC population mean; we used cut-off values of 60% or less and 80% or higher.
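The proportion of women classified away from the population mean follows directly from the predicted probabilities and the two cut-off values. A minimal sketch, with a hypothetical function name and invented example values:

```python
def classified_away(probabilities, low=0.60, high=0.80):
    """Fractions of the cohort at or below `low` and at or above `high`."""
    n = len(probabilities)
    below = sum(1 for p in probabilities if p <= low)
    above = sum(1 for p in probabilities if p >= high)
    return below / n, above / n


# Invented predicted probabilities, not study data
fraction_low, fraction_high = classified_away([0.50, 0.60, 0.70, 0.85])
print(fraction_low, fraction_high)
```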


Statistical analyses and plots were performed using SPSS (SPSS v. 18.0; IBM Corporation, New York, NY, USA) software and R, a language and environment for statistical computing.



We reviewed 9833 consecutive medical records of women who had delivered in the participating hospitals since January 2010. One thousand and sixty-eight women (11%) had a history of CS, 763 of whom (71%) met the inclusion criteria. Of these 763 women eligible for intended VBAC, 515 (67%) had an intended VBAC and 248 (33%) had an ERCS. Of the 515 women with an intended VBAC, 371 (72%) delivered vaginally, resulting in an actual VBAC rate in our study cohort of 49% (371/763).
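The percentages in this paragraph use different denominators, which is easy to lose track of. The short check below recomputes them from the counts given in the text; the variable names and the helper `percent` are hypothetical.

```python
def percent(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


records, previous_cs, eligible = 9833, 1068, 763
intended_vbac, ercs, actual_vbac = 515, 248, 371

# The two delivery-intention groups partition the eligible cohort
assert intended_vbac + ercs == eligible

print(percent(previous_cs, records))       # history of CS among all records
print(percent(eligible, previous_cs))      # eligible among women with previous CS
print(percent(intended_vbac, eligible))    # intended VBAC rate
print(percent(actual_vbac, intended_vbac)) # success rate among intended VBACs
print(percent(actual_vbac, eligible))      # actual VBAC rate in the whole cohort
```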

Study cohort characteristics

The population distributions with respect to the variables contained in the two predictive models are shown in Table 1. Women who had a previous vaginal delivery and/or a previous VBAC were more likely to attempt a VBAC. Women who had an intended VBAC also had a significantly lower BMI, although the actual difference between groups was small. On the other hand, women with a previous CS due to nonprogressive labour more often opted for ERCS. Women who had an ERCS delivered at a significantly lower gestational age. Based on the ETC model, women who chose a VBAC had a significantly higher mean predicted probability (P < 0.00) of successful intended VBAC (72 ± 14%) than women who chose an ERCS (64 ± 14%).

Discriminative performance

The discriminative performance of the predictive models is shown in Figure 1. The ROC of the ETC model has an AUC of 68% (95% CI 63–72%). The ROC of the CTD model has an AUC of 72% (95% CI 67–76%).

Figure 1.

ROC of the entry-to-care model (AUC 68%; 95% CI 63–72%) and the close-to-delivery model (AUC 72%; 95% CI 67–76%), indicating the discriminative performance of both models concerning the probability of a successful vaginal birth after caesarean section.

Predictive performance

The overall calibration of both predictive models was good. The mean successful intended VBAC rate in this study cohort was 72%. The mean predicted probabilities for successful intended VBAC in the ETC and the CTD models were 72 ± 14% and 70 ± 16%, respectively. The predictive performances of the models are shown in the calibration curves in Figures 2 and 3. Both models show acceptable calibration; the calibration in the high-probability ranges was particularly good. The CTD shows better calibration than the ETC model. The H-L statistic showed a P-value of 0.17 for the ETC model and 0.36 for the CTD model, which indicates reasonable calibration of both predictive models.

Figure 2.

Calibration plot of the entry-to-care model with the observed frequency of a successful vaginal birth after caesarean section by the predicted probability. The triangles indicate quantiles of women with a similar predicted probability of success.

Figure 3.

Calibration plot of the close-to-delivery model with the observed frequency of a successful vaginal birth after caesarean section by the predicted probability. The triangles indicate quantiles of women with a similar predicted probability of success.

Distribution of probabilities

The bar charts on the x-axes of Figures 2 and 3 show the distribution of probabilities of successful VBAC among the cohort. Figure 2 shows that when the ETC model is applied, the majority of the cohort has a predicted probability around or above the VBAC population mean of 60–80%. In our cohort, 27% had a predicted probability above the VBAC population mean as these women had a predicted probability of 80% or higher. Furthermore, 19% of the women had a predicted probability below 60%. As shown in Figure 3, with application of the CTD model the distribution of predicted probabilities is also concentrated around and above the VBAC population mean. In total, 31% of women had a predicted probability above 80%, and 26% had a predicted probability below 60%.


Main findings

External validation of two predictive models developed in the USA revealed adequate performance of both models within the Dutch population. Although overall calibration was acceptable, it was particularly good in the range of high predicted probability of successful intended VBAC. Discriminative capacity was reasonable for both models. Most women had a score within the population mean of 60–80%,[3] yet a notable minority was classified away from this population mean. Further, this study shows that in the Netherlands intended VBAC is still common practice, as shown by the intended VBAC rate of 67%. Our results also suggest that preselection already occurs to some extent without applying a model.


According to guidelines on prognostic research, even a predictive model that seems promising requires external validation in different populations and settings.[19] Ethnicity and the probability of attempting VBAC were the main observed differences between the Dutch and American settings, although other differences, such as intrapartum policy, may also exist. We consider our results to be roughly generalisable to most other western European countries with comparable ethnicities and VBAC rates. External validation in our Dutch cohort showed some loss of discriminative performance: the AUC was 68% (95% CI 63–72%) compared with the original 75% (95% CI 74–77%) for the ETC model, and 72% (95% CI 67–76%) compared with the original 77% (95% CI 76–78%) for the CTD model. This loss was more pronounced for the ETC model. These findings are consistent with the previous validation studies performed in an American cohort by Costantine et al.,[11, 12] who used a validation method comparable to that of our study. In contrast, an AUC of 80% (95% CI 72–89%) was obtained in a Japanese cohort.[13] However, because no information was provided in that article on variable conversion and there appeared to be an additional selection criterion regarding whether women were actually in labour, no actual comparison with our results could be made.

A review by Kaimal and Kuppermann[20] highlighted that most women would like to be involved in decision-making about mode of birth. Also, women expressed their wish for personalised information.[20] Hence, implementation of a predictive model could provide this tailored information by allowing estimation of the risk of emergency CS and the related risk of fetal and maternal morbidity.[6, 7] The ideal predictive model would distinguish between a successful intended VBAC and a failed intended VBAC by polarising the cohort into two groups: women with a very high predicted probability and women with a very low predicted probability of achieving a VBAC. In comparison with other models that predict successful VBAC, the performance of the ETC and CTD models is average to good.[4, 8] However, for decision-making about mode of delivery after previous CS we consider it helpful to also distinguish women with a high or low probability of VBAC from those with an average probability. The ETC and CTD models show the potential to classify a notable portion of women away from the population mean, which might lead to a better distribution of women between intended VBAC and ERCS according to their risk classification. Therefore, we think that a predictive model could contribute not only to more personalised counselling but also to a reduction in fetal and maternal morbidity. However, the actual usefulness of such a model in terms of usability, applicability, change in birth preferences and fetal and maternal morbidity should be further explored in a randomised controlled trial.

Strengths and weaknesses

A strength of this study is that it was performed in a multicentre setting with a good representation of types of hospitals and geographic regions in the Netherlands, which increased the external validity of our results. Furthermore, our data collection provides insight into the current (intended) VBAC rates in the Netherlands and into the prognostic profiles of women who opt for an intended VBAC and ERCS. By performing classification analysis we are able to show the subgroup of women who will have a probability of VBAC that is different from current population means.[1, 2]

We also recognise some limitations to our study. First, there was a possible loss of discriminative performance of the validated models due to the necessary redefinition and conversion of variables between Dutch and American units. Fetal station had to be mapped between the American scale, which consists of 12 steps (B and −5 to +5), and the Dutch Hodge scale, which consists of five steps (0 to 4). Redefinition could induce misclassification and loss of refinement within variables, and thereby compromise model performance. Furthermore, in both models ethnicity is an important predictor. Though ethnicity has been recognised as an important demographic factor with regard to the probability of successful VBAC,[4] the underlying mechanism is unknown and might be influenced by socio-economic factors. Therefore, the ethnicity categories might not be compatible in other settings. Hence, we recommend re-estimation of ethnicity and the intrapartum variables before application of the models in an impact study or clinical practice. A second drawback is that we had to approximate the variable ‘third-trimester BMI’ as it could not be obtained from the charts. This may have led to imprecision and impairment of the performance of the models. Third, the models would ideally be evaluated through application in a prospective setting. Application of the models might induce different birth preferences in women, selecting women with more favourable prognostic profiles for successful VBAC. This might alter model performance. Furthermore, a limitation with regard to the validated models that must be addressed concerns the timing of counselling. We consider that, from a medical point of view, counselling on mode of delivery should ideally occur in the third trimester of pregnancy, because then other factors can be incorporated that are known in late pregnancy such as estimated fetal weight and whether labour needs to be induced.
The ETC model does not take these factors into account whereas the CTD model is applicable when there is an indication for induction of labour or when labour has already started. In this regard, in terms of practical use, a predictive model that can be used in the third trimester would be more suitable for the Dutch setting.


External validation of two predictive models developed in the USA revealed adequate performance of both models within the Dutch setting. The predictive models can classify a notable portion of women away from the VBAC population mean. However, whether women indeed perceive the information on probability of successful VBAC as useful and whether the models hold when applied in a prospective setting should be additionally evaluated. Additional redefinition of the ‘ethnicity’ variable for a western European setting and transformation of both models into one model for third-trimester counselling could enhance model performance and increase applicability to the Dutch setting.

Disclosure of interests

The authors declare that they have no conflict of interest.

Contribution to authorship

RH and HS obtained the funding for the trial. All authors contributed to the protocol and design of this study. AK, BM, RA, KdB, FD, IvD, MF, GK, MK, SK, FL, JS, ESK, HV, FV and MW participated in data collection at participating hospitals. ES and SvK analysed the data. ES, SvK, SM, WG, BM, JN, LS, RH and HS contributed to the interpretation of the results. ES drafted the manuscript with input, critical review and editing from all authors. All the authors accept full responsibility for the overall content of this paper.

Details of ethics approval

The Medical Ethical Committee of Maastricht (azM/UM) declared on 13 April 2011 that no ethical approval was necessary for this study protocol (MEC 09-4-047-13).


This study was funded by ZonMW (no. 17100.3006): The Netherlands Organisation for Health Research and Development.

To TOLAC or not to TOLAC: using individual level variables to predict success in a Dutch population

Mini commentary on ‘Predicting successful intended vaginal delivery after previous caesarean section: external validation of two prediction models in a Dutch nationwide registration based cohort with a high intended vaginal delivery rate’

A woman with a previous caesarean delivery has two options for mode of delivery—either a repeat caesarean delivery or a trial of labour after caesarean delivery (TOLAC). To improve counselling and optimise success in those who choose TOLAC while attempting to minimise morbidity, studies have been published evaluating prediction models to help predict success after TOLAC.

Based on a population of women who received care at the hospitals within the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) Maternal–Fetal Medicine Units Network (MFMU), Grobman et al. (Obstet Gynecol 2007;109:806–12; Am J Perinatol 2009;26:693–701) developed two calculators that predict the probability of a successful vaginal birth after caesarean (VBAC). The first calculator takes into account the maternal demographic characteristics and historic characteristics that are present from the time a woman enters prenatal care, including body mass index, African American race, Hispanic ethnicity, previous vaginal delivery, indication for previous caesarean delivery and previous vaginal delivery since caesarean delivery. However, other factors, such as induction of labour, gestational age, hypertensive disease, and cervical examination are critical factors that also influence success (and therefore morbidity), and these latter factors are incorporated into the second calculator.

This study by Schoorel et al. sought to externally validate the two US model calculators within 17 hospitals in the Netherlands to ensure their applicability in a population that is distinct from that within which they were developed. The Netherlands has a significantly higher TOLAC rate than the USA, as well as different maternal demographic characteristics, necessitating the validation of these previously published calculators. This well-performed external validation demonstrated adequate performance of both calculators in this population. Additionally, the authors calculated an intended TOLAC rate as well as an overall TOLAC rate based on all eligible women, confirming the high TOLAC rate in the Dutch population. This is the first study to externally validate these previously published calculators in a European population.

In the era of rising caesarean rates and more women facing a choice of whether to attempt a TOLAC, the development, utilisation and validation of models that help with counselling, predicting success and minimising morbidity with TOLAC are valuable. However, one wonders how beneficial TOLAC success calculators will be in counselling individual women as to their overall risk and assisting with decision making. Individual perceptions of risk vary widely. Studies assessing the utility of these calculators are needed to show that these calculators improve counselling efforts and have practical benefit.

Disclosure of interests

The author has no conflicts of interest to disclose.

  • SK Srinivas

  • Department of Obstetrics and Gynecology, Division of Maternal Fetal Medicine Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA