Induction of labor for medical or elective indications is a common procedure in current obstetrics. In the USA, the rate of labor induction increased gradually from 10 to 21% for all term births between 1990 and 2002. In the UK in 2005, one in five deliveries was induced.
Induction of labor is believed to be associated with an increase in the need for Cesarean delivery for both nulliparous and parous women. Cesarean delivery not only carries operative risks in the index pregnancy, but also increases risks for future pregnancies. The continuous rise in rates of Cesarean section gives cause for concern to both obstetricians and policy makers.
Recent randomized trials, however, have shown that induction of labor has only a limited effect on the risk of Cesarean section, and that in pregnancies in which either the mother or the fetus is at risk of complications, induction of labor is preferable to expectant management[6-9].
To date, the Bishop score remains the standard method for predicting the duration and safety of induced labor. However, a recent systematic review showed that the Bishop score was a poor predictor of the outcome of labor in women scheduled for induction. Over the past decade, pre-induction cervical length, as determined by transvaginal sonography, has been proposed as a predictor for cervical ripeness. Initial changes at the internal os of the cervix can be observed by transvaginal sonography, even in the absence of cervical dilatation. It has been suggested that measurement of cervical length before induction of labor can be used to assess the risk for Cesarean section.
Boozarjomehri et al. evaluated the association between transvaginal ultrasound assessment of the cervix and the outcome of labor induction. They concluded that ultrasound assessment of cervical factors, such as wedging, may be helpful in identifying patients who will have a successful labor induction despite an unfavorable digital examination. Since the publication of this article, many studies have been published on the subject.
Although transvaginal ultrasonographic measurement of the cervix is quantitative, reproducible and easy to learn[13-16], studies have reported conflicting results: some found cervical length to be predictive of successful labor induction[12, 17-19], whereas others did not[20-22]. The aim of this study was to review systematically the literature on the prognostic capacity of cervical length for the outcome of induction of labor in pregnant women at term.
Figure 1 summarizes the literature identification and selection process. The computerized MEDLINE, EMBASE and Cochrane search detected 608 publications. After reading titles and abstracts, 502 of them were excluded, leaving 106 articles for detailed reading. From the cross-references, three more studies were identified and selected for further reading, giving 109 studies. Of these, 78 had to be excluded for various reasons. The main reasons for exclusion were lack of information to calculate 2 × 2 tables (n = 11), no measurement of cervical length (n = 17) and a different spectrum of patients (n = 11). For nine pairs of papers reporting on the same cohort, we included the paper with the most complete data. We contacted the authors of 14 articles with insufficient data to construct a 2 × 2 table. Three of them (Park, Meijer-Hoogeveen et al., Rozenberg et al.) provided the required data, resulting in an overall inclusion of 31 articles in the meta-analysis[17-19, 21, 22, 26-51].
Figure 1. Flow-chart showing results of electronic search of MEDLINE, EMBASE and Cochrane Library of Systematic Reviews.
Of these 31 studies, 30 were cohort studies and one study was a randomized controlled trial. All studies were prospectively designed. Eight of the 31 articles[21, 32, 36, 39, 43, 45, 46, 50] provided data both on cervical length before induction of labor and on the presence of cervical wedging in relation to successful induction of labor, whereas the other 23 only provided data on cervical length. The studies reported on a total of 5029 women undergoing induction of labor, of whom 1153 (23%) were delivered by Cesarean section. The Cesarean section rate in the studies varied from 11 to 60%, and the number of women analyzed in the cohort studies ranged from 43 to 460.
Figure 2 summarizes the results of the study quality assessment. In two studies, labor was induced in some of the women before 37 weeks' gestation, whereas 29 studies explicitly reported inclusion of singleton pregnancies in the term period (37–42 weeks) with cephalic presentation. Indications for induction were described in almost all studies. Blinding to the results of the ultrasound was described in 23 (74%) of the studies. No study reported on uninterpretable results.
Several definitions for the outcome of successful induction were used in the studies. The occurrence of Cesarean delivery was used as the primary endpoint of most studies. Other frequently used endpoints were ‘not achieving vaginal delivery within 24 h after the start of induction’ and ‘no achievement of the active phase of labor’.
Only one study described a prespecified cut-off value for cervical length. Most studies reported several cut-off values within a range of 10 to 46 mm, allowing for multiple 2 × 2 tables to be extracted from one study.
From the 31 studies, we constructed 77 2 × 2 tables in total: 69 assessing cervical length and eight assessing cervical wedging. The cervical-length tables covered the endpoints ‘Cesarean delivery’ (43 tables from 22 studies, 3932 women), ‘not achieving vaginal delivery within 24 h’ (17 tables from seven studies, 886 women) and ‘not achieving active labor’ (nine tables from two studies, 211 women).
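Extracting several 2 × 2 tables from one study, one per reported cut-off, can be sketched as follows. This is a minimal illustration, assuming that individual cervical-length measurements (in mm) and a binary outcome (Cesarean delivery yes/no) are available; the names `TwoByTwo` and `table_at_cutoff` and the ten example values are hypothetical and are not data from any included study. A cervix strictly longer than the cut-off is assumed to count as a positive test (predicting failed induction); individual studies may have defined the threshold differently (for example, ≥ instead of >).

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    tp: int  # cervix longer than cut-off, Cesarean delivery
    fp: int  # cervix longer than cut-off, vaginal delivery
    fn: int  # cervix at or below cut-off, Cesarean delivery
    tn: int  # cervix at or below cut-off, vaginal delivery

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def table_at_cutoff(lengths_mm, cesarean, cutoff_mm):
    """Build a 2 x 2 table; a cervix longer than cutoff_mm counts as test-positive."""
    tp = fp = fn = tn = 0
    for length, cs in zip(lengths_mm, cesarean):
        positive = length > cutoff_mm
        if positive and cs:
            tp += 1
        elif positive:
            fp += 1
        elif cs:
            fn += 1
        else:
            tn += 1
    return TwoByTwo(tp, fp, fn, tn)


# Hypothetical measurements (mm) and outcomes (True = Cesarean delivery).
lengths = [18, 22, 25, 28, 31, 33, 35, 38, 41, 44]
cesarean = [False, False, True, False, False, True, False, True, True, True]

# A single study reporting three cut-offs yields three 2 x 2 tables.
tables = {c: table_at_cutoff(lengths, cesarean, c) for c in (20, 30, 40)}
for cutoff, t in tables.items():
    print(cutoff, t, round(t.sensitivity, 2), round(t.specificity, 2))
```

Raising the cut-off moves the same women from the test-positive to the test-negative column, so sensitivity falls while specificity rises; this is why tables from different cut-offs within one study are not independent and were analyzed per endpoint rather than simply pooled.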
Taking into account the heterogeneity in endpoints, we analyzed data per specific outcome. Figure 3 shows the sensitivities and specificities of cervical length of the included studies for these different endpoints. For the prediction of Cesarean delivery, sensitivity ranged from 0.14 to 0.92 and specificity from 0.35 to 1.00. For the endpoint ‘no vaginal delivery within 24 h’, sensitivity ranged from 0.17 to 0.87 and specificity from 0.57 to 0.98.
Figure 3. Forest plots showing accuracy of cervical length measurement for prediction of: (a) Cesarean section, (b) no vaginal delivery within 24 h and (c) not achieving active phase of labor. Only the first author of each study is given. Sensitivity and specificity given with CIs. FN, false negative; FP, false positive; TN, true negative; TP, true positive.
For the prediction of Cesarean delivery, the summary estimates of sensitivity and specificity were 0.82 (95% CI, 0.73–0.88) and 0.34 (95% CI, 0.24–0.45), respectively, for a cervical length cut-off of 20 mm; 0.64 (95% CI, 0.47–0.78) and 0.74 (95% CI, 0.63–0.82) for a cut-off of 30 mm; and 0.13 (95% CI, 0.07–0.24) and 0.95 (95% CI, 0.89–0.98) for a cut-off of 40 mm (Figure 4). The corresponding positive and negative likelihood ratios (LR+ and LR−) were 1.2 and 0.53 for a cut-off of 20 mm, 2.5 and 0.49 for 30 mm, and 2.6 and 0.92 for 40 mm.
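The likelihood ratios follow directly from sensitivity and specificity: LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity. The short sketch below (the function name `likelihood_ratios` is hypothetical) recomputes the LRs from the rounded summary point estimates for Cesarean delivery; it does not reproduce the confidence intervals, which depend on the underlying meta-analytic model.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


# Summary point estimates for Cesarean delivery, keyed by cervical-length cut-off (mm).
summary = {20: (0.82, 0.34), 30: (0.64, 0.74), 40: (0.13, 0.95)}

for cutoff, (sens, spec) in summary.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{cutoff} mm: LR+ {lr_pos:.1f}, LR- {lr_neg:.2f}")
```

Rounded to the precision used in the text, this gives 1.2 and 0.53 at 20 mm, 2.5 and 0.49 at 30 mm, and 2.6 and 0.92 at 40 mm. The 40-mm cut-off illustrates the trade-off: its high specificity yields a modest LR+, but its low sensitivity leaves LR− close to 1, so a cervix shorter than 40 mm hardly changes the estimated risk.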
For the prediction of vaginal delivery not occurring within 24 h, the summary estimates of sensitivity and specificity were 0.58 (95% CI, 0.46–0.69) and 0.80 (95% CI, 0.70–0.87) for a cervical length cut-off of 25 mm, and 0.84 (95% CI, 0.69–0.92) and 0.60 (95% CI, 0.47–0.71) for a cut-off of 32 mm (Figure 5). The corresponding LR+ and LR− were 2.9 and 0.53 for a cut-off of 25 mm and 2.1 and 0.27 for a cut-off of 32 mm.
Only two studies reported on the outcome ‘not achieving the active phase of labor’[45, 50]. Roman et al. reported results for different cut-offs for cervical length, whereas Yang et al. reported results only for a cut-off of 31 mm. In the study of Roman et al., the sensitivity and specificity combination for a cut-off of 30 mm was 0.56 and 0.66 and for the study of Yang et al. this was 0.83 and 0.75. We did not calculate summary point estimates of sensitivity and specificity for the outcome ‘not achieving the active phase of labor’.
Eight studies (involving 1139 participants) reported on the presence or absence of wedging before induction of labor. A plot of sensitivity–specificity points for the presence of wedging for failed induction of labor is shown in Figure 6. Sensitivity varied from 0.12 to 0.61 and specificity from 0.63 to 0.91. For the prediction of failed labor induction, summary point estimates of sensitivity and specificity were 0.37 (95% CI, 0.26–0.49) and 0.80 (95% CI, 0.71–0.87), respectively (Figure 7). The likelihood ratio of a positive test was 1.9 (95% CI, 1.2–2.85) and the likelihood ratio of a negative test was 0.79 (95% CI, 0.65–0.96).
Figure 6. Forest plot showing accuracy of presence of wedging for prediction of failed induction of labor. Only the first author of each study is given. Sensitivity and specificity given with CIs. FN, false negative; FP, false positive; TN, true negative; TP, true positive.
Five studies included only nulliparous women, and we performed a subgroup analysis of this small group of studies. In nulliparous women, sensitivity and specificity for a cervical length cut-off of 30 mm were 0.70 (95% CI, 0.61–0.78) and 0.74 (95% CI, 0.69–0.79), respectively, for the prediction of Cesarean delivery. The corresponding LRs were 2.7 for a positive test result and 0.40 for a negative test result.
This systematic review assessed whether transvaginal ultrasonographic assessment of the cervix can be used to predict successful induction of labor. We included 31 studies reporting on 5029 women. The sensitivity and specificity of cervical length in predicting the outcome of induction of labor indicate that, in general, a long cervix and absence of wedging roughly double the odds of failed induction, whereas a short cervix and wedging decrease the odds of failed induction by approximately 50%.
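What doubling or halving the odds means for an individual woman can be illustrated with Bayes' theorem in odds form: post-test odds = pre-test odds × LR. The sketch below assumes, purely for illustration, a pre-test probability equal to the overall Cesarean rate in the included studies (1153 of 5029 women, about 23%) and applies the summary LRs for a 30-mm cut-off (LR+ 2.5, LR− 0.49). The function name `post_test_probability` is hypothetical, and a given woman's pre-test risk will differ with parity, indication and method of induction.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Bayes' theorem in odds form: post-test odds = pre-test odds x LR."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Assumed pre-test probability: overall Cesarean rate across the included studies.
pre_test = 1153 / 5029

# Summary LRs for a cervical-length cut-off of 30 mm (prediction of Cesarean delivery).
after_long_cervix = post_test_probability(pre_test, 2.5)
after_short_cervix = post_test_probability(pre_test, 0.49)
print(f"pre-test {pre_test:.2f}, long cervix {after_long_cervix:.2f}, "
      f"short cervix {after_short_cervix:.2f}")
```

Under these assumptions the estimated risk of Cesarean delivery rises from about 23% to about 43% with a cervix longer than 30 mm and falls to about 13% with a shorter cervix: a real shift, but one that still leaves most women with a long cervix delivering vaginally.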
We judged the quality of the included studies to be mediocre, mainly because almost no study reported withdrawals and none reported uninterpretable results of the cervical assessment. Because cervical length measurement is easy to perform and to learn, even for inexperienced investigators, a clear image of the cervix can be obtained in nearly 100% of cases. This may explain why withdrawals and uninterpretable results were not reported, but unfortunately most studies did not address these issues explicitly.
Our review has some limitations. Most studies provided data on parity, but these data could not be related to the outcome of induction, except in the five studies that included only nulliparous women. The same applies to the method of induction and to the indications for Cesarean delivery. Nevertheless, we chose to include all studies, despite the heterogeneity in indications for induction of labor and in method of induction.
We are aware of one other review that analyzed cervical length as a predictor of the outcome of induction of labor in women at or beyond term. That review, by Hatfield et al., included studies published before October 2006. Our review included an additional 12 articles published after 2006, a considerable number given the short interval since that systematic review. Our results for cervical length are in line with those of Hatfield et al. However, unlike Hatfield et al. and Boozarjomehri et al., we did not find cervical wedging to be a good predictor of the outcome of induction, mainly owing to its low sensitivity.
In nulliparous women, it seems that cervical length with a cut-off of 30 mm best classifies women at high risk for Cesarean delivery, with a sensitivity of 0.70 and a specificity of 0.74, and corresponding likelihood ratios of 2.7 for a positive test and 0.40 for a negative test result.
No consensus has been reached on the definition of failed labor induction. Various endpoints have been suggested, including Cesarean delivery, not achieving vaginal delivery within a specified time (such as 12 or 24 h), not achieving active labor within a specified time, and failure to achieve the active phase of labor. Despite this heterogeneity in outcome measures, we included all studies reporting at least one of these outcomes, but analyzed the data separately for each endpoint.
The challenge with induction of labor is to identify patients in whom induction will be successful and patients in whom induction will fail. Identification of women at high risk for Cesarean delivery following induction of labor is of the utmost clinical importance.
A systematic review assessing the ability of the Bishop score to predict the mode of delivery in women scheduled for induction of labor showed poor accuracy of a Bishop score lower than 6 in the prediction of Cesarean delivery (LR+, 1.35; LR−, 0.18). With a sensitivity of 0.78 and a specificity of 0.44, the Bishop score performs about as well as cervical length at a cut-off of 20 mm (sensitivity 0.82 and specificity 0.34 for the prediction of Cesarean delivery).
The use of predictive tests in the decision to induce or not also depends on the need for delivery in a short time. For example, in women with pre-eclampsia at term, delivery is needed either along the vaginal route after induction of labor or, when this fails, by Cesarean section. However, when the indication for delivery is less strong, for example in women with a healthy pregnancy at 41 weeks, this balance might be different. Also, one should realize that an unripe cervix, as indicated by a long cervix or absence of wedging, indicates not only a higher risk for failed induction, but also a longer time to the onset of spontaneous labor. Thus, when the need for immediate delivery driven by the condition of the mother or child is more urgent, there is greater emphasis on the correct interpretation of tests to predict failed induction. In fact, analysis of the importance of cervical length measurement in the HYPITAT trial indicated that in women with hypertensive disease and an unripe cervix, induction of labor is more desirable than in women with a ripe cervix, since the first group may be harmed by expectant management.
A limitation of the present meta-analysis is that we could assess the accuracy of cervical-length measurement and wedging only univariably. Because studies assessing the added value of cervical length are lacking, we cannot at present determine whether sonographically measured cervical length adds to the information provided by the Bishop score. Studies with a multivariable approach are needed to combine factors such as method of induction (prostaglandins vs oxytocin vs Foley catheter or double balloon), gestational age and parity in a prediction model. An individual patient data meta-analysis would be a logical next step, as it would allow assessment of the relative contribution of each of these factors.
In summary, although measurement of cervical length and assessment of cervical wedging before induction of labor are easily performed, both have limited value in predicting the outcome of labor. They might help shift the decision on whether to induce labor in an individual patient, but in general their accuracy is too limited to justify routine use in clinical practice.