Diagnosis or prognosis? An umbrella review of mid‐trimester cervical length and spontaneous preterm birth

Abstract
Background: Cervical length is widely used to assess a woman's risk of spontaneous preterm birth (SPTB).
Objectives: To summarise and critically appraise the evidence from systematic reviews on the prognostic capacity of transvaginal sonographic cervical length in the second trimester in asymptomatic women with singleton or twin pregnancy.
Search strategy: Searches were performed in Medline, Embase, CINAHL and grey literature from 1 January 1995 to 6 July 2021, including keywords ‘cervical length’, ‘preterm birth’, ‘obstetric labour, premature’, ‘review’ and others, without language restriction.
Selection criteria: We included systematic reviews including women who did not receive treatments to reduce SPTB risk.
Data collection and analysis: From 2472 articles, 14 systematic reviews were included. Summary statistics were independently extracted by two reviewers, tabulated and analysed descriptively. The ROBIS tool was used to evaluate risk of bias of included systematic reviews.
Main results: Twelve reviews performed meta‐analyses: two were reported as systematic reviews of prognostic factor studies, ten used diagnostic test accuracy methodology. Ten systematic reviews were at high or unclear risk of bias. Meta‐analyses reported up to 80 combinations of cervical length, gestational age at measurement and definition of preterm birth. Cervical length was consistently associated with SPTB, with a likelihood ratio for a positive test of 1.70–142.
Conclusions: The ability of cervical length to predict SPTB is a prognostic research question; systematic reviews typically analysed diagnostic test accuracy. Individual participant data meta‐analysis using prognostic factor research methods is recommended to better quantify how well transvaginal ultrasonographic cervical length can predict SPTB.


| INTRODUCTION
Preterm birth (before 37 weeks of gestation) is the leading cause of neonatal mortality worldwide, and the second leading cause of death in children under five. 1 Survivors are at increased risk of a range of respiratory, sensory and neurodevelopmental disorders, 2 obesity and cardiovascular disease. 3 Although survival and developmental outcomes of children born preterm have improved due to advances in neonatal care, progress in the prevention of spontaneous preterm birth (SPTB) has been relatively limited. 4 A shortened cervix in the second trimester of pregnancy has been recognised as a risk indicator for SPTB for more than 30 years, 5 but the advent of transvaginal ultrasound provided a more reliable measure. 6 Despite a multitude of prognostic studies, the predictive capacity of a cervical length measurement remains unclear because of varying findings among different study populations and conflicting definitions of short cervix and preterm birth. [7][8][9] It is known that risk of SPTB increases as cervical length decreases, 7 but even so, the majority of women with a short cervix will go on to deliver at term. 10 This may explain, in part, the discrepancy in clinical guidelines between different countries and the cautiousness of their recommendations. [11][12][13][14][15] A clinician would ideally be able to use the cervical length to help stratify a woman's risk of SPTB and to plan further surveillance or selectively offer preventive treatments (vaginal progesterone, cerclage or pessary) to reduce that risk.
The volume of literature is such that numerous systematic reviews have been published, attempting to guide antenatal care providers in the clinical application and predictive utility of transvaginal ultrasound cervical length. However, the number of review articles is also very large, with variable quality and scope, which does little to achieve the stated aim. A contemporary approach to synthesising the large amounts of information available and providing clear guidance on important topics in health care is to perform an overview of the existing systematic reviews, or umbrella review. 16,17 We conducted this umbrella review to summarise and critically appraise published systematic reviews assessing the value of transvaginal ultrasound cervical length in predicting SPTB in asymptomatic women with singleton or multiple pregnancy in the second trimester who had not received prophylactic treatment to reduce their SPTB risk. We aimed to use the outcome to suggest optimal clinical application of cervical length measurement and future directions for research.

| Protocol and registration
The protocol of this overview of systematic reviews was registered with PROSPERO (CRD42020138502) and the reporting is in line with the PRISMA statement. 18

| Patient and public involvement
Patient and public involvement was not sought as part of this review.

| Core outcome sets
No core outcome set could be used in this review because of the scope of the research question and the analysis of literature that often pre-dated the existence of a relevant core outcome set. 19

| Information sources and search strategy
The search strategy was developed in consultation with a specialist librarian and was applied without language restrictions. The search key terms included: cervix or cervical, uterine cervical incompetence, cervical length measurement, ultrasonography, preterm birth or delivery or labo(u)r, and review. Details of the search strategy are presented in Appendix S1. We searched Medline, Embase, CINAHL and LILACS databases from 1 January 1995 to 6 July 2021. In addition, we searched the Cochrane database, PROSPERO register, JBI Database of Systematic Reviews and Implementation Reports, Database of Abstracts of Reviews of Effects and Google Scholar for grey literature. We performed citation tracking on all reviews.

| Eligibility criteria and study screening
We included systematic reviews of asymptomatic pregnant women in their second trimester with a singleton or twin pregnancy, with or without additional risk factors for SPTB, who underwent transvaginal ultrasound cervical length measurement but did not receive preventive treatments. Systematic reviews evaluating the prognostic value of transvaginal ultrasound cervical length, either alone or as part of a wider research question, were eligible. Systematic reviews were defined as those with explicit intent 'to identify appraise and synthesize all the empirical evidence that meets pre-specified eligibility criteria to answer a specific research question'. 20 We searched from 1995 onwards, with no language restrictions. We excluded systematic reviews presented as conference abstracts only, clinical practice guidelines and narrative reviews. We excluded systematic reviews that were unable to report on the presence of symptoms of preterm labour, and those that were unable to report on whether preterm births were spontaneous or iatrogenic. We excluded systematic reviews where cervical length was measured by transabdominal, translabial or transperineal routes because of the lack of reliability of these methods. [21][22][23] We also excluded systematic reviews where the cervical length measurement resulted in the use of treatments to reduce the risk of SPTB. Grey literature was eligible for inclusion if it met the criteria for a systematic review and if the complete text was available.
Studies were screened by title and abstract by two reviewers (KH, RW). Initial screening aimed to identify reviews of any kind that examined the predictive utility of transvaginal ultrasound cervical length in asymptomatic pregnant women in the second trimester. Full-text review was performed by two investigators (KH, HF) independently. Disagreements were resolved by consultation with a third reviewer (BWM or RW), or by consensus.

| Data extraction
Data were extracted independently by two reviewers (KH, HF), using a form based on the Joanna Briggs Institute data extraction form 24 (Appendix S2). The data items included number of participants, type of population (inclusion/exclusion criteria), details on the exposure (cervical length measurement, including gestational age at measurement and definition of short cervix [in mm]), details on the outcome (definition of SPTB [in weeks] and summary statistics on the outcomes) and methods for data synthesis. Cervical length measurements during the first and third trimester are beyond the scope of this review and therefore these data were not extracted.

| Risk of bias assessment
ROBIS (Risk Of Bias In Systematic reviews) 25 was used as the primary tool for risk of bias assessment and was performed independently by two reviewers (KH, HF). ROBIS assesses the following domains: study eligibility, identification and selection of studies, data collection and study appraisal, synthesis and findings. AMSTAR-2 (A MeaSurement Tool to Assess systematic Reviews-2) was also used as a supplementary tool, assessing for use of ideal research methods in systematic reviews that include non-randomised studies, including research question components, use of a prospectively prepared research protocol, literature search strategy, study selection and data extraction in duplicate, reporting of funding sources and several more. The 'overall confidence rating' derived from AMSTAR-2 26 was applied to each review.

| Data synthesis
The key characteristics of systematic reviews, including design, participants, prognostic factor of interest (gestational age at measurement of cervical length, cervical length cutoffs), outcomes, timing of prediction, sample size and effect measures were summarised and tabulated descriptively. Summary statistics of different systematic reviews were tabulated and visualised, noting that the unit of analysis was a systematic review instead of a primary study and therefore data from primary studies were not re-extracted for synthesis. Results across different systematic reviews that measured the same populations and used matching cutoffs for gestational age at measurement, short cervical length and definition of SPTB were also summarised.

| Dealing with overlapping studies
Given the aim was to provide an overview of all the available systematic reviews on this topic, we decided to include all relevant systematic reviews including overlapping primary studies. 27 We mapped the included studies in different systematic reviews in a league table (Appendix S3).
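As an illustration only (this calculation was not part of the review), overlap in a citation matrix of this kind is sometimes summarised with the corrected covered area (CCA), computed as (N - r) / (r × c - r), where N is the total number of inclusions across reviews, r the number of unique primary studies and c the number of reviews. The sketch below assumes a hypothetical mapping from review labels to sets of primary-study identifiers; the labels and identifiers are invented and do not correspond to Appendix S3.

```python
def corrected_covered_area(inclusions: dict[str, set[str]]) -> float:
    """Corrected covered area: (N - r) / (r * c - r).

    N = total inclusions counted across all reviews,
    r = number of unique primary studies,
    c = number of reviews.
    """
    c = len(inclusions)
    unique_studies = set().union(*inclusions.values()) if inclusions else set()
    r = len(unique_studies)
    if c < 2 or r == 0:
        raise ValueError("need at least two reviews and one primary study")
    n_total = sum(len(studies) for studies in inclusions.values())
    return (n_total - r) / (r * c - r)


# Hypothetical citation matrix: which primary studies each review included.
example_matrix = {
    "review_A": {"study_1", "study_2", "study_3"},
    "review_B": {"study_2", "study_3", "study_4"},
    "review_C": {"study_3", "study_5"},
}

cca = corrected_covered_area(example_matrix)
print(f"CCA = {cca:.2f}")
```

A value of 0 would mean no primary study appears in more than one review, and 1 would mean every review included exactly the same studies.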

| Study selection
The PRISMA flow diagram is presented in Figure 1. 28 The search yielded 2475 items in total, of which 1569 were excluded after removal of duplicates and screening titles and abstracts. The remaining 161 full-text reviews were assessed for eligibility. One hundred and forty-seven were excluded for the following reasons: 113 were narrative reviews, 11 had a different research question, ten were editorials or commentaries only, four were clinical practice guidelines, one was an incomplete draft of a government-commissioned review, and one performed a qualitative overview of reviews assessing both cervical length and fetal fibronectin and, because of its earlier publication date, only contained two relevant systematic reviews (also in our search results) and did not contribute any additional data. A list of the excluded reviews is available in Appendix S2. Table 1 summarises the key characteristics of the 14 systematic reviews, [29][30][31][32][33][34][35][36][37][38][39][40][41][42] including design, participants, prognostic factor of interest (gestational ages at measurement of cervical length, cervical length cutoffs, outcomes), timing of prediction, sample size and effect measures for meta-analysis. Two systematic reviews did not include meta-analysis. 39,40 Of the 12 systematic reviews with meta-analysis, two 38,41 were based on individual participant data, with cervical length as a prognostic factor; the other ten were based on aggregate data, considering cervical length as a diagnostic accuracy test. Eight assessed asymptomatic women only and six addressed both symptomatic and asymptomatic women; separate analysis of patient groups within these papers allowed us to consider only the data relevant to our research question. Five included singleton studies only; four included twin studies only, and five reported on both singleton and twin pregnancies. 
Thirteen systematic reviews assessed primary studies that used a single transvaginal measurement of cervical length and the other evaluated the change in cervical length over time. 35 Systematic reviews included between 6 and 23 primary studies, reporting data on 1312 to 26 474 participants. The ten aggregate data meta-analyses performed multiple analyses, reporting from 3 to 80 combinations of cutoffs (gestational age at measurement, cervical length and gestational age at delivery), each of which summarised data from between one and nine studies (between 75 and 6047 participants), as outlined in Table 2.

| Characteristics of included systematic reviews
Cervical length was measured between 12 and 30 (or more) weeks of gestation. This wide variation in gestational age at measurement was most commonly addressed by reporting summary statistics for a gestational age range; however, one group calculated mean gestational ages at measurement. 32 Up to 22 different gestational ages (or age ranges) at cervical length measurement were reported in the primary studies included in a single review. 35 A variety of cutoffs (ranging from 5 to 60 mm) were used for defining a short cervix, with 20 mm (n = 9), 25 mm (n = 10) and 30 mm (n = 7) the most used. Up to 23 different cutoffs were reported in the primary studies included in a single systematic review (Table 1). 36 Definitions of spontaneous preterm birth (the primary outcome) also varied among the included studies, with up to seven thresholds reported per review. 33 The most common cutoffs were less than 34 and less than 37 weeks of gestation.
Of the few studies using the same statistical analysis methods that also reported similar cutoffs for cervical length and SPTB, Lim et al. 37 and Conde-Agudelo et al. 42 reported comparable results, as did Lim et al. 37 and Conde-Agudelo et al. 36 (Table 2); however, gestational age at measurement was not specified in the paper by Lim et al. because of limitations of the methodology. The similar findings may be explained by the proportion of overlapping studies, shown in Appendix S4; two groups re-reported their own data in later publications. 30,31,35,42 Due to heterogeneous reporting in the primary studies, between 2 and 13 studies were excluded from meta-analyses of aggregate data. For the two individual participant data meta-analyses, only 11 of 23 and 7 of possibly 247 (number not clearly specified) eligible studies were included, because of inability or unwillingness to share data.
Among the ten systematic reviews of diagnostic test accuracy, the most reported statistics were summary likelihood ratios (n = 7), summary receiver operating characteristic (ROC) curves (n = 7) and summary sensitivities and specificities (n = 5). Three reviews performed bivariate meta-analysis. One plotted each study's reported sensitivity and specificity in the style of an ROC curve, without generating a summary ROC curve. 43

| Risk of bias assessments
Results from risk of bias assessment are shown in Figure 2. Only four of 14 reviews were assessed as having a low overall risk of bias with ROBIS; six reviews were rated at high risk of bias and four had an unclear risk of bias. Eight of 14 systematic reviews performed well in ROBIS domains of identification and selection of studies, and seven in study eligibility criteria. AMSTAR-2 results are available in Appendix S5.

| Single cervical length measurement and preterm birth in singleton pregnancies
Based on four systematic reviews of women, the likelihood ratio of a positive test (LR+) for cervical length of 25 mm or less before 20 weeks of gestation (except for Domin… or 35 weeks were 4.31-13.38 and the likelihood ratio of a negative test (LR−) was 0.65-0.80. For preterm birth before 32 weeks of gestation, the LR+ from two systematic reviews was 3.18-4.10 and the LR− was 0.75. Additional common combinations of thresholds are shown in Table 2 and Appendix S2.
Note: Cells are blank where a summary statistic was not reported. Italicised numbers were not reported in the original paper, but calculated from data in the review. Abbreviations: AUC, area under the curve; CL, cervical length; GA, gestational age; LR, likelihood ratio; PTB, preterm birth.

| Single cervical length measurement and preterm birth in twin pregnancies
Three systematic reviews showed that for cervical length of 25 mm or less measured at 20-24 weeks of gestation predicting preterm birth before 34 weeks, the LR+ was 5.02-6.00 and the LR− 0.65-0.75, sensitivity 36-40% and specificity 93-94%. Details of additional results are summarised in Table 2 and Appendix S2.
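For a single dichotomised cutoff, likelihood ratios follow directly from sensitivity and specificity: LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity. The minimal sketch below uses values from within the ranges reported above (40% sensitivity, 93% specificity) purely as an illustration. Pooled estimates in the reviews come from meta-analytic models and need not match this arithmetic exactly, and the function name is our own.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a test dichotomised at a single cutoff."""
    if not (0.0 <= sensitivity <= 1.0) or not (0.0 < specificity < 1.0):
        raise ValueError("sensitivity must be in [0, 1], specificity in (0, 1)")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Illustrative values taken from within the reported ranges for twin pregnancies.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.40, specificity=0.93)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```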

| Cervical length change and preterm birth
Conde-Agudelo et al.'s most recent review 35 examined change in cervical length over time as a diagnostic test, reporting on 13 different combinations of variables. The extent of cervical shortening was substituted for cervical length: 'any shortening' over the study period, shortening to a threshold, or a percentage shortening. Gestational age at measurement encompassed wide ranges (10-28 weeks at initial measurement through to 20-30 weeks at final measurement). The group described, for women with twin pregnancy, 47% sensitivity, 88% specificity and LR+ 4.00 of any shortening of cervical length for predicting SPTB before 34 weeks. Findings were similar for 20-25% shortening in a similar population (47% sensitivity, 87% specificity and LR+ 3.80). An earlier review by the same authors listed a range of findings for any cervical shortening (15-75% sensitivity, 70-90% specificity, LR+ 1.60-5.50, LR− 0.30-0.80) predicting SPTB between less than 28 and 36 weeks. 42
FIGURE 2 ROBIS traffic light and summary plot.

| Main findings
Cervical length was consistently associated with SPTB, but the LR+ was between 1.70 and 142 depending on the cutoffs used. Using the second-trimester transvaginal ultrasound to predict SPTB is a prognostic research question as opposed to a diagnostic question, as SPTB is a future outcome, not detection of a condition present at the time of measurement. However, of the 14 included systematic reviews, over 85% reported the research question as a diagnostic accuracy test instead of a prognostic question, and over 70% had a high or unclear risk of bias. Included meta-analyses reported up to 80 combinations of cutoffs of cervical length, gestational age at measurement and definition of preterm birth. Consequently, transvaginal ultrasound showed variable degrees of association with SPTB.
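To show how wide this range of LR+ is in practical terms, a likelihood ratio can be applied to pre-test odds to obtain a post-test probability. The sketch below is illustrative only: the 5% baseline risk is an assumption chosen for the example, not a figure from the included reviews, and the function name is our own.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    if not (0.0 < pre_test_probability < 1.0):
        raise ValueError("pre-test probability must be strictly between 0 and 1")
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


ASSUMED_BASELINE_RISK = 0.05  # hypothetical pre-test risk of SPTB, for illustration

# Lowest and highest LR+ reported across the included meta-analyses.
risk_at_lowest_lr = post_test_probability(ASSUMED_BASELINE_RISK, 1.70)
risk_at_highest_lr = post_test_probability(ASSUMED_BASELINE_RISK, 142.0)
print(f"LR+ 1.70 -> {risk_at_lowest_lr:.1%}; LR+ 142 -> {risk_at_highest_lr:.1%}")
```

Under this assumed baseline, the same 'positive' result would correspond to a post-test risk of roughly 8% at one extreme and roughly 88% at the other, which illustrates why the choice of cutoffs matters so much for interpretation.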

| Clinical and research implications
We have identified several issues in the current literature that could be improved in the future. First, most systematic reviews considered the research question as a diagnostic, instead of a prognostic question. Therefore, confounding could not be accounted for in the analysis and the reported predictive value of cervical length might reflect the influence of other factors instead of cervical length itself. Guidance on prognosis research, including the PROGRESS framework, [44][45][46][47] should be followed in future studies. Second, narrative reviews predominated among those published in the past two decades, an issue likewise observed in other areas of medicine. 48 Although the limitations of narrative reviews are well acknowledged, 48,49 they are frequently the basis of recommendations for clinical practice. Third, overall risk of bias in the included systematic reviews was high or unclear in the majority, and also in many assessment domains, perhaps due to word count restrictions and insufficient reporting in primary studies. [44][45][46][47] Lastly, we observed up to 80 combinations of cutoffs of cervical length, gestational age at measurement and definition of preterm birth in included meta-analyses. Dichotomisation of continuous variables results in a loss of data 50 and makes comparison of findings across studies difficult. Statistical analysis plans are best made in conjunction with biostatisticians, and cervical length should ideally be treated as a continuous variable in analyses.
Recommended prognosis research methodology includes reporting of prognostic effect measures (hazard or odds ratios) instead of diagnostic effect measures (sensitivity and specificity), and adjusting for other potential prognostic factors. 51 In addition, as mentioned above, clinicians are urged to avoid dichotomising variables for simplicity or convenience due to the loss of data that ensues. 50 The importance of gaining additional days of gestation, especially in extreme prematurity for example, is not adequately reflected by simply dichotomising data into 'preterm birth less than 37 weeks of gestation' or 'term birth'. We propose to treat the outcome SPTB as a time-to-event outcome instead of a binary outcome so that SPTB at different gestational ages can be differentiated in the analysis.
A single prognostic factor is often insufficient to accurately determine a person's risk; 45 most reviews appreciated this in their findings. Prognostic models, if carefully developed, calibrated and externally validated, may be more useful in practice. 45 However, to date, multiple-marker prediction models have not proved overly successful in predicting SPTB, 52,53 and are therefore not widely used in clinical practice, leaving the clinician with few evidence-based options for risk assessment.

| Strengths and limitations
This overview of systematic reviews is underpinned by a broad, well-designed literature search, and adheres to PRISMA guidelines. We offer novel insights into the limitations of study design and statistical methods previously used in this literature. A potential limitation in this overview was that title/abstract screening was performed by only one reviewer (however, a low threshold was used to proceed to full-text review), although it was unlikely that eligible systematic reviews were missed given our comprehensive search strategy and citation tracking. Significant overlap between included primary studies was observed (some authors used the same set of studies across two reviews), 30,31,36,42 and although this is acknowledged in our results, there is no agreed method for dealing with this issue.

| Interpretation
The literature assessed in this review reports a broad spectrum of possible outcomes in women with a short cervix. The likelihood ratios may be interpreted as indicating a woman with a 'short' cervix is between 1.70 and 142 times more likely to develop the condition (SPTB) than a woman with a 'long' cervix, depending on the thresholds used. However, these are very imprecise figures that cannot be directly applied in clinical practice. Furthermore, this assumes that a diagnostic measure may be repurposed as a prognostic indicator. Now that prognostic factor research methods have been more completely described, we recommend quantifying risk with these tools. [44][45][46][47] This can be applied to other areas of research in perinatal medicine, such as the evaluation of preventive treatments to reduce SPTB risk. Hypothetically, a treatment that prolongs gestational age from 32 to 34 weeks will be discarded if the outcome is a binary outcome defined as SPTB before 37 weeks of gestation, but this 2-week period will be captured when the outcome is considered as a time-to-event outcome.
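The hypothetical scenario above can be made concrete with a small numerical sketch. The gestational ages below are invented for illustration, the treatment is assumed to add exactly 2 weeks to births before 35 weeks, and comparing means is only a simple stand-in for a formal time-to-event analysis, which would use survival methods.

```python
from statistics import mean

PRETERM_THRESHOLD_WEEKS = 37.0


def proportion_preterm(gestational_ages_weeks: list[float]) -> float:
    """Binary outcome: share of births before the preterm threshold."""
    preterm = sum(ga < PRETERM_THRESHOLD_WEEKS for ga in gestational_ages_weeks)
    return preterm / len(gestational_ages_weeks)


# Hypothetical gestational ages at birth (weeks) without treatment.
untreated = [32.0, 32.0, 33.0, 39.0, 40.0]
# Assumed treatment effect: 2 extra weeks for births that would occur before 35 weeks.
treated = [ga + 2.0 if ga < 35.0 else ga for ga in untreated]

binary_untreated = proportion_preterm(untreated)
binary_treated = proportion_preterm(treated)
gain_in_weeks = mean(treated) - mean(untreated)

print(f"Preterm (<37 weeks): {binary_untreated:.0%} vs {binary_treated:.0%}")
print(f"Mean gestational age gained: {gain_in_weeks:.1f} weeks")
```

In this invented example the binary outcome is identical in both groups, so the treatment would appear to have no effect, whereas the gestational age at birth shows the prolongation.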
Given that many studies have already been conducted in women with different risk profiles, rather than abandoning these and simply calling for more high-quality studies, we would advocate for using this existing work by performing individual participant data meta-analysis using prognostic research methods and considering SPTB as a time-to-event outcome. This approach is the optimal method of data synthesis and has the potential to overcome the important issues identified with the meta-analyses of aggregate data (inadequate reporting, data loss, statistical methods). Additionally, it avoids the ethical quandary of failing to offer prophylactic treatment to women with a short cervix in the context of a randomised controlled trial. An issue already encountered by the authors of the individual participant data meta-analyses, however, is an inability or unwillingness to share data, which reflects the urgent need for collaboration to improve patient outcomes and minimise research waste. 54

| CONCLUSION
Our review of the literature on transvaginal cervical length ultrasonography to predict SPTB revealed several issues, and we contend that, despite the quantity of research that has been conducted in this area, the question of how well mid-trimester transvaginal cervical length predicts SPTB is yet to be completely answered.
The bulk of published literature comprises narrative reviews with lower methodological rigour. The systematic reviews, nonetheless, carried significant risk of bias and reported on literature that was heterogeneous, with varying thresholds for a number of different variables. Statistical analysis in the primary studies and systematic reviews was performed to assess diagnostic test accuracy; however, cervical length is a prognostic factor that requires a different approach. Our review revealed an overall trend toward recommending transvaginal ultrasound cervical length measurement for asymptomatic women with singleton or twin pregnancy in the second trimester to predict SPTB, but most systematic reviews acknowledged that cervical length has limited ability to effectively identify many women who will go on to deliver prematurely. Likewise, most women with a short cervix will ultimately birth at term. 55 At present, cervical length will most likely continue to be used to guide treatment decisions until it can be replaced by more precise prognostic factors or models. Individual patient data meta-analysis has excellent potential to overcome the limitations in the existing literature, and we recommend this as the next step, using prognostic factor research methodology and analysing continuous variables.

AUTHOR CONTRIBUTIONS
BWM, ST, SB and RW were involved in conception and supervision of this project, and approval of the article. HF assessed articles for inclusion and risk of bias. RW also assisted with development of the manuscript. KH produced the main concepts of the project, developed and conducted the literature search, acted as the first reviewer and wrote the manuscript.

CONFLICT OF INTEREST
BWM is supported by an NHMRC Practitioner Fellowship (GNT1082548) and reports a consultancy for ObsEva, Merck Merck KGaA, and Guerbet. Completed disclosure of interests form available to view online as supporting information.

ETHICS APPROVAL
As no participant recruitment or new data collection was involved in this review, ethics approval was not required.

DATA AVAILABILITY STATEMENT
Data sharing is not applicable to this article as no new data were created or analysed in this study.