Association and prediction of amniotic fluid measurements for adverse pregnancy outcome: systematic review and meta-analysis


  • RK Morris,

    Corresponding author
    1. Birmingham Centre for Women’s & Children’s Health & School of Clinical and Experimental Medicine, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
    2. Fetal Medicine Centre, Birmingham Women's Hospital NHS Foundation Trust, Birmingham, UK
    • Correspondence: RK Morris, School of Clinical and Experimental Medicine, College of Medical and Dental Sciences, University of Birmingham, Birmingham B15 2TT, UK.

  • CH Meller,

    1. Fetal Medicine Centre, Birmingham Women's Hospital NHS Foundation Trust, Birmingham, UK
    2. Obstetrics Division, Hospital Italiano de Buenos Aires, Buenos Aires, Argentina
  • J Tamblyn,

    1. University North Staffordshire NHS Trust Hospital, Stoke on Trent, UK
  • GM Malin,

    1. School of Clinical Sciences, the University of Nottingham, Nottingham, UK
  • RD Riley,

    1. School of Health and Population Sciences, University of Birmingham, Birmingham, UK
  • MD Kilby,

    1. Birmingham Centre for Women’s & Children’s Health & School of Clinical and Experimental Medicine, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
  • SC Robson,

    1. Institute of Cellular Medicine, Newcastle University, Newcastle upon Tyne, UK
  • KS Khan

    1. Women's Health Research Unit, The Blizard Institute, Barts and The London School of Medicine, Queen Mary, University of London, London, UK



Measurements of amniotic fluid volume are used for pregnancy surveillance despite a lack of evidence for their predictive ability.


To evaluate the association and predictive value of ultrasound measurements of amniotic fluid volume for adverse pregnancy outcome.

Search strategy

Electronic databases (inception to October 2011), reference lists, hand searching of journals, contact with experts.

Selection criteria

Studies comparing measurements of amniotic fluid volume with adverse outcome, excluding pre-labour ruptured membranes or congenital/structural anomalies.

Data collection

Data on study characteristics, design, quality. Random effects meta-analysis to estimate summary odds ratios (prognostic association) and summary sensitivity, specificity and likelihood ratios (predictive ability).

Main results

Forty-three studies (244 493 fetuses) were included demonstrating a strong association between oligohydramnios (varying definitions) and birthweight <10th centile (summary odds ratio [OR] 6.31, 95% confidence interval [95% CI] 4.15–9.58; high-risk population [author definition] n = 6 studies, 28 510 fetuses), and mortality (neonatal death any population summary OR 8.72, 95% CI 2.43–31.26; n = 6 studies, 55 735 fetuses; and perinatal mortality high-risk population summary OR 11.54, 95% CI 4.05–32.9; n = 2 studies, 27 891 fetuses). There was a strong association between polyhydramnios (maximum pool depth >8 cm or amniotic fluid index ≥25 cm) and birthweight >90th centile (OR 11.41, 95% CI 7.09–18.36; n = 1 study, 3960 fetuses). Despite strong associations, predictive accuracy for perinatal outcome was poor.

Authors' conclusion

Current evidence suggests that oligohydramnios is strongly associated with being small for gestational age and mortality, and polyhydramnios with birthweight >90th centile. Despite strong associations with poor outcome, they do not accurately predict outcome risk for individuals.


The amniotic fluid is fundamental for proper fetal development and growth, and amniotic fluid volume measurements using prenatal ultrasound have become standard in fetal surveillance, especially in the evaluation of high-risk pregnancies. Alterations in amniotic fluid volume, especially decreased amniotic fluid volume (oligohydramnios), have classically been considered an indicator of adverse perinatal outcome and, therefore, have led to an almost uniform recommendation for delivery following the diagnosis of oligohydramnios, at least for patients at term.[1] However, the number of ultrasonographic modalities applied to assess amniotic fluid volume and the various threshold points reflect the inaccuracies inherent in each of these modalities.[2-7] Moreover, the association between abnormal amniotic fluid volume and adverse perinatal outcomes came from heterogeneous studies that frequently included patients with preterm ruptured membranes or different underlying medical conditions, and/or fetuses with structural anomalies, clinical situations that may affect the amniotic fluid volume.[1]

A previous review of randomised controlled trials (RCTs) has concluded that single deepest vertical pocket measurement is the method of choice for the assessment of amniotic fluid volume[8] on the basis that neither method was superior but that amniotic fluid index (AFI) led to more diagnoses of oligohydramnios, more inductions of labour and more caesarean deliveries for fetal distress without improving perinatal outcome.[8] The observed effect in these RCTs is determined by both test accuracy and the effect of the intervention[9] that follows testing; a conclusion of this review was that a systematic review of the accuracy of AFI versus single deepest pocket was needed.[8]

Therefore we present here a systematic review and meta-analysis of the literature to assess the prognostic association and predictive accuracy of measurements of amniotic fluid for adverse pregnancy outcome and we compare the performance of different techniques of measurement of amniotic fluid.



A systematic review was performed according to a protocol (see Appendix S1) and in accordance with recommended methods.[10-13] Our review has been reported according to the PRISMA guidelines.[14]

The following sources were searched from inception to October 2011: MEDLINE; EMBASE; Cumulative Index To Nursing And Allied Health Literature (CINAHL); The Cochrane Central Register of Systematic Reviews; The Cochrane Central Register of Controlled Trials; DARE; MEDION; SIGLE; Index of Scientific and Technical Proceedings, Web of Science and database. The search consisted of keywords and MeSH terms relating to the tests under investigation combined with the MeSH terms ‘Prenatal Diagnosis’, ‘Ultrasonography’, ‘Amniotic Fluid’ and ‘Pregnancy Outcome’. The full search strategy is shown in Appendix S2. The reference lists of all included primary and review articles were examined to identify cited articles not captured by electronic searches. Reference Manager 12.0 was used to construct a comprehensive database of literature. No language restrictions were applied.

Study selection

The database was scrutinised by two reviewers (RKM, CHM) and full articles likely to meet the selection criteria were obtained. Translations were obtained for non-English articles. Three reviewers made the final inclusion/exclusion decisions according to adherence to the following criteria.

  1. Population. Pregnant women, with or without fetal growth restriction, no evidence of premature rupture of membranes, no evidence of congenital or structural anomalies.
  2. Index test. Any measure of amniotic fluid reported by the authors including AFI, amniotic fluid volume and maximum deepest pocket. Any threshold used to define low or high amniotic fluid as reported by the authors of the included studies was accepted.
  3. Outcome. Any reference standard looking at compromise of fetal or neonatal wellbeing; including: abnormal cord pH at birth, Apgar scores, perinatal death and composite outcomes such as adverse perinatal outcome. Any reference standard for fetal growth restriction or small for gestational age: Birthweight <10th, <5th, <3rd centile, absolute birthweight thresholds, ponderal index.
  4. Study design. Observational studies in which the results of the test of interest are compared with the outcome findings as confirmed by a reference standard, allowing generation of a 2 × 2 table to compute indices of association and test accuracy for each available threshold. Case series of ten or fewer and case–control studies determined by outcome were excluded.

Data extraction

All articles were assessed independently by a minimum of two reviewers (RKM, CHM, JT, GLM) and the data were abstracted. The following were recorded: study characteristics (authors, journal, year of publication, country, study design, objectives, type of medical centre, and period or duration of the study); characteristics of the participants (study population, method of selection, inclusion and exclusion criteria, whether consecutive cases, number of participants, number of excluded participants and reasons for exclusion, personal and medical characteristics of enrolled women, inpatients compared with outpatients, level of activity, gestation at time of test and at delivery, and test-to-delivery interval); information on how the diagnostic tests were carried out and the results; and methods for assessing the diagnostic accuracy of the tests and the results (number of true and false positives, number of true and false negatives, sensitivity, specificity, positive likelihood ratio, negative likelihood ratio, method of agreement, receiver operating characteristic curve, area under the curve, and the threshold level(s) used). Disagreements were resolved by consensus or through arbitration by a third reviewer (RKM or KSK). For multiple and/or duplicate publication of the same data set only the most recent or most complete study was included. All studies had to state that they excluded rupture of membranes and congenital/structural anomalies because of the association of renal/urinary tract anomalies and karyotypic anomalies with abnormalities of liquor volume.

Study quality assessment

All included manuscripts were assessed by at least one reviewer for study and reporting quality using validated tools for test accuracy studies.[15-19] Methodological quality was defined as the confidence that the study design, conduct and analysis have minimised biases in addressing the research question, thereby focusing on the internal validity (i.e. the degree to which the results of an observation are correct for the patients being studied). Items considered important for a good quality paper were prospective design with consecutive/random recruitment, full verification of the test result with an outcome measure (>90%), adequate description of the population and index test and whether the clinicians managing the patients were blinded to the results of the index test. Quality of the papers was assessed by QUADAS (see Appendix S3). Quality scores were not assigned because these have been shown to give flawed results in a diagnostic accuracy setting. Studies were rated as high quality for subgroup analysis if they satisfied at least four of the following items: adequate description of population; adequate description of the test (measurement of amniotic fluid and threshold) and outcome measure; consecutive recruitment; prospective recruitment; >90% completion of follow-up; appropriate outcome measurement; blinding of the investigators performing the outcome measure; and a statement regarding the use of intervention between the index test and outcome. As this review was nearing completion, QUADAS-2 was published, and so all included papers were re-assessed in line with the recommendations from QUADAS-2.[20] Elements of study design that were likely to have a direct relationship to bias in a test accuracy study were assessed using the STARD checklist.

Data synthesis for prognostic association

From the 2 × 2 tables in each study, odds ratios (OR) were computed with their 95% confidence intervals (CI) for each measure of amniotic fluid (at all reported thresholds) and its outcome pair. Results were pooled using a random effects meta-analysis model[21, 22] where the definition of the measure of amniotic fluid volume, the threshold used and the outcome measure were the same. Odds ratios were selected as the summary statistic, to assess prognostic ability, because they represent the effect of the test on the odds in an unbiased fashion and enable the results of case–control and cohort studies to be included.[23] They are often used to demonstrate an epidemiological association.[23]

A random effects meta-analysis model was chosen for each test due to the expected presence of clinical and statistical heterogeneity between studies. This approach synthesises the log OR estimates and weights each study by the inverse of the study's variance plus between-study variance to produce a summary estimate of the average prognostic effect of a test. As a test's prognostic ability may vary from this average from setting to setting, after each random-effects meta-analysis in which I² > 0% we also estimated a prediction interval to reveal the potential prognostic effect if the test is applied in a single setting similar to one of the studies from our analysis.[21] This was calculated where three or more studies were included in the meta-analysis.[21]
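The weighting and prediction interval described above can be illustrated with a minimal Python sketch. It is not the Stata code used for the review. It assumes the DerSimonian-Laird moment estimator for the between-study variance and a prediction interval based on a t distribution with k − 2 degrees of freedom; the source does not state which estimator was used, and the function names and the small table of t critical values are hypothetical helpers introduced only for illustration.

```python
import math

# Two-sided 95% critical values of Student's t, indexed by degrees of freedom.
# Only a short table is included; this is an illustrative helper, not part of the review.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def pool_random_effects(log_ors, variances):
    """Pool study log odds ratios with inverse-variance random-effects weights."""
    k = len(log_ors)
    w_fixed = [1.0 / v for v in variances]
    sum_w = sum(w_fixed)
    fixed_mean = sum(w * y for w, y in zip(w_fixed, log_ors)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(w * (y - fixed_mean) ** 2 for w, y in zip(w_fixed, log_ors))
    # Assumed estimator: DerSimonian-Laird moment estimate of tau-squared.
    c = sum_w - sum(w * w for w in w_fixed) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # I-squared: percentage of variability attributed to between-study heterogeneity.
    i2 = max(0.0, (q - (k - 1)) / q) * 100.0 if q > 0 else 0.0
    # Each study is weighted by 1 / (within-study variance + tau-squared).
    w_random = [1.0 / (v + tau2) for v in variances]
    mean = sum(w * y for w, y in zip(w_random, log_ors)) / sum(w_random)
    se = math.sqrt(1.0 / sum(w_random))
    return {"log_or": mean, "se": se, "tau2": tau2, "i2": i2, "k": k}


def prediction_interval(pooled):
    """Approximate 95% prediction interval for the odds ratio in a new setting."""
    k = pooled["k"]
    if k < 3:
        raise ValueError("a prediction interval needs three or more studies")
    if k - 2 not in T_CRIT_95:
        raise ValueError("extend T_CRIT_95 for this number of studies")
    half_width = T_CRIT_95[k - 2] * math.sqrt(pooled["tau2"] + pooled["se"] ** 2)
    return (math.exp(pooled["log_or"] - half_width),
            math.exp(pooled["log_or"] + half_width))
```

With only three studies the t critical value is very large (12.706), which is one reason prediction intervals from small meta-analyses can be extremely wide.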

We plotted summary odds ratio data in forest plots and assessed the between-study heterogeneity in prognostic effect of each test by estimating I² (the amount of variability in prognostic effects due to between-study heterogeneity)[24] and tau-squared.[25] Where possible we performed meta-regression or subgroup analysis as appropriate to examine the effect of potential confounding factors: singleton or multiple birth status, timing of test in relation to delivery, gestation of pregnancy at time of testing, high-risk or low-risk population as assessed by the study authors, and study quality were considered to be important factors that may influence the strength of the association between amniotic fluid measurement and adverse outcome.

In studies where there were cells in the 2 × 2 table with a value of 0, 0.5 was added to all cells to allow the calculation of log odds ratios and their variances for meta-analysis.[26] Meta-analyses were performed where two or more studies reported the same index test and outcome measure.
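As an illustration of the calculation described above, the following minimal Python sketch computes an odds ratio and its confidence interval from a single 2 × 2 table, adding 0.5 to every cell when any cell is zero. The function name and argument names are hypothetical, and the Wald (log-scale normal approximation) interval is an assumption; the source does not specify how study-level intervals were derived.

```python
import math


def odds_ratio_ci(tp, fp, fn, tn, z=1.96):
    """Odds ratio with a Wald confidence interval from a 2 x 2 table.

    tp: test abnormal, outcome present    fp: test abnormal, outcome absent
    fn: test normal, outcome present      tn: test normal, outcome absent
    """
    cells = [float(tp), float(fp), float(fn), float(tn)]
    # Continuity correction: add 0.5 to all cells if any cell is zero.
    if any(cell == 0 for cell in cells):
        cells = [cell + 0.5 for cell in cells]
    a, b, c, d = cells
    log_or = math.log((a * d) / (b * c))
    # Variance of the log odds ratio is the sum of reciprocal cell counts.
    variance = 1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d
    se = math.sqrt(variance)
    return {
        "or": math.exp(log_or),
        "lower": math.exp(log_or - z * se),
        "upper": math.exp(log_or + z * se),
        "log_or": log_or,
        "variance": variance,
    }
```

The returned log odds ratio and variance are the quantities a random-effects model would pool across studies.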

The primary outcomes were considered to be birthweight <10th centile and birthweight <2500 g for measurements of small for gestational age, and perinatal mortality, abnormal cord pH (<7.20) and adverse perinatal outcome for wellbeing. A composite outcome measure (adverse perinatal outcome) was employed by some included studies to maximise the number of events that could be included in the analysis and avoid the need to select a single morbidity/mortality as a primary outcome measure. However, a hazard of composite outcome measures is the assumption that the significance of the result applies to all components.[27] To address this issue, we analysed the component outcomes as subgroups when these were reported (see Appendix S4). When the composite outcome measure was used, care was taken to ensure that each individual was only counted once in each analysis, particularly where studies reported multiple outcomes for a single population. Where multiple outcomes and test thresholds were reported, attempts were made to select the most consistent threshold and outcome across the analysis. It should be noted that for the outcomes of neonatal death and perinatal mortality, these were the outcomes as used in the included studies and there is no overlap between the studies included in each outcome.

To explore for the presence of publication bias, the Peters test (a weighted linear regression with ln odds ratio as the dependent variable and the inverse of the total sample size as the independent variable) was performed[28] to assess funnel plot asymmetry for each meta-analysis containing ten or more studies, with a significance level of 10% used.[29]
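The regression underlying this test can be sketched in Python as a weighted least-squares fit of the log odds ratio on the inverse of the total sample size. This is a minimal illustration, not the Stata metabias implementation: the function names are hypothetical, the regression weights are supplied by the caller rather than fixed here, and the sketch returns the t statistic for the slope, leaving the P value (from a t distribution with k − 2 degrees of freedom) to a statistics package.

```python
import math


def weighted_linear_fit(x, y, w):
    """Weighted least-squares fit of y on x; returns slope, intercept, slope SE, t."""
    k = len(x)
    if k < 3:
        raise ValueError("at least three points are needed to estimate a slope SE")
    sum_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sum_w
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Weighted residual variance on k - 2 degrees of freedom.
    sse = sum(wi * (yi - intercept - slope * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    se_slope = math.sqrt(sse / (k - 2) / sxx)
    t_stat = slope / se_slope if se_slope > 0 else math.inf
    return {"slope": slope, "intercept": intercept, "se_slope": se_slope, "t": t_stat}


def peters_regression(log_ors, sample_sizes, weights):
    """Regress ln(OR) on 1 / total sample size; a non-zero slope suggests asymmetry."""
    inverse_n = [1.0 / n for n in sample_sizes]
    return weighted_linear_fit(inverse_n, log_ors, weights)
```

A slope that differs from zero at the chosen significance level (10% in this review) would indicate funnel plot asymmetry.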

Data synthesis for predictive ability

Odds ratios significantly >1 indicate a prognostic association between amniotic fluid measurement and poor outcome; on average in the population this measure was associated with a worse outcome. We considered odds ratios between 1 and 2 to be mild associations, 2 and 5 to be moderate associations, and >5 as strong associations. In particular, although all odds ratios >1 indicate a prognostic association at the population level, we felt that only an odds ratio >5 would indicate a sufficient discrepancy between amniotic fluid volumes that may have predictive ability at the individual level. Therefore we only considered test accuracy at the individual level (in terms of sensitivity and specificity of a test) when its odds ratio was >5, the 95% CI did not cross 1 and there was statistical significance. We assessed the predictive ability[30] of the test by calculating summary sensitivity, specificity and likelihood ratios, again using data from the 2 × 2 tables and synthesising using a bivariate random-effects meta-analysis model.[31] Likelihood ratios indicate by how much a given test result raises or lowers the odds of having the disease and have been recommended by Evidence-based Medicine Groups[32, 33] as they show how the test result informs clinical decision making.
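For a single study, the accuracy measures named above follow directly from the 2 × 2 table, and a likelihood ratio converts a pre-test probability into a post-test probability through the odds. The Python sketch below is a minimal illustration with hypothetical function names; it does not reproduce the bivariate random-effects model used to obtain the summary estimates in this review.

```python
import math


def accuracy_from_2x2(tp, fp, fn, tn):
    """Sensitivity, specificity and likelihood ratios from one 2 x 2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # LR+ = sensitivity / (1 - specificity); undefined (infinite) when specificity is 1.
    lr_positive = sensitivity / (1.0 - specificity) if specificity < 1 else math.inf
    # LR- = (1 - sensitivity) / specificity; undefined (infinite) when specificity is 0.
    lr_negative = (1.0 - sensitivity) / specificity if specificity > 0 else math.inf
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": lr_positive,
        "lr_negative": lr_negative,
    }


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Apply a likelihood ratio to a pre-test probability via the odds scale."""
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)
```

For example, with an assumed (hypothetical) pre-test probability of 10%, a positive likelihood ratio of about 4 raises the probability to roughly 30%, whereas a negative likelihood ratio close to 1 leaves it almost unchanged.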

All analyses were performed in Stata version 11.0 (StataCorp, College Station, TX, USA) using the metan, metandi and metabias commands.[34-36] Summary results were displayed in forest plots generated using StatsDirect.


Figure 1 summarises the process of literature identification and selection. Of the 6259 potential citations, 43 primary articles were included in the critical appraisal and systematic review. Appendix S4 details the individual study characteristics of the included studies and their references. There were 43 studies included overall, reporting on 244 493 fetuses. The commonest index tests reported were amniotic fluid index ≤5 cm (number of studies, n = 23) and maximum pool depth (MPD) ≤2 cm (n = 6; Figure 1). The outcome measures reported most often were birthweight <2500 g (n = 6), birthweight <10th centile (n = 13), Apgar score at 1 minute <7 (n = 12), Apgar score at 5 minutes <7 (n = 17), umbilical cord pH <7.20 (n = 5), admission to neonatal intensive care unit (n = 16), perinatal mortality (n = 9), neonatal death (n = 5) and adverse perinatal outcome (n = 9). Only four papers reported results for ponderal index and no papers used a measure of fetal growth restriction as an outcome, e.g. fetal weight <10th centile and abnormal Dopplers.

Figure 1.

Process from initial search to final inclusion for measurements of amniotic fluid volume to predict small for gestational age and compromise of fetal/neonatal wellbeing (up to October 2011). AFI, amniotic fluid index.

Figure 2 shows a summary of the quality assessment of included studies. There was good compliance with appropriate population spectrum, selection criteria adequately described and appropriate reference standard. There was poor compliance with adequate description of index and reference standard (see Appendix S3 for adequate criteria). Blinding of the assessors of the outcome measure to the results of the amniotic fluid measurement was also poorly reported (6/43 studies). Only seven studies reported on the use of any treatment in between the amniotic fluid measurement and delivery, or whether the results of the tests were used in determining patient management. When assessing the included papers with the QUADAS-2 recommendations the results were: patient selection, 86% low risk of bias and 9% high concerns regarding applicability; index test, 65% low risk of bias and low concerns regarding applicability (mainly due to inadequate description of index test); reference standard, only 9% had low risk of bias (due to non-blinding and poor reporting of method of reference standard) but applicability was high, with only 3% of studies having concerns regarding applicability; flow and timing, 16% of studies had low concerns regarding possibility of bias, mainly due to poor reporting of any intervention between index and reference standard.

Figure 2.

Bar chart showing the methodological quality of evidence on amniotic fluid measurements to predict small for gestational age/compromise of fetal wellbeing according to quality criteria.

Summary results for oligohydramnios and outcome of measures of small for gestational age

Five of the 12 meta-analyses performed for oligohydramnios (according to threshold used and outcome definition) provided a summary odds ratio and 95% CI that demonstrated a significant association between oligohydramnios and small for gestational age as measured by birthweight <2500 g or <10th centile (Figure 3). The summary odds ratio estimate was generally above 2, suggesting a reasonably large prognostic association on average. However, there was large heterogeneity for most meta-analyses, even after subgrouping studies (Figure 3), with I² often over 70%. The heterogeneity is reflected in the wide prediction intervals, which generally include an odds ratio of 1, and reveal that in individual settings the association might vary considerably from the average, and may not even be important in some situations.

Figure 3.

Forest plot of odds ratios for association of measures of oligohydramnios with birthweight outcomes. Open diamonds are sub-groups, squares are individual studies, width of diamonds is confidence intervals, width of horizontal line is estimated prediction interval (EPI). AFI, amniotic fluid index; MPD, maximum pool depth.

For birthweight <10th centile subgroup analysis found only a significant effect from a high-risk population. On average across all measures of oligohydramnios in high-risk populations, the summary odds ratio was 6.31 (95% CI 4.15–9.58) and this was a rare situation where the prediction interval was also entirely above 1 (2.23–17.81; Figure 3). This is compared with a summary odds ratio of 2.34 (95% CI 1.76–3.09) and prediction interval (0.38–14.43) in a low-risk/unselected population. Given this, we also evaluated the summary accuracy of oligohydramnios for correctly predicting a birthweight <10th centile in a high-risk population. Across all measures of oligohydramnios, the summary sensitivity was 0.4 (0.12–0.76), summary specificity was 0.91 (0.66–0.98), summary positive likelihood ratio was 4.23 (2.38–7.52), and summary negative likelihood ratio was 0.66 (0.41–1.06).

There were insufficient papers (n = 1) for meta-analysis of birthweight <3rd and <5th centiles as outcome measures but individual studies showed a stronger association with these more severe measures of SGA (Figure 3). Corresponding results for predictive accuracy were, for birthweight <5th centile (MPD < 2 cm), sensitivity 0.43 (95% CI 0.18–0.71), specificity 0.92 (95% CI 0.86–0.96), positive likelihood ratio 5.46 (95% CI 2.38–12.5) and negative likelihood ratio 0.62 (95% CI 0.39–0.98). For birthweight <3rd centile (AFI ≤5 cm), the results were sensitivity 0.04 (95% CI 0.03–0.05), specificity 0.99 (95% CI 0.99–0.99), positive likelihood ratio 12.9 (95% CI 8.88–18.7) and negative likelihood ratio 0.97 (95% CI 0.96–0.98). For birthweight <2500 g (AFI ≤5 cm, n = 2 studies, 28 554 fetuses) the accuracy results were summary sensitivity 0.09 (95% CI 0.08–0.11), summary specificity 0.98 (95% CI 0.98–0.99), summary positive likelihood ratio 5.04 (95% CI 0.67–38.11) and summary negative likelihood ratio 0.84 (95% CI 0.63–1.11); there was significant heterogeneity.

Summary results for outcome measures of oligohydramnios and fetal wellbeing—primary outcome measures

All analyses demonstrated an association between oligohydramnios and neonatal death, and the majority also indicated an association with perinatal mortality (Figure 4). Heterogeneity was again large and prediction intervals were wide. Across all measures of oligohydramnios there was a strong association with neonatal death (summary OR 8.72, 95% CI 2.43–31.26, estimated prediction interval 0.19–401.44); this was not significantly changed when deaths possibly due to prematurity were excluded. The summary predictive accuracy was a sensitivity of 0.58 (0.19–0.89), specificity of 0.88 (0.55–0.98), positive likelihood ratio of 5.00 (1.69–14.76) and negative likelihood ratio of 0.48 (0.20–1.14). There was no difference in any of the subgroup analyses.

Figure 4.

Forest plot of odds ratios for association of measures of oligohydramnios with mortality and adverse perinatal outcome. Open diamonds are sub-groups, squares are individual studies, width of diamonds is confidence intervals, width of horizontal line is estimated prediction interval (EPI). AFI, amniotic fluid index; MPD, maximum pool depth.

For perinatal mortality there was a strong association when restricting to a high-risk population (summary OR 11.54, 95% CI 4.05–32.90) compared with any population (summary OR 3.44, 95% CI 0.61–19.43). Predictive accuracy for a high-risk population was sensitivity 0.29 (0.15–0.46), specificity 0.99 (0.99–0.99), positive likelihood ratio 4.52 (0.95–21.45), negative likelihood ratio 0.49 (0.06–4.06; Figure 4).

For abnormal cord pH there was no strong association with any of the measures of oligohydramnios, and subgroup analysis showed no significant effect in particular subgroups of interest. There was also no difference in the association when looking at more acidotic cord pHs. In all of these analyses for cord pH the estimated prediction intervals crossed the line of no effect (Figure 4).

For adverse perinatal outcome there was no strong association and no difference with subgroup analysis.

Summary results for outcome measures of oligohydramnios and fetal wellbeing—other outcome measures

Most meta-analyses gave a summary odds ratio that suggested a moderate association between oligohydramnios and fetal/neonatal morbidity (Table 1) with the odds ratio typically between 2 and 4. As above, heterogeneity was typically large, even when subgroup analyses were considered, and this led to wide prediction intervals.

Table 1. Results for other measures of adverse perinatal outcome
Outcome measure | No. of included studies | No. of fetuses | Odds ratio (95% CI) | Tau | I² | EPI | Sensitivity (95% CI) | Specificity (95% CI) | LR +ve (95% CI) | LR −ve (95% CI)
Resuscitation (AFI ≤5 cm, high-risk population) | 1 | 565 | 12.02 (3.82–37.89) | | | | 0.31 (0.11–0.59) | 0.96 (0.94–0.98) | 8.58 (3.69–19.96) | 0.71 (0.51–0.99)
Admission to NICU | 15 | 43 222 | 2.05 (1.21–3.45) | 0.8 | 86.1 | 0.29, 14.54 | | | |
  AFI ≤5 cm | 12 | 38 202 | 1.64 (0.76–3.53) | 1.4 | 89.1 | 0.1, 26.22 | | | |
Fetal distress | 12 | 12 794 | 2.69 (1.27–5.70) | 1.4 | 85.8 | 0.17, 42.3 | | | |
  AFI ≤5 cm | 9 | 39 839 | 1.86 (0.95–3.67) | 0.7 | 79.5 | 0.21, 16.69 | | | |
  MPD <2 cm | 2 | 24 | 1.15 (0.22–6.12) | 0.6 | 29.4 | | | | |
  MPD <1 cm | 1 | 30 | 7.93 (3.35–18.77) | | | | 0.67 (0.47–0.83) | 0.80 (0.72–0.86) | 3.31 (2.19–5.0) | 0.42 (0.25–0.70)
APGAR 1 minute <7 | 8 | 35 694 | 2.94 (1.1–7.91) | 1.7 | 91.3 | 0.09, 93.17 | | | |
  AFI ≤5 cm | 5 | 34 828 | 3.33 (0.9–12.27) | 2 | 93.6 | 0.02, 471.07 | | | |
  MPD <2 cm | 1 | 56 | 0.67 (0.12–3.65) | | | | | | |
APGAR at 5 minute <7 | 19 | 47 431 | 2.61 (1.32–5.17) | | | , 31.23 | | | |
  Test within 24 hours | 2 | 45 090 | 9.77 (4.93–19.37) | 0.1 | 10.7 | | 0.06 (0.04–0.08) | 0.99 (0.99–0.99) | 6 (1.47–24.57) | 0.95 (0.93–0.97)
  AFI ≤5 cm | 11 | 45 542 | 2.89 (1.12–7.49) | 1.7 | 87.9 | 0.12, 70.5 | | | |
  MPD <2 cm | 1 | 56 | 0.8 (0.03–20.62) | | | | | | |
Morbidity | 6 | 11 400 | 1.81 (1.0–3.3) | 0 | 0 | 0.78, 4.23 | | | |
  AFI ≤5 cm | 2 | 6919 | 1.66 (0.36–7.68) | 0 | 0 | | | | |
Preterm delivery <37 weeks | 4 | 34 508 | 1.87 (0.33–10.67) | 2.9 | 93.4 | 0, 7019.56 | | | |

EPI, estimated prediction interval; LR, likelihood ratio; NICU, neonatal intensive care unit.

For assisted ventilation/intubation there was an especially strong association but this was from a single study in a term population (Table 1). For fetal distress the association was generally around an odds ratio of 2 with no significant effects demonstrated in the subgroup analysis. When looking at individual measures of oligohydramnios (amniotic fluid index ≤5 cm, or maximum pool depth <2 cm, <1 cm) only maximum pool depth <1 cm (single study) showed a significant association (Table 1). For Apgar score at 5 minutes <7 there was a significant association when looking at tests performed within 24 hours of delivery.

Summary results of predictive accuracy of amniotic fluid index versus maximum pool depth

Where a direct comparison was possible between AFI and MPD for an individual outcome within the same study then predictive accuracy results were calculated to directly compare the different measures (Table 2). There was no difference for the outcomes of Apgar scores, admission to neonatal intensive care unit, fetal distress, neonatal death or perinatal mortality. For adverse perinatal outcome and birthweight <10th centile there were improved positive likelihood ratios for MPD versus AFI with no significant change in specificity or negative likelihood ratio.

Table 2. Direct comparison of amniotic fluid index and maximum pool depth within individual studies
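Author | Index test | Outcome | Sensitivity (95% CI) | Specificity (95% CI) | LR+ve (95% CI) | LR−ve (95% CI)
Chauhan et al.[46] | AFI ≤5 cm | Birthweight <10th centile | 0.14 (0.04–0.33) | 0.97 (0.92–0.99) | 4.2 (1.20–14.68) | 0.89 (0.76–1.04)
Chauhan et al.[46] | MPD ≤2 cm | Birthweight <10th centile | 0.04 (0.00–0.18) | 1 (0.98–1.0) | 15.31 (0.64–366.61) | 0.95 (0.87–1.04)
Youssef et al.[47] | AFI ≤5 cm | Birthweight <10th centile | 0.79 (0.62–0.91) | 0.69 (0.61–0.77) | 2.59 (1.91–3.50) | 0.3 (0.15–0.58)
Youssef et al.[47] | MPD ≤1 cm | Birthweight <10th centile | 0.56 (0.38–0.73) | 0.79 (0.71–0.85) | 2.61 (1.69–4.03) | 0.56 (0.38–0.83)
Desari et al.[48] | AFI ≤5 cm | Admission to NICU | 0.22 (0.03–0.60) | 0.65 (0.54–0.75) | 0.63 (0.18–) | (–1.76)
Desari et al.[48] | MPD ≤3 cm | Admission to NICU | 0.44 (0.14–0.79) | 0.4 (0.30–0.50) | 0.74 (0.35–1.56) | 1.4 (0.74–2.66)
Morris et al.[49] | AFI ≤5 cm | Admission to NICU | 0.12 (0.05–0.21) | 0.92 (0.91–0.94) | 1.5 (0.79–2.84) | 0.96 (0.88–1.04)
Morris et al.[49] | MPD ≤2 cm | Admission to NICU | 0.01 (0–0.07) | 0.99 (0.98–0.99) | 0.92 (0.13–6.75) | 1 (0.98–1.03)
Myles et al.[50] | AFI ≤5 cm | Admission to NICU | 0 (0–0.6) | 0.87 (0.82–0.91) | 0.74 (0.05–10.46) | 1.04 (0.77–1.40)
Myles et al.[50] | MPD ≤2.5 cm | Admission to NICU | 0 (0–0.6) | 0.86 (0.81–0.90) | 0.68 (0.05–9.63) | 1.05 (0.78–1.42)
Morris et al.[49] | AFI ≤5 cm | Fetal distress | 0.29 (0.04–0.71) | 0.92 (0.91–0.94) | 3.66 (1.12–11.96) | 0.78 (0.49–1.24)
Morris et al.[49] | MPD ≤2 cm | Fetal distress | 0 (0–0.41) | 0.99 (0.98–0.99) | 4.38 (0.29–66.21) | 0.95 (0.80–1.14)
Myles et al.[50] | AFI ≤5 cm | Fetal distress | 0.2 (0.10–0.32) | 0.89 (0.84–0.93) | 1.72 (0.90–3.29) | 0.91 (0.79–1.04)
Myles et al.[50] | MPD ≤2.5 cm | Fetal distress | 0.18 (0.09–0.30) | 0.87 (0.81–0.91) | 1.34 (0.69–2.59) | 0.95 (0.83–1.08)
Youssef et al.[47] | AFI ≤5 cm | Fetal distress | 0.87 (0.69–0.96) | 0.69 (0.61–0.77) | 2.84 (2.14–3.78) | 0.19 (0.08–0.48)
Youssef et al.[47] | MPD ≤1 cm | Fetal distress | 0.67 (0.47–0.83) | 0.8 (0.72–0.86) | 3.31 (2.19–5.00) | 0.42 (0.25–0.70)
Youssef et al.[47] | AFI ≤5 cm | Apgar at 5 minutes <7 | 0.89 (0.65–0.99) | 0.65 (0.57–0.73) | 2.57 (1.96–3.37) | 0.17 (0.05–0.63)
Youssef et al.[47] | MPD ≤1 cm | Apgar at 5 minutes <7 | 0.72 (0.47–0.90) | 0.77 (0.70–0.83) | 3.13 (2.09–4.69) | 0.36 (0.17–0.76)
Fischer et al.[51] | AFI ≤5 cm | Adverse perinatal outcome | 0.29 (0.13–0.51) | 0.89 (0.84–0.93) | 2.67 (1.26–5.69) | 0.8 (0.61–1.03)
Fischer et al.[51] | MPD ≤2 cm | Adverse perinatal outcome | 0.25 (0.10–0.47) | 0.96 (0.92–0.98) | 6.21 (2.28–16.95) | 0.78 (0.62–0.99)
Fischer et al.[51] | MPD ≤1 cm | Adverse perinatal outcome | 0.13 (0.03–0.32) | 1 (0.98–1.00) | 49 (2.61–920.79) | 0.86 (0.74–1.01)
Desari et al.[48] | AFI ≤5 cm | Neonatal death | 0.33 (0.01–0.91) | 0.66 (0.56–0.75) | 0.98 (0.19–4.97) | 1.01 (0.45–2.28)
Desari et al.[48] | MPD ≤3 cm | Neonatal death | 0.33 (0.01–0.91) | 0.4 (0.30–0.51) | 0.56 (0.11–2.79) | 1.67 (0.72–3.83)
Youssef et al.[47] | AFI ≤5 cm | Perinatal mortality | 0.88 (0.47–0.99) | 0.62 (0.54–0.70) | 2.31 (1.66–) | (–1.27)
Youssef et al.[47] | MPD ≤1 cm | Perinatal mortality | 0.75 (0.35–0.97) | 0.74 (0.67–0.81) | 2.9 (1.80–4.66) | 0.34 (0.10–1.12)

LR, likelihood ratio; NICU, neonatal intensive care unit.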

Summary results for outcome measures of polyhydramnios and small for gestational age and wellbeing

There were five papers, including 144 681 fetuses, that reported on polyhydramnios and adverse outcomes (Table 3). Thresholds included AFI ≥24 cm (three papers), AFI ≥25 cm (one paper) and in one paper there was no threshold reported.

Table 3. Results for association and predictive value of polyhydramnios for adverse pregnancy outcome
Outcome measure, any measure of polyhydramnios | No. of studies | No. of fetuses | Odds ratio (95% CI) | Tau | I² | EPI | Sensitivity (95% CI) | Specificity (95% CI) | LR+ve (95% CI) | LR−ve (95% CI)
Fetal growth
Birthweight <10th centile | 2 | 5702 | 0.37 (0.07–1.95) | 1.32 | 90.9 | | | | |
Birthweight <2500 g | 2 | 45 795 | 2.38 (0.42–13.58) | 1.19 | 68.9 | | | | |
Birthweight >90th centile | 1 | 3960 | 11.41 (7.09–18.36) | | | | 0.26 (0.18–0.36) | 0.97 (0.96–0.98) | 8.71 (6.01–12.62) | 0.76 (0.68–0.86)
Fetal/neonatal wellbeing
Apgar <7 at 5 minutes | 3 | 141 901 | 3.97 (1.58–9.99) | 0.63 | 95.4 | 0–49 5049.69 | | | |
Apgar <7 at 1 minute | 2 | 48 717 | 3.67 (0.91–14.87) | 0.99 | 97 | | | | |
Admission to NICU | 1 | 44 757 | 4.89 (3.61–6.63) | | | | | | |
Need for caesarean section | 1 | 3960 | 1.57 (1.09–2.24) | | | | | | |
Fetal distress | 1 | 3960 | 1.5 (0.87–2.60) | | | | | | |
Meconium-stained liquor | 1 | 3960 | 1.17 (0.56–2.42) | | | | | | |
Neonatal death | 2 | 48 717 | 5.07 (0.69–37.3) | 1.97 | 95.1 | | | | |
Perinatal mortality | 2 | 97 144 | 3.2 (1.97–) | | | | | | |
Intrauterine death | 2 | 48 717 | 4.13 (1.36–12.48) | 0.53 | 83.9 | | | | |

EPI, estimated prediction interval; LR, likelihood ratio; NICU, neonatal intensive care unit.

There was no evidence of an association between polyhydramnios and birthweight <10th centile or <2500 g, Apgar score at 1 minute <7, fetal distress or neonatal death. There was a strong positive association between polyhydramnios and birthweight >90th centile, which corresponded to low sensitivity with high specificity. There was significant heterogeneity throughout.
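The relationship between the sensitivity, specificity and likelihood ratios reported in Table 3 can be checked with a short calculation. The sketch below is illustrative and not part of the original analysis. The function name `likelihood_ratios` is hypothetical, and the inputs are the rounded sensitivity and specificity reported for birthweight >90th centile, so the results may differ slightly from the published values (8.71 and 0.76), which were presumably derived from unrounded data.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a binary test.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    """
    if not (0.0 <= sensitivity <= 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity must be in [0, 1] and specificity in (0, 1)")
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Rounded values reported in Table 3 for birthweight >90th centile.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.26, specificity=0.97)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

With these rounded inputs the positive likelihood ratio is close to 8.7 and the negative likelihood ratio close to 0.76, consistent with the table.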

Peters' test was performed on all meta-analyses that included ten or more studies (admission to neonatal intensive care unit, fetal distress, 5-minute Apgar <7, abnormal pH and birthweight <10th centile). There was no significant evidence of small study effects (P values 0.21, 0.19, 0.61, 0.35 and 0.62, respectively).


This is the first study to look at all measurements of amniotic fluid and compare their prognostic association and, where appropriate, their ability to predict adverse outcomes for individuals. The results demonstrate an especially strong association of oligohydramnios with small for gestational age and with mortality. Polyhydramnios was associated with birthweight >90th centile. For the measures that gave large summary odds ratios (>5), the summary positive and negative likelihood ratios indicate that oligohydramnios and polyhydramnios substantially change the odds of an adverse outcome. However, the low summary sensitivity, together with negative likelihood ratios close to 1, reveals that a negative test result does not discriminate accurately between those who will and those who will not have the adverse outcome. Hence, to accurately predict the risk of adverse outcome, oligohydramnios and polyhydramnios should be used in conjunction with other prognostic factors as part of a prognostic model.[37]
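How a large odds ratio can coexist with poor prediction for individuals can be illustrated numerically. The sketch below is not from the original analysis. It relies on the standard identity that the odds ratio for a binary test equals the positive likelihood ratio divided by the negative likelihood ratio; the helper name `diagnostic_odds_ratio` is hypothetical, and the inputs are the rounded sensitivity and specificity reported in Table 3 for polyhydramnios and birthweight >90th centile.

```python
def diagnostic_odds_ratio(sensitivity, specificity):
    """Odds ratio implied by a test's sensitivity and specificity.

    DOR = [sens / (1 - sens)] / [(1 - spec) / spec] = LR+ / LR-
    """
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    # Odds of a positive test among those with the outcome.
    odds_positive_in_affected = sensitivity / (1.0 - sensitivity)
    # Odds of a positive test among those without the outcome.
    odds_positive_in_unaffected = (1.0 - specificity) / specificity
    return odds_positive_in_affected / odds_positive_in_unaffected


# Rounded Table 3 values for birthweight >90th centile.
sens, spec = 0.26, 0.97
dor = diagnostic_odds_ratio(sens, spec)
missed = 1.0 - sens  # proportion of affected pregnancies with a negative test
print(f"Implied odds ratio: {dor:.1f}")
print(f"Affected pregnancies with a negative test: {missed:.0%}")
```

With these inputs the implied odds ratio is about 11, in line with the reported 11.41, yet roughly three-quarters of affected pregnancies would still have a negative test.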

The inferences for clinical practice that can be drawn from the results of this study are limited by the biases introduced by the designs of the included studies, in particular the treatment/intervention paradox, which is discussed further below.

Strengths and limitations of the review

This review provides the most up-to-date summary and meta-analysis of the association and predictive ability of abnormal liquor volume with small for gestational age and adverse fetal and neonatal wellbeing. The strengths of our review lie in its methodology, which complies with existing guidelines for systematic reviews of diagnostic studies and uses contemporary methods for meta-analysis.[10-13, 31] Our search was extensive, covering many databases with no language restrictions. We rigorously assessed study quality and reporting quality, looking at risk of bias and applicability, and assessed for publication bias. Heterogeneity was explored using meta-regression and subgroup analysis. A further strength of our review is the exclusion of patients with ruptured membranes and structural or chromosomal anomalies.

The limitations of our review stem from the quality of the primary research. Our quality assessment revealed concerns regarding the possibility of bias through patient selection and the performance of the index test and reference standard. We were unable to perform subgroup analysis for preterm versus term pregnancies, and some studies reported insufficient data to determine whether thresholds for amniotic fluid measurement were adjusted for gestation. Where possible we used the results obtained closest to delivery and performed subgroup analysis for those where the test was performed within 7 days of delivery. In particular, there was very poor reporting of the exact methods of the reference standards and of whether any treatment was used between the performance of the index test and the reference standard. A major concern, therefore, is the number of pregnancies in which induction of labour was performed because of the finding of oligohydramnios, which influences the results for pregnancy outcome (i.e. intervention bias). This bias can only truly be removed by performing an RCT. Such a trial would be impossible to perform, because measurements of amniotic fluid volume have become the standard in fetal surveillance and management of high-risk pregnancies, and recruitment would therefore be very difficult. Finally, the outcome measures used in this review were those reported by the authors of the included studies, and it is recognised that many of these are subjective (e.g. admission to neonatal intensive care unit, need for resuscitation). The only truly objective measures of poor fetal outcome are paired samples of cord pH and longer-term outcomes such as cerebral palsy, which were not reported.


Comparison with other studies

This study looks at the strength of association of measures of amniotic fluid with adverse outcomes and, where appropriate, their predictive accuracy. To our knowledge this is the first systematic review and meta-analysis to do this. However, for a test to be recommended in clinical practice it must be reliable, accurately reflect the condition it is diagnosing and usefully predict adverse outcome, such that when used to determine management it ultimately improves pregnancy outcome. The reliability of measures of amniotic fluid has been assessed in previous studies. These concluded that reproducibility can be affected by fetal position, transducer pressure, maternal hydration and use of colour Doppler, through either observer variation or variation in the fluid volume itself.[38-41] Determining which measure (AFI versus MPD) more accurately reflects true oligohydramnios requires comparison with dye dilution techniques or with volumes assessed at caesarean section. This was performed by Magann et al. in 2000, and the authors concluded that both techniques were unreliable in determining true amniotic volume.[42] Two previous systematic reviews and meta-analyses have looked at the effect of measurements of amniotic fluid on pregnancy outcome: the first, by Magann et al.,[43] included non-randomised and randomised controlled trials; the second, by Nabhan et al.,[8] included only RCTs. Both concluded that there was no evidence that either method (AFI or MPD) was superior to the other in preventing adverse pregnancy outcome, and noted that AFI characterised more women as having oligohydramnios, leading to an increase in obstetric interventions without any improvement in pregnancy outcome.[8, 43]


Oligohydramnios is associated with small for gestational age and mortality. Polyhydramnios is associated with birthweight >90th centile. These strong associations mean that oligohydramnios and polyhydramnios modify the odds of an adverse outcome when the test is positive. However, to improve the accuracy of predicting future outcome risk for individuals, oligohydramnios and polyhydramnios need to be combined with other prognostic factors within a prognostic model.

Implications for clinical practice

Despite some strong associations of oligohydramnios with birthweight <10th centile and with mortality, the predictive ability for individuals was poor, with generally good specificity and positive likelihood ratios but low sensitivity (<0.5) and negative likelihood ratios near 1. This can be interpreted as an increased risk (odds) of adverse outcome for those who test positive (compared with pretest risk), whereas for those who test negative there is minimal change in the risk of an adverse outcome. There was no significant difference in association or predictive accuracy between AFI and MPD, apart from higher (but not significantly so) positive likelihood ratios with MPD for adverse perinatal outcome and birthweight <10th centile.
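This interpretation follows from Bayes' theorem in odds form: post-test odds equal pretest odds multiplied by the likelihood ratio. The sketch below is illustrative only. The function name `post_test_probability`, the 10% pretest risk and the likelihood ratios of 5.0 and 0.9 are assumptions chosen to resemble the pattern described above (a useful positive likelihood ratio and a negative likelihood ratio near 1); they are not results from this review.

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a post-test probability.

    post-test odds = pretest odds * likelihood ratio
    """
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest_probability must lie strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Hypothetical inputs, for illustration only.
pretest = 0.10   # assumed pretest risk of the adverse outcome
lr_pos = 5.0     # assumed positive likelihood ratio
lr_neg = 0.9     # assumed negative likelihood ratio (close to 1)

after_positive = post_test_probability(pretest, lr_pos)
after_negative = post_test_probability(pretest, lr_neg)
print(f"Risk after a positive test: {after_positive:.1%}")
print(f"Risk after a negative test: {after_negative:.1%}")
```

Under these assumed values a positive test raises the risk from 10% to roughly 36%, whereas a negative test lowers it only to about 9%, which is why a negative result offers little reassurance.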

Although not accurate for individual prediction, the evidence indicates that oligohydramnios is a prognostic factor for birthweight <10th centile and mortality. As such, it has many potential uses:[44] for example, informing randomisation strategies in clinical trials, serving as a confounder to adjust for in observational studies and unbalanced trials, and being combined with other prognostic factors to allow more accurate predictions for individuals. However, given the limitations discussed, it would seem prudent to limit its use to high-risk pregnancies in which intervention (such as early delivery) would be considered.

Implications for future research

Future research needs to investigate further the test accuracy of measures of amniotic fluid volume using appropriately designed test accuracy studies with suitable sample size calculations, while also considering the value of the test within the diagnostic and management pathway and what can be done to improve the test's diagnostic and therapeutic yield.[45] For example, the use of amniotic fluid measures within the biophysical profile assessment and in combination with umbilical artery Doppler needs to be assessed in the same rigorous manner. It is important that any future research addresses the issue of the different types of measurements, the varying thresholds used and the unexplained heterogeneity identified within this review. The present evidence demonstrates that there is no significant difference in accuracy between MPD and AFI, and effectiveness evidence has also supported the use of MPD. As MPD is a much easier technique to perform, it should become recommended practice until more robust evidence becomes available.

Disclosure of interests

We declare no conflicts of interest.

Contribution to authorship

All authors were responsible for the design of the study. RKM, CHM, JT and GLM were responsible for the data extraction and RKM, GLM, RR, CHM, JT, MDK, SCR and KSK for the analysis. All authors checked the analysis and were involved in the drafting and critical revision of the manuscript and accept responsibility for the manuscript as published.

Details of ethics approval

As this was a systematic review of published data, ethical approval was not required.


Dr R K Morris is funded by an NIHR Clinical Lectureship. Dr Richard Riley is supported by funding from the MRC Midlands Hub for Trials Methodology Research, at the University of Birmingham (Medical Research Council Grant ID G0800808).


We thank Dr Pradeep Jayaram, who helped with some of the data extraction.