Accuracy of ultrasound biometry in the prediction of macrosomia: a systematic quantitative review


Dr A. Coomarasamy, Education Resource Centre, Birmingham Women's Hospital, Metchley Park Road, Birmingham B15 2TG, UK.


Objective  To determine the accuracy of ultrasonographically estimated fetal weight (EFW) and abdominal circumference (AC) in the prediction of macrosomia.

Design  Systematic quantitative review.

Methods  Studies were identified without language restrictions from MEDLINE (1966–2003), EMBASE (1980–2003), Cochrane Library (2003:4), SCISEARCH (1974–2003) and manual searching of bibliographies of known primary and review articles. Studies were selected if the accuracy of ultrasonographic EFW or AC was evaluated for predicting macrosomia using birthweight as the reference standard. Data were extracted on study characteristics, quality and accuracy. Data were pooled to produce summary receiver operating characteristic (sROC) curves for studies with various test thresholds. Summary likelihood ratios for positive (LR+) and negative (LR−) test results were generated for an EFW of 4000 g and an AC of 36 cm for predicting birthweight of over 4000 g.

Main outcome measures  Birthweight over various thresholds.

Results  There were 36 primary articles consisting of 63 accuracy studies (51 evaluating the accuracy of EFW, and 12 the accuracy of fetal AC), including a total of 19,117 women. The sROC area for EFW was not different from the area for fetal AC (0.87 vs 0.85, P = 0.91). For predicting a birthweight of over 4000 g, the summary LRs were 5.7 (95% CI: 4.3 to 7.6) for a positive test and 0.48 (95% CI: 0.38 to 0.60) for a negative test, using Hadlock's method of ultrasonographically estimating fetal weight. For ultrasound fetal AC of 36 cm, the respective LRs for predicting a birthweight over 4000 g were 6.9 (95% CI: 5.2 to 9.0) and 0.37 (95% CI: 0.30 to 0.45).

Conclusion  There is no difference in accuracy between ultrasonographic EFW and AC in the prediction of a macrosomic baby at birth. A positive test result is more accurate for ruling in macrosomia than a negative test result is for ruling it out.


The term ‘macrosomia’ generally describes a fetus or newborn with an estimated or actual birthweight greater than 4000 g,1 although numerous other definitions exist based on other absolute weight or centile thresholds.2,3 The macrosomic pregnancy is at increased risk of shoulder dystocia, brachial plexus injury, asphyxia, prolonged labour, operative delivery and postpartum haemorrhage.2,4 However, there is no clear consensus regarding the antenatal prediction and management of a macrosomic fetus,5 although ultrasound tests are commonly requested by women and clinicians when a large baby is suspected, and the findings influence obstetric management.6

The two main ultrasonic methods used for predicting a macrosomic fetus are based on measurement of fetal abdominal circumference (AC) and on estimated fetal weight (EFW).7 It is currently unclear which of these two ultrasound methods has better diagnostic accuracy in predicting macrosomia. Although there are several primary diagnostic studies, these have generally not been large enough to provide precise accuracy estimates. With one exception,5 existing reviews are non-systematic,2,7–13 their study selection has often been limited5 and they have all ignored study quality assessment. These factors introduce a substantial potential for bias.14 Against this background, we conducted a comprehensive and rigorous systematic review to obtain precise estimates of the diagnostic accuracy of ultrasound EFW and AC to predict macrosomia.


The review was carried out with a prospective protocol using widely recommended methods.15–18

We searched MEDLINE (1966–2003), EMBASE (1980–2003), Cochrane Library (2003:4), SCISEARCH (1974–2003) and Conference Proceedings (ISI Proceedings, 1990–2003) for relevant citations. In MEDLINE, a combination of Medical Subject Headings (MeSH) and textwords was used to generate two subsets of citations, one including studies of macrosomia (‘macrosomia’, ‘birth weight’, ‘large for gestational age’, ‘large for dates’ and ‘large fetus’) and the other studies of ultrasonography (‘ultrasonography’, ‘ultrasound’, ‘sonography’ and ‘sonogram’). These subsets were combined using ‘AND’ to generate a subset of citations relevant to the research question. Where necessary, this search strategy was adapted for use in the other electronic databases. The reference lists of all known primary and review articles were examined to identify cited articles not captured by electronic searches. Frequently cited articles were entered into the Science Citation Index to identify additional citations. We also made enquiries about unpublished studies with researchers working in this field.

Studies in which accuracy of EFW or AC was evaluated for predicting macrosomia using actual birthweight as the reference standard were selected in a two-stage process. First, the electronic searches were scrutinised and full manuscripts of all citations that were likely to meet the predefined selection criteria were obtained. Second, final inclusion or exclusion decisions were made on examination of these manuscripts. In cases of duplicate publication, the most recent and complete versions were selected. There were no language restrictions, but studies with case–control design were excluded.14

Information was extracted from each selected article on study characteristics, quality and accuracy results. Accuracy data were used to construct 2 × 2 tables of ultrasound result (test was positive if EFW or AC was above the threshold as defined in the primary study, and test was negative if these were below the threshold) and actual birthweight (macrosomic if birthweight was over the threshold as defined in the primary study, and non-macrosomic if birthweight was below this threshold).
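The step from a 2 × 2 table to accuracy measures can be sketched as follows. This is a minimal illustration, not the review's own software: the function name `accuracy_from_2x2` and the counts are hypothetical, and no continuity correction is applied, so a table with a zero cell would need separate handling.

```python
def accuracy_from_2x2(tp, fp, fn, tn):
    """Derive accuracy measures from a 2 x 2 table.

    tp: test positive, birthweight above the macrosomia threshold
    fp: test positive, birthweight below the threshold
    fn: test negative, birthweight above the threshold
    tn: test negative, birthweight below the threshold
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_pos = sensitivity / (1 - specificity)   # LR+ for a positive test
    lr_neg = (1 - sensitivity) / specificity   # LR- for a negative test
    dor = lr_pos / lr_neg                      # diagnostic odds ratio
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "dor": dor,
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
result = accuracy_from_2x2(tp=30, fp=20, fn=20, tn=180)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

The diagnostic odds ratio computed here equals (tp × tn) / (fp × fn), which is the quantity whose heterogeneity is examined later in the analysis.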

All manuscripts meeting the selection criteria were assessed for methodological quality. Quality was defined as the confidence that the study design, conduct and analysis minimised bias in the estimation of test accuracy. Based on existing checklists,14,15,17–19 quality assessment involved scrutinising study design and relevant features of the population, test and outcomes of the study. A study was considered to be of good quality if it used a prospective design, consecutive enrolment, full verification of the test result with the reference standard, and had an adequate test description.14,17,19 Blinding was not considered to be an important quality issue because the reference standard (birthweight) becomes available only after birth and is an objective measurement, and was therefore unlikely to be biased by knowledge of the ultrasound-predicted weight or AC measurement.

Summary receiver operating characteristic (sROC) curves for various test thresholds20 and summary likelihood ratios (LRs)21 for single test thresholds were used as measures of accuracy. The area under the sROC curve provided an average measure of accuracy22 from the combined studies, especially when there were different test thresholds, and a convenient way of comparing the overall accuracy of EFW versus AC.23 However, the sROC area has limited value in the clinical interpretation of diagnostic information: a value above 0.5 indicates a test that is more predictive than chance, and a value of 1 indicates perfect accuracy with 100% sensitivity and specificity, but the area does not show how a particular test result changes the probability of macrosomia. We therefore used summary LRs, which indicate by how much a given ultrasound test result raises or lowers the probability24,25 of having a macrosomic newborn. This information is more likely to be clinically relevant than the sROC area.18
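How an LR raises or lowers a probability can be illustrated with the odds form of Bayes' theorem (post-test odds = pre-test odds × LR). The sketch below uses the summary LRs reported in this review for an EFW of 4000 g (5.7 and 0.48); the pre-test probability of 10% and the function name are assumptions made only for illustration.

```python
def post_test_probability(pre_test_prob, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


PRE_TEST = 0.10  # hypothetical pre-test probability, not taken from the review

for label, lr in (("positive EFW", 5.7), ("negative EFW", 0.48)):
    post = post_test_probability(PRE_TEST, lr)
    print(f"{label}: LR {lr} -> post-test probability {post:.1%}")
```

Under this assumed 10% pre-test probability, a positive result raises the probability of macrosomia to roughly 39%, while a negative result lowers it only to about 5%.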

Data were synthesised separately for EFW and AC. The analysis was stratified according to the technique or formula used for the test, the test threshold (4000 g, 4500 g or >90th centile for EFW, and >36 cm or >90th centile for AC) and the reference standard threshold (birthweight >4000 g, >4500 g or >90th centile). Heterogeneity of the diagnostic odds ratio (DOR) was assessed graphically using a forest plot26 (not shown) and statistically using the χ2 test27 to aid decisions on how to proceed with quantitative synthesis.28 Because there was either graphical or statistically significant heterogeneity for some tests and outcomes, we used random effects model meta-analysis.27 Possible sources of heterogeneity were explored by meta-regression analysis29 using various independent explanatory variables defined a priori. These variables were: inclusion criteria (diabetes, post-dates or other), gestation (>36 weeks or other), scan-to-delivery interval (<1 week or other), type of recruitment (consecutive or other), study design (prospective or retrospective) and test description (adequate or inadequate).
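The heterogeneity test and random effects pooling can be sketched as below. The text does not name the estimator used, so the DerSimonian-Laird method is assumed here as one common choice; the 0.5 continuity correction, the function names and the study counts are likewise assumptions for illustration only.

```python
import math


def log_dor_and_variance(tp, fp, fn, tn, correction=0.5):
    """Log diagnostic odds ratio and its variance for one study.

    Adds `correction` to every cell (an assumed convention) so that
    zero cells do not produce an undefined logarithm.
    """
    a, b, c, d = (x + correction for x in (tp, fp, fn, tn))
    log_dor = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return log_dor, variance


def dersimonian_laird(effects, variances):
    """Return (pooled effect, standard error, tau^2, Cochran's Q)."""
    weights = [1 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Cochran's Q: chi-squared heterogeneity statistic with k - 1 df.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random effects weights add the between-study variance tau^2.
    re_weights = [1 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1 / sum(re_weights))
    return pooled, se, tau2, q


# Hypothetical 2 x 2 counts (tp, fp, fn, tn) for three studies.
studies = [(30, 20, 20, 180), (12, 9, 14, 90), (45, 40, 25, 300)]
effects, variances = zip(*(log_dor_and_variance(*s) for s in studies))
pooled, se, tau2, q = dersimonian_laird(effects, variances)
low, high = pooled - 1.96 * se, pooled + 1.96 * se
print(f"Q = {q:.2f} on {len(studies) - 1} df, tau^2 = {tau2:.3f}")
print(f"Pooled DOR = {math.exp(pooled):.1f} "
      f"(95% CI {math.exp(low):.1f} to {math.exp(high):.1f})")
```

When Q does not exceed its degrees of freedom, the estimated between-study variance is zero and the result coincides with a fixed effect analysis.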

To detect publication and related biases, we undertook a funnel plot analysis of logDOR versus 1/SE, using Begg's and Egger's tests to evaluate for asymmetry.30 All statistical analyses were performed using SPSS v. 10 and Stata 7.0 statistical packages.
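Egger's test can be sketched as an ordinary least squares regression of the standardised effect (logDOR/SE) on precision (1/SE), where an intercept far from zero suggests funnel plot asymmetry. The code below is an illustrative implementation with hypothetical data, not the Stata routine used in the review; it stops at the t statistic, which would be compared against a t distribution with k − 2 degrees of freedom.

```python
import math


def egger_intercept(effects, std_errors):
    """Regress effect/SE on 1/SE; return (intercept, its SE, t statistic)."""
    n = len(effects)
    if n < 3:
        raise ValueError("Egger's regression needs at least three studies")
    x = [1.0 / se for se in std_errors]                  # precision
    y = [e / se for e, se in zip(effects, std_errors)]   # standardised effect
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Residual variance with n - 2 degrees of freedom.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    resid_var = rss / (n - 2)
    se_intercept = math.sqrt(resid_var * (1.0 / n + x_mean ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, se_intercept, t_stat


# Hypothetical log DORs and standard errors for five studies.
log_dors = [2.1, 2.6, 1.9, 2.4, 2.2]
ses = [0.30, 0.55, 0.25, 0.60, 0.40]
intercept, se_int, t_stat = egger_intercept(log_dors, ses)
print(f"Egger intercept = {intercept:.2f} (SE {se_int:.2f}), t = {t_stat:.2f}")
```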


Figure 1 summarises the process of literature identification and selection. There were 36 primary articles consisting of 63 accuracy studies (51 evaluating the accuracy of EFW and 12 accuracy of fetal AC), including a total of 19,117 women. A table summarising each study's salient features, methodological quality and accuracy data can be obtained from the authors. The quality of the 63 diagnostic studies is summarised in stacked bar charts in Fig. 2.

Figure 1.

Study selection process for systematic review of ultrasound biometry to predict macrosomia.

Figure 2.

Quality of the included studies.

The pooled accuracy estimates for EFW using different formulae, and stratified by different ultrasound and birthweight thresholds, are presented in Table 1 as likelihood ratios (LR+ and LR−). The most commonly used formulae for estimating fetal weight were Hadlock's formula,31,32 using femur length (FL) and AC, and Shepard's formula33 using biparietal diameter (BPD) and AC measurements (Table 1). The pooled LR+ for ultrasound EFW over 4000 g to predict an actual birthweight of over 4000 g was 5.7 (4.3–7.6) by Hadlock's (FL/AC) formula,31 with a respective LR− of 0.48 (0.38–0.60). Meta-regression analysis showed that test accuracy estimation was not influenced by risk category, gestation at test, scan-to-delivery interval, study quality or adequacy of test description.

Table 1.  LRs for predicting macrosomia using ultrasound EFW. There was no statistically significant heterogeneity in any of the meta-analyses presented in this table.
| Formula for calculating EFW | Refs | Test (EFW) threshold | Reference standard (birthweight) threshold | No. of studies* | Total no. of participants in the included studies | Pooled LR+ (95% CI)# | Pooled LR− (95% CI)# |
|---|---|---|---|---|---|---|---|
| Hadlock's formula using AC and FL | 31,32 | 4000 g | 4000 g | 6 | 1289 | 5.7 (4.3–7.6) | 0.48 (0.38–0.60) |
| | | 4500 g | 4500 g | 1 | 1213 | 33 (12–90) | 0.57 (0.25–1.30) |
| Hadlock's formula using HC and AC | 32 | 4000 g | 4000 g | 2 | 235 | 4.6 (1–29) | 0.14 (0.01–1) |
| Hadlock's formula using FL and AD | 34 | 3800 g | 3800 g | 1 | 221 | 6.5 (3.4–12) | 0.27 (0.13–0.56) |
| Hadlock's formula using BPD, AC and FL | 31 | 4000 g | 4000 g | 4 | 1040 | 8.5 (5.9–12) | 0.42 (0.30–0.58) |
| | | 4500 g | 4500 g | 1 | 661 | 29 (11–72) | 0.43 (0.19–0.99) |
| Hadlock's formula using AC, FL and HC | 31 | 4000 g | 4000 g | 2 | 647 | 29 (14–16) | 0.51 (0.31–0.85) |
| | | >90% | >90% | 1 | 406 | 5.1 (3.0–8.7) | 0.55 (0.36–0.86) |
| Hadlock's formula using AC, FL, HC and BPD | 31,32,35 | 4000 g | 4000 g | 1 | 149 | 13 (4–44) | 0.57 (0.29–1) |
| | | >90% | >90% | 2 | 147 | 9.3 (3.7–24) | 0.37 (0.14–0.93) |
| Shepard's formula using BPD and AC | 33 | 4000 g | 4000 g | 4 | 839 | 8.3 (5.2–13) | 0.55 (0.33–0.91) |
| | | >90th centile | >90th centile | 6 | 1472 | 7.0 (2.7–18) | 0.43 (0.27–0.68) |
| | | >1.5 standard deviation | >90% | 1 | 595 | 19 (10–35) | 0.28 (0.15–0.51) |
| Birnholz's formula using BPD and AD | 36 | 4000 g | 4000 g | 1 | 1031 | 6.5 (5.0–8.5) | 0.39 (0.31–0.49) |
| Chauhan's formula using AC and FL | 37 | 4000 g | 4000 g | 1 | 602 | 5.2 (3.3–8.3) | 0.42 (0.26–0.68) |
| Combs' formula using AC, FL and HC | 38 | 4000 g | 4000 g | 1 | 149 | 9.5 (3.6–25) | 0.48 (0.24–0.96) |
| Hansmann's formula using BPD and TTD | 39 | 4000 g | 4000 g | 1 | 150 | 2.6 (1–12) | 0.93 (0.51–1) |
| Hsieh's formula using BPD and AC | 40 | 4000 g | 4000 g | 1 | 1054 | 32 (15–69) | 0.51 (0.28–0.93) |
| Rossavik's formula using BPD, OFD, ALD and FL | 41 | 4000 g | 4000 g | 1 | 498 | 22 (10–46) | 0.34 (0.18–0.67) |
| | | >90% | >90% | 1 | 55 | 7.6 (1.9–29) | 0.37 (0.14–0.98) |
| Tamura's formula using AC and BPD | 42 | 4000 g | 4000 g | 1 | 70 | 3.6 (1.1–12) | 0.58 (0.20–1) |
| Vintzileos' formula using AC and BPD | 43 | 4000 g | 4000 g | 1 | 149 | 3.0 (1.5–6.0) | 0.40 (0.18–0.90) |
| Warsof's formula using BPD and AC | 44 | 4000 g | 4000 g | 1 | 70 | 9.7 (0.81–115) | 0.85 (0.34–1) |
| | | >90% | >90% | 1 | 55 | 13 (1.5–111) | 0.64 (0.27–1) |

  • * Studies with unusual ultrasound or birthweight thresholds (e.g. 4100 g) have not been presented in this table.

  • # The LR indicates by how much a given test result raises or lowers the probability of having the disease. The higher the LR of an abnormal test, the greater the value of the test. Conversely, the lower the LR of a normal test, the greater the value of the test. An LR of >10 or <0.1 is regarded as ‘very useful’ test accuracy, whereas a LR of 5–10 or 0.1–0.2 is regarded as ‘moderately useful’, and a LR of 2–5 or 0.5–0.2 is regarded as ‘somewhat useful’. A LR of 1–2 or 0.5–1 is only regarded as ‘little useful’ and LR of 1 as ‘useless’. Although this categorisation is useful for interpretation of LRs, it should be noted that the value of a test may vary depending on the pre-test probability of the condition, and the consequences of treatment.

Twelve studies evaluated the accuracy of AC to predict birthweight over 4000 g (n = 5), 4500 g (n = 1) or >90th centile for gestation (n = 6) (Table 2). The pooled LRs for positive and negative tests of an AC over 36 cm to predict a birthweight over 4000 g were 6.9 (5.2–9.0) and 0.37 (0.30–0.45), respectively. The corresponding LRs for an AC over the 90th centile to predict a birthweight over the 90th centile for gestation were 4.2 (2.3–7.7) and 0.33 (0.21–0.54). Again, meta-regression analysis showed that test accuracy estimation was not influenced by risk category, gestation at test, scan-to-delivery interval, study quality or adequacy of test description.

Table 2.  LRs for predicting macrosomia using ultrasound measured fetal AC. There was no statistically significant heterogeneity in any of the meta-analyses presented in this table.
| Test (AC) threshold | Reference standard (birthweight) threshold | No. of studies | Total no. of participants in the included studies | Pooled LR+ (95% CI) | Pooled LR− (95% CI) |
|---|---|---|---|---|---|
| 36 cm | 4000 g | 4 | 3475 | 6.9 (5.2–9.0) | 0.37 (0.30–0.45) |
| 36 cm | 4500 g | 1 | 1213 | 8.4 (4.1–17) | 0.03 (0.01–0.57) |
| 37 cm | 4000 g | 1 | 34 | 3.1 (1–16) | 0.31 (0.08–1) |
| >90% | >90% | 5 | 1864 | 4.2 (2.3–7.7) | 0.33 (0.21–0.54) |
| >2 standard deviation | >90% | 1 | 79 | 4.9 (1.9–13) | 0.34 (0.15–0.80) |

We pooled the accuracy data in sROC curves from studies in which the reference standard threshold was 4000 g or >90th centile, separately for EFW and AC. The reason for this approach was that ROC space can represent different test thresholds but not different reference standard thresholds; when reference standard thresholds vary substantially, separate ROC curves are a more meaningful way of summarising the diagnostic evidence. There were 19 studies in the EFW group and 11 in the AC group that met the above reference standard thresholds and were thus pooled in sROC space. The sROC area was 0.87 for EFW and 0.85 for fetal AC (Fig. 3). The areas under the curve (AUCs) were not significantly different (P = 0.91), indicating similar overall accuracies for EFW and AC.

Figure 3.

Summary ROC curve for studies of EFW and AC to predict a birthweight over 4000 g or 90th centile (the grey lines are upper and lower 95% confidence intervals).
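One way to fit a summary ROC curve of this kind is the Moses-Littenberg regression, sketched below. The review cites its sROC method only by reference, so this specific approach is an assumption, as are the function names, the unweighted regression, the integration of the area over the full false positive range, and the study data.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def expit(x):
    return 1 / (1 + math.exp(-x))


def fit_sroc(points):
    """Fit D = a + b*S by unweighted least squares.

    points: list of (sensitivity, false_positive_rate) pairs, each strictly
    between 0 and 1. D is the log diagnostic odds ratio and S is a proxy
    for the test threshold. Needs at least two studies with distinct S.
    """
    d = [logit(tpr) - logit(fpr) for tpr, fpr in points]
    s = [logit(tpr) + logit(fpr) for tpr, fpr in points]
    n = len(points)
    s_mean = sum(s) / n
    d_mean = sum(d) / n
    sxx = sum((si - s_mean) ** 2 for si in s)
    b = sum((si - s_mean) * (di - d_mean) for si, di in zip(s, d)) / sxx
    a = d_mean - b * s_mean
    return a, b


def sroc_sensitivity(a, b, fpr):
    """Expected sensitivity on the fitted curve at a given FPR (requires b != 1)."""
    return expit((a + (1 + b) * logit(fpr)) / (1 - b))


def sroc_auc(a, b, steps=1000):
    """Area under the fitted curve by the midpoint rule over FPR in (0, 1)."""
    width = 1 / steps
    return sum(sroc_sensitivity(a, b, (i + 0.5) * width)
               for i in range(steps)) * width


# Hypothetical (sensitivity, false positive rate) pairs for five studies.
studies = [(0.60, 0.10), (0.72, 0.18), (0.55, 0.07), (0.80, 0.25), (0.65, 0.12)]
a, b = fit_sroc(studies)
print(f"a = {a:.2f}, b = {b:.2f}, AUC = {sroc_auc(a, b):.2f}")
```

Solving D = a + bS for sensitivity gives logit(sensitivity) = [a + (1 + b) logit(FPR)] / (1 − b), which is the expression used in `sroc_sensitivity`.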

Funnel plot analysis did not show any evidence of asymmetry, suggesting that publication and related biases are unlikely.


The studies varied widely in the formulae used for calculating EFW or the methods of measuring AC, the threshold for defining abnormality of the ultrasound test, the interval between the test (ultrasound) and the reference standard (birthweight), and the threshold for the reference standard. The diagnostic accuracy varied substantially depending on the method of measurement and the thresholds. However, the accuracy of EFW was found to be equivalent to that of AC in predicting birthweight over 4000 g. Moreover, a positive test result was generally found to be more accurate for ruling in macrosomia than a negative test result was for ruling it out.

The strength of the above conclusions depends on the rigour of the systematic review methodology and the quality of the included primary studies.14,15 The literature search was comprehensive across various databases and no language restrictions were applied, so a large number of appropriate studies were selected for inclusion in the review. The quality of the included studies was generally good, as methodological deficiencies such as case–control design14 (excluded from the review), absence of description of tests and verification bias did not apply to the vast majority of studies selected for this review. However, a weakness of this systematic review was the presence of extensive clinical heterogeneity (different methods of estimating EFW and AC, and different ultrasound or reference standard thresholds), which meant that each meta-analysis had a small number of studies with a resultant loss of precision (wide confidence intervals).

Given the lack of precision, it was not possible to recommend one method of measuring EFW or AC over another. Notwithstanding this limitation, this review represents the best synthesis of currently available evidence, and allows researchers to begin to identify the accuracy and the limitations of various methods of predicting macrosomia, as well as to plan further primary diagnostic studies. The findings will also guide those planning trials of interventions to reduce neonatal and maternal morbidity in mothers with macrosomic babies. However, the poor accuracy of ultrasound suggests that relying on ultrasound findings to decide on clinical management can lead to harm, either through complacency with ‘normal’ ultrasound estimates of fetal weight or through unnecessary interventions such as induction or elective caesarean section when macrosomia is incorrectly ruled in by ultrasound scanning. Because both normal and abnormal ultrasound results are likely to be inaccurate in many cases, over-reliance on them to guide practice should be avoided.

Conflict of interest