Fetal macrosomia is associated with adverse maternal and fetal outcomes1, and failure to correctly identify fetal macrosomia may further increase the risk of adverse perinatal outcome2, 3. However, the optimal sonographic fetal weight-estimation model for the prediction of fetal macrosomia remains controversial. Specifically, while some studies have found that abdominal circumference (AC) as a single measure is highly accurate in the prediction of fetal macrosomia4, 5, others have reported that models based on multiple fetal biometric indices provide better results6. This discrepancy may be related, at least in part, to methodological limitations of some of the studies, such as small sample size5, 6, comparison of only a small number of models, and the inclusion of cases in which the sonographic examination was performed up to 1 week prior to delivery5–7, a period during which significant fetal weight gain may occur.
Another important limitation is that almost all studies conducted to date have compared the accuracy of different sonographic fetal weight-estimation models for the detection of macrosomia using a fixed threshold (e.g. estimated weight > 4000 g)8–10. Such a threshold, however, may not be optimal for the prediction of macrosomia. Most published models were developed by regression analyses that provided the best overall fit with actual birth weight across the entire range of birth weights11–16, yet the fit between estimated fetal weight and actual birth weight within a given subrange of fetal weights (e.g. > 4000 g) may well be suboptimal17. Therefore, the optimal threshold for the identification of macrosomia may not be an estimated weight of > 4000 g, and another, optimized model-specific threshold may be more accurate for this purpose7. In addition, comparing models using these optimized, model-specific thresholds may provide more reliable information on the relative accuracy of the different models for the detection of fetal macrosomia.
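To make the distinction between a fixed and a model-specific threshold concrete, the Python sketch below computes the sensitivity and specificity of the rule "estimated fetal weight (EFW) > threshold" for detecting birth weight > 4000 g, first at the fixed 4000-g cut-off and then at the cut-off that maximizes Youden's index (sensitivity + specificity - 1). Youden's index is an assumption chosen for illustration and is not necessarily the optimization criterion used in this study; the function names and the eight EFW/birth-weight pairs are hypothetical toy values, not study data.

```python
from typing import List, Tuple

MACROSOMIA_G = 4000.0  # birth weight above this value defines macrosomia


def sens_spec(efw: List[float], bw: List[float], threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity of the rule 'EFW > threshold' for macrosomia."""
    tp = fn = tn = fp = 0
    for e, b in zip(efw, bw):
        predicted, actual = e > threshold, b > MACROSOMIA_G
        if actual and predicted:
            tp += 1
        elif actual:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need both macrosomic and non-macrosomic cases")
    return tp / (tp + fn), tn / (tn + fp)


def youden_threshold(efw: List[float], bw: List[float]) -> Tuple[float, float]:
    """Try every observed EFW as a cut-off; return (threshold, maximal Youden J).

    Youden's J is a hypothetical stand-in for the study's optimization criterion.
    """
    best_t, best_j = efw[0], -1.0
    for t in sorted(set(efw)):
        se, sp = sens_spec(efw, bw, t)
        j = se + sp - 1.0
        if j > best_j:  # strict '>' keeps the lowest threshold on ties
            best_t, best_j = t, j
    return best_t, best_j


# Hypothetical toy data (grams): EFW shortly before delivery vs actual birth weight
efw = [3300, 3500, 3650, 3700, 3800, 3900, 3950, 4100]
bw = [3400, 3600, 3900, 4050, 4150, 3950, 4200, 4300]

print(sens_spec(efw, bw, 4000))   # fixed threshold -> (0.25, 1.0)
print(youden_threshold(efw, bw))  # model-specific threshold -> (3650, 0.75)
```

In this toy example the model underestimates the heaviest fetuses, so the fixed 4000-g cut-off detects only one of four macrosomic infants, whereas the lower, model-specific cut-off (EFW > 3650 g) detects all four at the cost of one false positive.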
The aim of the present study was to compare the accuracy of 21 sonographic fetal weight-estimation models and AC as a single measure for the prediction of fetal macrosomia using either fixed or optimal model-specific thresholds in a large, unselected cohort of women who underwent sonographic examination within 3 days prior to delivery.
This study sought to compare the accuracy of different sonographic fetal weight-estimation models for the prediction of fetal macrosomia using either fixed or optimal model-specific thresholds. Our main findings were as follows. First, the use of a fixed threshold of estimated weight > 4000 g is associated with considerable variation in predictive accuracy for macrosomia among the different models. Second, the AUC, which is a threshold-independent measure of accuracy, is relatively similar for all models. Third, the use of the model-specific threshold derived from the inflexion point of the ROC curve of each individual model minimizes the intermodel variation. Finally, even when this optimal model-specific threshold is applied, models based on three or four biometric indices appear to perform better than models based on only two indices or on AC as a single measure.
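Why the AUC can be similar across models while fixed-threshold accuracy varies can be illustrated with a short Python sketch (hypothetical function names and toy data, not study results). The AUC is computed here in its Mann-Whitney form, as the probability that a randomly chosen macrosomic fetus has a higher EFW than a randomly chosen non-macrosomic fetus, and a second hypothetical "model" is simulated by adding a constant 300 g to every estimate of the first.

```python
from typing import List

MACROSOMIA_G = 4000.0  # birth weight above this value defines macrosomia


def auc(efw: List[float], bw: List[float]) -> float:
    """Mann-Whitney AUC: P(EFW of a macrosomic case > EFW of a non-macrosomic case)."""
    pos = [e for e, b in zip(efw, bw) if b > MACROSOMIA_G]
    neg = [e for e, b in zip(efw, bw) if b <= MACROSOMIA_G]
    if not pos or not neg:
        raise ValueError("need both macrosomic and non-macrosomic cases")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0  # ties count half
    return wins / (len(pos) * len(neg))


def sensitivity_at(efw: List[float], bw: List[float], threshold: float) -> float:
    """Share of macrosomic cases whose EFW exceeds the given threshold."""
    hits = [e > threshold for e, b in zip(efw, bw) if b > MACROSOMIA_G]
    return sum(hits) / len(hits)


bw = [3400, 3600, 3900, 4050, 4150, 3950, 4200, 4300]
model_a = [3300, 3500, 3650, 3700, 3800, 3900, 3950, 4100]
model_b = [e + 300 for e in model_a]  # same ranking, systematically heavier estimates

print(auc(model_a, bw), auc(model_b, bw))  # -> 0.875 0.875
print(sensitivity_at(model_a, bw, 4000), sensitivity_at(model_b, bw, 4000))  # -> 0.25 0.75
```

Both hypothetical models rank the fetuses identically and therefore share an AUC of 0.875, yet at the fixed 4000-g cut-off their sensitivities differ threefold, which shows how a fixed threshold can separate models that an AUC comparison does not.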
At present, the optimal sonographic model for the prediction of fetal macrosomia is unclear, and discrepant findings have been reported regarding the accuracy of AC as a single measure compared with models that incorporate multiple biometric indices4–7, 27. In a recent systematic review4, AC as a single measure was found to have a similar area under the summary ROC curve and a higher pooled LR+ compared with sonographic fetal weight-estimation models (0.85 vs. 0.87, P = 0.9; and 6.9 vs. 5.7, respectively). However, because the summary ROC curve calculated for the sonographic fetal weight-estimation models incorporated the results of multiple different sonographic models, the authors were actually comparing the performance of AC with an ‘average’ of the different sonographic models rather than with the best-performing sonographic models. Furthermore, the comparison of the pooled LR+ between AC and the different sonographic models was limited by the fact that the individual LR+ values in each of the specific studies included in the review were derived from different thresholds. Nahum and Stanislow5, in a study of 82 nondiabetic women who underwent sonographic evaluation within 3 weeks of delivery, also concluded that AC as a single measure was as accurate as sonographic fetal weight-estimation models that are based on multiple biometric indices. By contrast, Combs et al.6 compared 31 sonographic fetal weight-estimation models for the prediction of fetal macrosomia within 2 weeks of delivery in a group of 165 diabetic women. After the models were rank-ordered by systematic error, absolute error and AUC, the authors found that the model of Ott et al.28 yielded the best results, and that 28 of the 31 models evaluated predicted macrosomia better than did AC as a single measure. Similarly, Hart et al.27 compared the accuracy of a new sonographic model intended for macrosomic fetuses with that of seven commonly used models. While the new model provided the best results, the model of Campbell et al.11, which is based on AC as a single measure, was the least accurate.
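For reference, the positive likelihood ratio (LR+) is defined from the sensitivity and specificity obtained at a particular threshold, so the same model yields a different LR+ whenever the threshold changes. The numbers on the right are hypothetical and serve only to show this dependence.

```latex
\mathrm{LR}^{+} = \frac{\text{sensitivity}}{1 - \text{specificity}}
\qquad\text{e.g.}\quad
\frac{0.60}{1 - 0.90} = 6.0
\quad\text{vs.}\quad
\frac{0.40}{1 - 0.95} = 8.0
```

Because both terms move with the threshold, LR+ values derived at different thresholds are not directly comparable, which is the limitation of the pooled LR+ comparison described above.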
These apparently conflicting results may be explained by the methodological limitations of the studies described above, such as a small sample size5, 6, large differences in the interval between sonographic evaluation and delivery5–7, comparison of only a small number of models, and differences in study populations (e.g. diabetic6 vs. nondiabetic5 women). To overcome these limitations, we compared a large number of sonographic fetal weight-estimation models using a large, unselected cohort of women who underwent sonographic examination within 3 days prior to delivery.
Another major problem is that almost all studies conducted to date have used a fixed threshold (i.e. estimated fetal weight > 4000 g or > 4500 g) to compare the accuracy of different models for the detection of macrosomia4–6, 8–10, a threshold that, as described earlier, does not necessarily represent the optimal threshold for this purpose. Indeed, O'Reilly-Green and Divon7 found that the optimal threshold for the detection of macrosomia using the model of Hadlock et al.14 was 3711 g rather than 4000 g. Consequently, comparing models using such a fixed threshold (rather than a model-specific optimal threshold) may not reflect their true relative accuracy. To control for this factor, in the present study we compared the models using two alternative, model-specific, optimal thresholds. We found that the variation in sensitivity and specificity among the models was much lower when the threshold associated with the inflexion point of the ROC curve was used than when the fixed threshold of 4000 g was used, and that this threshold enabled a more balanced comparison of the performance of the different models.
Another factor that complicates the interpretation of available studies is the difficulty in statistically assessing the differences in performance of each of the models. Some studies merely presented the results of the different models without direct comparison of their performance5. Others used semiquantitative methods, such as ranking the models according to one or more measures of accuracy, without determining whether the differences in the performance of models at different ranks are statistically significant6. In the current study we used a novel approach of cluster analysis to divide the models into homogeneous groups based on their sensitivity and specificity, and the differences in sensitivity and specificity of the models representing each of the clusters were tested for statistical significance. Using this strategy, we found that models based on three or four biometric indices were significantly over-represented in the most accurate cluster.
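The clustering method is not specified in detail here, so the following Python sketch assumes plain k-means (Lloyd's algorithm) applied to each model's (sensitivity, specificity) pair, with a deterministic initialization that spreads the starting centroids over the models ranked by sensitivity + specificity. The model names, their values, the choice of three clusters and all helper names are hypothetical, and the subsequent significance testing between clusters is not shown.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (sensitivity, specificity) of one model


def sq_dist(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's k-means (assumed technique); returns a cluster label per point."""
    ranked = sorted(range(len(points)), key=lambda i: points[i][0] + points[i][1])
    # Deterministic start: pick k points evenly spaced along the ranking
    centroids = [points[ranked[j * (len(points) - 1) // max(k - 1, 1)]] for j in range(k)]
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return labels


# Hypothetical models and accuracy values, for illustration only
models: Dict[str, Point] = {
    "M1": (0.90, 0.88), "M2": (0.88, 0.89), "M3": (0.70, 0.85),
    "M4": (0.72, 0.82), "M5": (0.55, 0.80), "M6": (0.52, 0.78),
}
names = list(models)
labels = kmeans([models[n] for n in names], k=3)


def mean_score(c: int) -> float:
    """Mean sensitivity + specificity of the models in cluster c."""
    members = [models[n] for n, lab in zip(names, labels) if lab == c]
    return sum(se + sp for se, sp in members) / len(members)


best = max(set(labels), key=mean_score)  # "most accurate" cluster
print([n for n, lab in zip(names, labels) if lab == best])  # -> ['M1', 'M2']
```

Cluster labels are arbitrary; only the grouping matters. Other clustering methods (e.g. hierarchical clustering) could group the same models differently, so this sketch shows the general idea rather than the study's procedure.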
In conclusion, models based on three or four biometric indices appear to be more accurate for the diagnosis of fetal macrosomia than models based on only two indices or on AC as a single measure. Furthermore, the type of threshold used (i.e. a model-specific threshold optimized for the diagnosis of fetal macrosomia vs. a fixed threshold) appears to be at least as important as the type of model used. Further prospective studies are needed to confirm the observations made in the current study.