Keywords: accuracy; comparison; macrosomia; models; prediction

Abstract

Objective

To compare the accuracy of 21 sonographic fetal weight-estimation models and abdominal circumference (AC) as a single measure for the prediction of fetal macrosomia (> 4000 g) using either fixed or optimal model-specific thresholds.

Methods

A total of 4765 sonographic weight estimations performed within 3 days prior to delivery were analyzed. The predictive accuracy of 21 published sonographic fetal weight-estimation models was calculated using three different thresholds: a fixed threshold of 4000 g; a model-specific threshold obtained from the inflexion point of the receiver–operating characteristics (ROC) curve; and a model-specific threshold associated with the highest overall accuracy. Cluster analysis was used to determine whether a certain combination of fetal biometric indices is associated with a higher predictive accuracy than others.

Results

For a fixed threshold of > 4000 g, there was considerable variation among the models in sensitivity (range, 13.6–98.5%) and specificity (range, 63.6–99.8%) for fetal macrosomia. Use of the threshold derived from the inflexion point of the ROC curve decreased the intermodel variation to a minimum (sensitivity, 84.4–91.4%; and specificity, 79.5–86.3%). Even when this optimal model-specific threshold was applied, models based on three to four biometric indices were more accurate than were models based on only two biometric indices or on AC as a single measure (P = 0.03).

Conclusions

Sonographic fetal weight-estimation models based on three to four biometric indices appear to be more accurate than are models based on two indices or on AC as a single measure, for the diagnosis of macrosomia. In these cases, the use of an optimal, model-specific threshold is associated with a higher degree of accuracy than is the uniform use of a fixed threshold of an estimated weight of > 4000 g. Copyright © 2011 ISUOG. Published by John Wiley & Sons, Ltd.


Introduction

Fetal macrosomia is associated with adverse maternal and fetal outcomes1, and failure to correctly identify fetal macrosomia may further increase the risk for adverse perinatal outcome2, 3. However, the optimal sonographic fetal weight-estimation model for the prediction of fetal macrosomia remains controversial. Specifically, while some found that abdominal circumference (AC) as a single measure is highly accurate in the prediction of fetal macrosomia4, 5, others reported that models based on multiple fetal biometric indices provided better results6. This discrepancy may be related, at least in part, to methodological limitations of some of the studies, such as small sample size5, 6, comparison of only a small number of different models and the inclusion of cases in which the sonographic examination was performed up to 1 week prior to delivery5–7, a period during which significant fetal weight gain may occur.

Another important limitation is that a fixed threshold (e.g. estimated weight > 4000 g) has been used to compare the accuracy of different sonographic fetal weight-estimation models for the detection of macrosomia by almost all studies conducted to date8–10. This type of threshold, however, may not be the optimal threshold for the prediction of macrosomia given that most published models were developed by regression analyses that provided the best overall fit with actual birth weight throughout the range of birth weights11–16, even though it may well be that the fit between estimated fetal weight and actual birth weight in a given subrange of fetal weights (e.g. > 4000 g) is suboptimal17. Therefore, it is possible that the optimal threshold for the identification of macrosomia may not be an estimated weight of > 4000 g, and that the use of another, optimized model-specific threshold, may be more accurate for this purpose7. In addition, comparing models using these optimized, model-specific thresholds may provide more reliable information regarding the relative accuracy of the different models for the detection of fetal macrosomia.

The aim of the present study was to compare the accuracy of 21 sonographic fetal weight-estimation models and AC as a single measure for the prediction of fetal macrosomia using either fixed or optimal model-specific thresholds in a large, unselected cohort of women who underwent sonographic examination within 3 days prior to delivery.

Methods

Data collection

A retrospective cohort study design was used. Data were collected from a comprehensive database of sonographic examinations in a single tertiary center. Routine sonographic evaluations included the standard fetal biometry measurements (AC, femur diaphysis length (FL), biparietal diameter (BPD) and head circumference (HC)), and the findings were saved directly to the database. Antenatal data, gestational age at delivery and actual birth weights were obtained from the hospital's perinatal database. The study was approved by the local Institutional Review Board.

Study population

The database was searched for all sonographic fetal weight estimations performed within 3 days prior to delivery between 2002 and 2008. Inclusion criteria for the study were live-birth singleton pregnancy, birth weight > 500 g, gestational age > 24 weeks and absence of fetal malformations or hydrops. Women with pregnancies complicated by gestational or pregestational diabetes and women in active labor or with ruptured membranes were excluded.

Definitions

Gestational age at the time of examination was recorded in the database along with the details of the sonographic examination and was calculated from the last menstrual period (LMP). When first-trimester ultrasound was available, the LMP was corrected based on the crown–rump length (CRL) if the discrepancy between the calculated LMP (based on Hadlock's CRL reference tables18) and the reported LMP exceeded 7 days, in accordance with the recommendations of the American College of Obstetricians and Gynecologists (ACOG)19. The gestational age at the time of examination was further verified by comparing the interval (in days) between the ultrasound-examination date and the delivery date with the interval between gestational age at the time of examination and gestational age at delivery (the latter was available from the perinatal database). As these intervals are expected to be identical (considering that gestational age in both cases should have been calculated using the same LMP), cases in which the difference between these intervals was greater than 1 day were excluded.
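The interval check described above can be illustrated with a short sketch. This is not the procedure actually run against the study database; the function name, argument names and example dates are hypothetical, and gestational ages are assumed to be stored in days.

```python
from datetime import date


def intervals_consistent(exam_date, delivery_date,
                         ga_exam_days, ga_delivery_days,
                         tolerance_days=1):
    """Return True if the calendar interval and the gestational-age
    interval between examination and delivery differ by no more than
    `tolerance_days` (cases with a larger difference would be excluded)."""
    calendar_interval = (delivery_date - exam_date).days
    ga_interval = ga_delivery_days - ga_exam_days
    return abs(calendar_interval - ga_interval) <= tolerance_days


# Hypothetical record: examined 2 days before delivery, gestational ages agree.
keep = intervals_consistent(date(2005, 3, 1), date(2005, 3, 3), 270, 272)

# Hypothetical record: gestational ages imply 5 days, the calendar says 2.
drop = not intervals_consistent(date(2005, 3, 1), date(2005, 3, 3), 270, 275)
```

The 1-day tolerance is taken from the text; everything else is an assumption made for illustration.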

All sonographic fetal-weight estimations at our center are made at the Ultrasound Unit of the Department of Obstetrics and Gynecology. Weight estimations are performed by senior physicians who are ultrasound specialists or by experienced ultrasound technicians. In the latter case, the examination is reviewed and signed by a senior physician. The examinations were performed transabdominally using high-quality ultrasound systems (Voluson E8 and Voluson 730 Expert, GE Medical Systems, Zipf, Austria and ATL 5000, Philips Healthcare, Eindhoven, The Netherlands).

The BPD is measured from the proximal echo of the fetal skull to the proximal edge of the deep border (outer–inner) at the level of the cavum septi pellucidi. The HC is measured as an ellipse around the perimeter of the fetal skull20. The AC is measured in the transverse plane of the fetal abdomen at the level of the umbilical vein in the anterior third and the stomach bubble in the same plane; measurements are taken around the perimeter21. The FL is measured in a view in which the full femoral diaphysis is seen and is taken from one end of the diaphysis to the other, not including the distal femoral epiphysis22.

Models and model-groups

For each sonographic examination, estimated fetal weight was calculated using 21 sonographic fetal weight-estimation models published in the literature (Table 1). The models were subdivided into six model-groups according to the combination of the fetal biometric indices incorporated in each model: Group 1, AC and FL; Group 2, AC and BPD; Group 3, AC and HC ( ± BPD); Group 4, AC, FL and BPD; Group 5, AC, FL and HC; and Group 6, AC, FL, BPD and HC (Table 1).

Table 1. Common models used for sonographic fetal weight estimation

| Model | Group | Reference | Equation |
|---|---|---|---|
| 1 | 1 (AC and FL) | Hadlock et al. (1985)29 | Log10 EFW = 1.304 + 0.05281(AC) + 0.1938(FL) − 0.004(AC)(FL) |
| 2 | 1 (AC and FL) | Woo et al. (1985)16 | Log10 EFW = 0.59 + 0.08(AC) + 0.28(FL) − 0.00716(AC)(FL) |
| 3 | 1 (AC and FL) | Warsof et al. (1986)30* | Ln EFW = 2.792 + 0.108(FL) + 0.0036(AC)^2 − 0.0027(FL)(AC) |
| 4 | 2 (AC and BPD) | Vintzileos et al. (1987)31 | Log10 EFW = 1.879 + 0.084(BPD) + 0.026(AC) |
| 5 | 2 (AC and BPD) | Warsof et al. (1977)12 | Log10 EFW = −1.599 + 0.144(BPD) + 0.032(AC) − 0.000111(BPD)^2(AC) |
| 6 | 2 (AC and BPD) | Shepard et al. (1982)13 | Log10 EFW = −1.7492 + 0.166(BPD) + 0.046(AC) − 0.002546(AC)(BPD) |
| 7 | 2 (AC and BPD) | Jordaan (1983)32 | Log10 EFW = −1.1683 + 0.0377(AC) + 0.0950(BPD) − 0.0015(BPD)(AC) |
| 8 | 2 (AC and BPD) | Hadlock et al. (1984)14 | Log10 EFW = 1.1134 + 0.05845(AC) − 0.000604(AC)^2 − 0.007365(BPD)^2 + 0.000595(BPD)(AC) + 0.1694(BPD) |
| 9 | 2 (AC and BPD) | Woo et al. (1985)16 | Log10 EFW = 1.63 + 0.16(BPD) + 0.00111(AC)^2 − 0.0000859(BPD)(AC)^2 |
| 10 | 2 (AC and BPD) | Hsieh et al. (1987)33 | Log10 EFW = 2.1315 + 0.0056541(AC)(BPD) − 0.00015515(BPD)(AC)^2 + 0.000019782(AC)^3 + 0.052594(BPD) |
| 11 | 3 (AC and HC ( ± BPD)) | Hadlock et al. (1984)14 | Log10 EFW = 1.182 + 0.0273(HC) + 0.07057(AC) − 0.00063(AC)^2 − 0.0002184(HC)(AC) |
| 12 | 3 (AC and HC ( ± BPD)) | Jordaan (1983)32 | Log10 EFW = 0.9119 + 0.0488(HC) + 0.0824(AC) − 0.001599(HC)(AC) |
| 13 | 3 (AC and HC ( ± BPD)) | Jordaan (1983)32 | Log10 EFW = 2.3231 + 0.02904(AC) + 0.0079(HC) − 0.0058(BPD) |
| 14 | 4 (AC, FL and BPD) | Hadlock et al. (1985)29 | Log10 EFW = 1.335 − 0.0034(AC)(FL) + 0.0316(BPD) + 0.0457(AC) + 0.1623(FL) |
| 15 | 4 (AC, FL and BPD) | Woo et al. (1985)16 | Log10 EFW = 1.54 + 0.15(BPD) + 0.00111(AC)^2 − 0.0000764(BPD)(AC)^2 + 0.05(FL) − 0.000992(FL)(AC) |
| 16 | 4 (AC, FL and BPD) | Shinozuka et al. (1987)34 | EFW = 0.23966(AC)^2(FL) + 1.6230(BPD)^3 |
| 17 | 4 (AC, FL and BPD) | Hsieh et al. (1987)33 | Log10 EFW = 2.7193 + 0.0094962(AC)(BPD) − 0.1432(FL) − 0.00076742(AC)(BPD)^2 + 0.001745(FL)(BPD)^2 |
| 18 | 5 (AC, FL and HC) | Hadlock et al. (1985)29 | Log10 EFW = 1.326 − 0.00326(AC)(FL) + 0.0107(HC) + 0.0438(AC) + 0.158(FL) |
| 19 | 5 (AC, FL and HC) | Combs et al. (1993)35 | EFW = 0.23718(AC)^2(FL) + 0.03312(HC)^3 |
| 20 | 5 (AC, FL and HC) | Ott et al. (1986)28 | Log10 EFW = −2.0661 + 0.04355(HC) + 0.05394(AC) − 0.0008582(HC)(AC) + 1.2594(FL/AC) |
| 21 | 6 (AC, FL, BPD and HC) | Hadlock et al. (1985)29 | Log10 EFW = 1.3596 + 0.0064(HC) + 0.0424(AC) + 0.174(FL) + 0.00061(BPD)(AC) − 0.00386(AC)(FL) |

  • Abdominal circumference (AC), biparietal diameter (BPD), femur diaphysis length (FL) and head circumference (HC) expressed in cm and estimated fetal weight (EFW) expressed in g, unless stated otherwise.
  • * FL expressed in mm.
  • EFW expressed in kg.
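To make the use of these equations concrete, the following minimal sketch evaluates Model 1 of Table 1 (Hadlock et al. (1985), AC and FL, both in cm, EFW in g) and applies the fixed 4000-g threshold. The function name and the example measurements are hypothetical; only the coefficients come from Table 1.

```python
def efw_hadlock_ac_fl(ac_cm, fl_cm):
    """Estimated fetal weight (g) from Model 1 of Table 1:
    Log10 EFW = 1.304 + 0.05281(AC) + 0.1938(FL) - 0.004(AC)(FL),
    with AC and FL in cm."""
    log10_efw = (1.304
                 + 0.05281 * ac_cm
                 + 0.1938 * fl_cm
                 - 0.004 * ac_cm * fl_cm)
    return 10 ** log10_efw


# Hypothetical measurements, chosen only for illustration.
efw = efw_hadlock_ac_fl(ac_cm=36.0, fl_cm=7.5)

# Fixed-threshold rule: the fetus is flagged as macrosomic
# when the estimated weight exceeds 4000 g.
predicted_macrosomia = efw > 4000
```

The other log-linear models in Table 1 follow the same pattern with different terms; models marked with footnotes use other units and would need the inputs or output converted accordingly.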

Measures of accuracy

The accuracy of each of the 21 sonographic fetal weight-estimation models and AC as a single measure in the prediction of fetal macrosomia (> 4000 g) was evaluated using the following measures of accuracy: sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (LR+, defined as sensitivity/(1 − specificity)), negative likelihood ratio (LR−, defined as (1 − sensitivity)/specificity), overall accuracy (defined as (true negative and true positive cases)/all cases), and area under the receiver–operating characteristics (ROC) curve (AUC).
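The threshold-dependent measures defined above follow directly from the four cells of a 2 × 2 table. The sketch below simply restates those definitions; the function name and the example counts are hypothetical and are not study data. AUC is omitted because it is not computed from a single 2 × 2 table.

```python
def diagnostic_measures(tp, fp, fn, tn):
    """Accuracy measures for one 2x2 table (true/false positives/negatives)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Predictive values are undefined when no case falls in the row.
        "ppv": tp / (tp + fp) if (tp + fp) else float("nan"),
        "npv": tn / (tn + fn) if (tn + fn) else float("nan"),
        # LR+ = sensitivity / (1 - specificity)
        "lr_pos": (sensitivity / (1 - specificity)
                   if specificity < 1 else float("inf")),
        # LR- = (1 - sensitivity) / specificity
        "lr_neg": ((1 - sensitivity) / specificity
                   if specificity > 0 else float("inf")),
        # Overall accuracy = (true positives + true negatives) / all cases
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical counts, for illustration only.
measures = diagnostic_measures(tp=40, fp=20, fn=10, tn=130)
```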

The measures of accuracy described above were determined using three different thresholds. The first was a fixed threshold of 4000 g (or 360 mm for AC). The second was a model-specific threshold obtained from the inflexion point of the ROC curve of each individual model. This point was defined as the point closest to the upper left corner of the ROC graph (representing 100% sensitivity and 0% (1 − specificity)), and was identified using a Microsoft Excel 2007 script that automatically calculated the distance from the upper left corner for each of the points on the ROC curve and identified the point with the minimal distance23. The third was a model-specific threshold associated with the highest overall accuracy for each individual model (as defined in the previous paragraph), which was identified using a Microsoft Excel 2007 script that calculated the overall accuracy score for each of the points on the ROC curve and identified the point associated with the highest score24.
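The two model-specific thresholds can be sketched as follows. The study used Microsoft Excel 2007 scripts; the Python below is only an illustrative equivalent under stated assumptions: every observed estimate is tried as a candidate threshold, a case is called positive when its estimate exceeds the threshold, and ties are resolved in favor of the lowest threshold. The function name and the toy data are hypothetical.

```python
import math


def select_thresholds(estimates, is_macrosomic):
    """Return the threshold closest to the upper left corner of the ROC
    graph and the threshold with the highest overall accuracy."""
    n_pos = sum(is_macrosomic)
    n_neg = len(is_macrosomic) - n_pos
    best_corner = (math.inf, None)   # (distance, threshold)
    best_accuracy = (-1.0, None)     # (accuracy, threshold)
    for t in sorted(set(estimates)):
        tp = sum(1 for e, y in zip(estimates, is_macrosomic) if e > t and y)
        fp = sum(1 for e, y in zip(estimates, is_macrosomic) if e > t and not y)
        sensitivity = tp / n_pos
        specificity = (n_neg - fp) / n_neg
        # Distance from the point (1 - specificity, sensitivity) to (0, 1).
        distance = math.hypot(1 - specificity, 1 - sensitivity)
        accuracy = (tp + (n_neg - fp)) / (n_pos + n_neg)
        if distance < best_corner[0]:
            best_corner = (distance, t)
        if accuracy > best_accuracy[0]:
            best_accuracy = (accuracy, t)
    return {"closest_to_corner": best_corner[1],
            "highest_accuracy": best_accuracy[1]}


# Toy data (g), for illustration only: 2 macrosomic and 8 non-macrosomic cases.
toy_estimates = [3000, 3200, 3400, 3600, 3800, 3900, 4000, 4100, 4200, 4400]
toy_labels = [0, 0, 0, 0, 0, 1, 0, 0, 0, 1]
chosen = select_thresholds(toy_estimates, toy_labels)
```

In this toy example the corner-based threshold is lower than the accuracy-based one, because overall accuracy is dominated by the more numerous non-macrosomic cases and therefore rewards specificity.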

Comparison of the accuracy of the different model-groups

To assess whether a certain combination of fetal biometric indices is better than other combinations, cluster analysis (using the k-means algorithm)25 was used to divide the models into homogeneous subgroups (A–F) according to sensitivity and specificity. The values of the sensitivity and specificity used for this purpose were those calculated using the thresholds derived from the inflexion point of the ROC curve of each individual model, because this type of threshold was associated with the lowest intermodel variation in sensitivity and specificity as well as with the optimal balance between sensitivity and specificity. The results of the cluster analysis were presented graphically, with the x and y coordinates for each model determined by the sensitivity and specificity, respectively. Thus, the models and clusters that are closest to the upper right corner of the graph (i.e. 100% sensitivity and 100% specificity) were considered to have the highest accuracy. The choice of the number of clusters (six) was made after investigating different solutions with the number of clusters ranging from three to eight. In addition, when analyzing the percentage of explained variance as a function of the number of clusters, the use of more than six clusters resulted in only a minimal increase in the variance explained by the model.
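The clustering step and the ranking of clusters by their distance from the ideal point can be sketched as below. The analysis in the study was run in SPSS; this is a plain implementation of Lloyd's k-means iteration with user-supplied starting centroids, offered only as an illustration. The function names, the starting centroids and the four example points are hypothetical.

```python
import math


def k_means(points, initial_centroids, max_iter=100):
    """Assign each (sensitivity, specificity) point to its nearest centroid
    and update the centroids until the assignments stop changing."""
    centroids = [tuple(c) for c in initial_centroids]
    assignments = None
    for _ in range(max_iter):
        new_assignments = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_assignments == assignments:
            break
        assignments = new_assignments
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignments, centroids


def rank_clusters(centroids, ideal=(100.0, 100.0)):
    """Order cluster indices from most to least accurate, i.e. by increasing
    distance between the centroid and the ideal point."""
    return sorted(range(len(centroids)),
                  key=lambda k: math.dist(centroids[k], ideal))


# Hypothetical (sensitivity %, specificity %) pairs for four models.
example_points = [(90.0, 85.0), (91.0, 86.0), (85.0, 80.0), (84.5, 79.5)]
labels, centers = k_means(example_points, [(90.0, 85.0), (85.0, 80.0)])
order = rank_clusters(centers)
```

Standard k-means implementations usually choose the starting centroids at random and repeat the run several times; fixed starting centroids are used here only to keep the sketch deterministic.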

In order to test the significance of the differences between the different clusters, the models representing each of the clusters (i.e. the model closest to the centroid point of each cluster) were compared with each other using the extended McNemar's test26.
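For orientation, the sketch below shows the standard (two-category) McNemar's test with continuity correction for paired classifications, together with a Bonferroni-adjusted significance level. It is not the extended McNemar's test cited above, whose details are not given in the text; the function names, the discordant-pair counts and the number of comparisons are hypothetical.

```python
import math


def mcnemar_test(b, c):
    """Standard McNemar's test with continuity correction.
    b: cases classified correctly by model X only;
    c: cases classified correctly by model Y only.
    Returns (chi-square statistic, two-sided P-value) on 1 degree of freedom."""
    if b + c == 0:
        return 0.0, 1.0
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


def bonferroni_alpha(overall_alpha, n_comparisons):
    """Per-comparison significance level that keeps the overall
    type I error rate at `overall_alpha`."""
    return overall_alpha / n_comparisons


# Hypothetical discordant-pair counts for two representative models.
statistic, p_value = mcnemar_test(b=10, c=2)

# Assumption for illustration: all 15 pairwise comparisons among six clusters.
alpha_per_test = bonferroni_alpha(0.05, 15)
significant = p_value < alpha_per_test
```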

Data analysis was performed using SPSS version 15.0 software (SPSS, Inc., Chicago, IL, USA). P < 0.05 was considered significant. Bonferroni corrections were used as necessary to maintain an overall type I error rate of 0.05 when multiple comparisons were carried out.

Results

A total of 4765 fetal weight estimations met the inclusion criteria. The characteristics of the women included in the study are presented in Table 2. Most weight estimations (65%) were performed either on the day of delivery or 1 day before. The rate of infants with actual birth weight > 4000 g was 9.0% (Table 2).

Table 2. Demographic and obstetric characteristics of women included in the study (n = 4765)

| Characteristic | Value |
|---|---|
| Maternal age (years) | 29.8 ± 5.1 |
| Nulliparous | 2031 (42.6) |
| Gestational age at delivery (weeks) | 38.1 ± 3.3 |
| Time from fetal weight estimation to delivery (days) | 1.2 ± 1.1 |
| Fetal weight estimated on day of delivery | 1274 (26.7) |
| Fetal weight estimated 1 day prior to delivery | 1845 (38.7) |
| Fetal weight estimated 2 days prior to delivery | 973 (20.4) |
| Fetal weight estimated 3 days prior to delivery | 673 (14.2) |
| Male fetal sex | 2465 (51.7) |
| Birth weight (g) | 3602 ± 815 |
| Birth weight > 4000 g | 431 (9.0) |

Data are given as mean ± SD or n (%).

Accuracy of different models using fixed threshold of estimated weight > 4000 g

We initially calculated the accuracy of the different models to predict macrosomia using a fixed threshold of estimated fetal weight of > 4000 g (Table 3). There was considerable variation among the models in sensitivity (mean = 66 ± 19%; range, 13.6–98.5%), specificity (mean = 91 ± 8%; range, 63.6–99.8%), PPV (mean = 52 ± 14%; range, 23.2–89.3%), NPV (mean = 96 ± 2%; range, 91.4–99.7%), LR+ (mean = 13 ± 14; range, 2.7–76.2) and LR− (mean = 0.4 ± 0.2; range, 0.02–0.87), with only minimal variation in AUC (Table 3).

Table 3. Accuracy of models for predicting birth weight > 4000 g using a fixed threshold of estimated fetal weight of > 4000 g

| Model | Reference | Model group | Sens. (%) | Spec. (%) | PPV (%) | NPV (%) | Overall accuracy (%) | LR+ | LR− | AUC |
|---|---|---|---|---|---|---|---|---|---|---|
| 5 | Warsof et al. (1977)12 | 2 | 49.3 | 97.0 | 64.1 | 94.6 | 92.3 | 16.32 | 0.52 | 0.92 |
| 19 | Combs et al. (1993)35 | 5 | 34.4 | 98.3 | 68.9 | 93.3 | 92.1 | 20.57 | 0.67 | 0.91 |
| 16 | Shinozuka et al. (1987)34 | 4 | 56.1 | 95.9 | 60.2 | 95.2 | 92.0 | 13.80 | 0.46 | 0.92 |
| 20 | Ott et al. (1986)28 | 5 | 53.9 | 96.1 | 60.1 | 95.1 | 92.0 | 13.97 | 0.48 | 0.92 |
| 15 | Woo et al. (1985)16 | 4 | 59.3 | 95.3 | 58.2 | 95.5 | 91.8 | 12.74 | 0.43 | 0.92 |
| 11 | Hadlock et al. (1984)14 | 3 | 46.9 | 96.5 | 58.9 | 94.4 | 91.7 | 13.32 | 0.55 | 0.90 |
| 18 | Hadlock et al. (1985)29 | 5 | 59.7 | 95.0 | 56.1 | 95.6 | 91.5 | 11.89 | 0.42 | 0.92 |
| 9 | Woo et al. (1985)16 | 2 | 13.6 | 99.8 | 89.3 | 91.4 | 91.3 | 76.24 | 0.87 | 0.92 |
| 21 | Hadlock et al. (1985)29 | 6 | 64.6 | 94.0 | 53.6 | 96.1 | 91.1 | 10.72 | 0.38 | 0.92 |
| 14 | Hadlock et al. (1985)29 | 4 | 71.8 | 92.9 | 52.7 | 96.8 | 90.9 | 10.17 | 0.30 | 0.92 |
| 7 | Jordaan (1983)32 | 2 | 69.4 | 93.1 | 52.4 | 96.5 | 90.8 | 10.06 | 0.33 | 0.92 |
| 8 | Hadlock et al. (1984)14 | 2 | 70.5 | 92.7 | 51.3 | 96.6 | 90.5 | 9.63 | 0.32 | 0.92 |
| 10 | Hsieh et al. (1987)33 | 2 | 71.8 | 92.4 | 50.9 | 96.8 | 90.4 | 9.47 | 0.30 | 0.92 |
| 17 | Hsieh et al. (1987)33 | 4 | 73.2 | 92.3 | 50.8 | 96.9 | 90.4 | 9.45 | 0.29 | 0.92 |
| 1 | Hadlock et al. (1985)29 | 1 | 71.8 | 92.4 | 51.3 | 96.7 | 90.3 | 9.43 | 0.31 | 0.92 |
| 13 | Jordaan (1983)32 | 3 | 74.9 | 90.8 | 46.6 | 97.1 | 89.2 | 8.13 | 0.28 | 0.91 |
| 12 | Jordaan (1983)32 | 3 | 64.4 | 91.1 | 43.6 | 96.0 | 88.5 | 7.20 | 0.39 | 0.89 |
| 3 | Warsof et al. (1986)30 | 1 | 83.0 | 86.4 | 40.4 | 97.9 | 86.0 | 6.09 | 0.20 | 0.91 |
| AC | | | 85.8 | 83.8 | 37.0 | 98.1 | 84.0 | 5.30 | 0.17 | 0.92 |
| 4 | Vintzileos et al. (1987)31 | 2 | 90.5 | 80.7 | 33.9 | 98.7 | 81.7 | 4.70 | 0.12 | 0.92 |
| 6 | Shepard et al. (1982)13 | 2 | 91.1 | 79.9 | 33.1 | 98.8 | 81.0 | 4.53 | 0.11 | 0.92 |
| 2 | Woo et al. (1985)16 | 1 | 98.5 | 63.6 | 23.2 | 99.7 | 67.1 | 2.71 | 0.02 | 0.91 |

Models are presented in decreasing order of overall accuracy. AC, abdominal circumference; AUC, area under the receiver–operating characteristics curve; LR+, positive likelihood ratio; LR−, negative likelihood ratio; NPV, negative predictive value; PPV, positive predictive value; Sens., sensitivity; Spec., specificity.

Accuracy of models using alternative thresholds

Because it is possible that estimated fetal weight > 4000 g does not necessarily represent the optimal threshold for the detection of macrosomia, we calculated, for each model, two alternative thresholds: the first was derived from the inflexion point of the ROC curve of each individual model and the second was the threshold associated with the highest overall accuracy for each individual model. Table 4 shows the model-specific alternative thresholds for each of the models.

Table 4. Accuracy of models for predicting birth weight > 4000 g using thresholds derived from the inflexion point in the receiver–operating characteristics (ROC) curve or thresholds associated with the highest overall accuracy for each individual model

Thresholds derived from inflexion point in ROC curve for each individual model

| Model | Reference | Threshold (g or mm) | Overall accuracy (%) |
|---|---|---|---|
| 20 | Ott et al. (1986)28 | 3762 | 86.1 |
| 8 | Hadlock et al. (1984)14 | 3833 | 85.7 |
| 15 | Woo et al. (1985)16 | 3734 | 85.2 |
| 2 | Woo et al. (1985)16 | 4398 | 85.1 |
| 4 | Vintzileos et al. (1987)31 | 4106 | 85.1 |
| 14 | Hadlock et al. (1985)29 | 3829 | 84.9 |
| 16 | Shinozuka et al. (1987)34 | 3735 | 84.9 |
| 17 | Hsieh et al. (1987)33 | 3810 | 84.8 |
| 5 | Warsof et al. (1977)12 | 3630 | 84.7 |
| 18 | Hadlock et al. (1985)29 | 3753 | 84.3 |
| AC | | 360 | 84.0 |
| 13 | Jordaan (1983)32 | 3824 | 83.7 |
| 21 | Hadlock et al. (1985)29 | 3757 | 83.6 |
| 10 | Hsieh et al. (1987)33 | 3753 | 83.3 |
| 3 | Warsof et al. (1986)30 | 3932 | 83.2 |
| 1 | Hadlock et al. (1985)29 | 3807 | 83.0 |
| 7 | Jordaan (1983)32 | 3755 | 83.0 |
| 11 | Hadlock et al. (1984)14 | 3657 | 82.4 |
| 6 | Shepard et al. (1982)13 | 4032 | 82.2 |
| 12 | Jordaan (1983)32 | 3796 | 81.8 |
| 9 | Woo et al. (1985)16 | 3341 | 81.6 |
| 19 | Combs et al. (1993)35 | 3558 | 80.7 |

Thresholds associated with highest overall accuracy for each individual model

| Model | Reference | Threshold (g or mm) | Overall accuracy (%) |
|---|---|---|---|
| 14 | Hadlock et al. (1985)29 | 4250 | 92.8 |
| 15 | Woo et al. (1985)16 | 4100 | 92.6 |
| 7 | Jordaan (1983)32 | 4200 | 92.5 |
| 9 | Woo et al. (1985)16 | 3750 | 92.5 |
| 10 | Hsieh et al. (1987)33 | 4250 | 92.5 |
| 17 | Hsieh et al. (1987)33 | 4310 | 92.5 |
| 18 | Hadlock et al. (1985)29 | 4160 | 92.5 |
| 21 | Hadlock et al. (1985)29 | 4170 | 92.5 |
| 16 | Shinozuka et al. (1987)34 | 4150 | 92.4 |
| 20 | Ott et al. (1986)28 | 4120 | 92.4 |
| 1 | Hadlock et al. (1985)29 | 4210 | 92.3 |
| 5 | Warsof et al. (1977)12 | 4120 | 92.3 |
| 8 | Hadlock et al. (1984)14 | 4300 | 92.3 |
| 19 | Combs et al. (1993)35 | 3970 | 92.3 |
| 3 | Warsof et al. (1986)30 | 4360 | 92.2 |
| 13 | Jordaan (1983)32 | 4420 | 92.2 |
| 6 | Shepard et al. (1982)13 | 4500 | 91.9 |
| 11 | Hadlock et al. (1984)14 | 4100 | 91.8 |
| AC | | 378 | 91.8 |
| 4 | Vintzileos et al. (1987)31 | 4490 | 91.6 |
| 12 | Jordaan (1983)32 | 4270 | 91.4 |
| 2 | Woo et al. (1985)16 | 4500 | 88.0 |

Models are presented in decreasing order of overall accuracy. AC, abdominal circumference.

Use of the threshold derived from the inflexion point of the ROC curve yielded much lower variation in sensitivity (mean = 87 ± 2%; range, 84.4–91.4%) and specificity (mean = 83 ± 2%; range, 79.5–86.3%) among the models (Figure 1). This threshold was lower than 4000 g for most models (mean = 3800 ± 207 g) and was therefore associated with a higher sensitivity and a lower specificity than the fixed threshold of 4000 g (Figure 1).

Figure 1. Effect of threshold used on sensitivity (a) and specificity (b) of the different models. Data are presented for a fixed threshold of estimated weight > 4000 g, thresholds derived from the inflexion point in the receiver–operating characteristics curve of each individual model (see Table 4) and thresholds associated with the highest overall accuracy for each individual model (see Table 4). AC, abdominal circumference.

The thresholds associated with the highest overall accuracy for each individual model were higher than 4000 g for most models (mean = 4223 ± 180 g) and were therefore associated with the lowest sensitivity and the highest specificity compared with the other two thresholds tested (Figure 1).

Comparison of predictive accuracy of different model-groups

To assess whether a certain combination of fetal biometric indices is better than others, we used cluster analysis to divide the models into homogeneous subgroups, according to sensitivity and specificity, with the inflexion point of the ROC curve serving as the threshold. As shown in Figure 2, the most accurate cluster (defined as the cluster with the minimal distance between the cluster–centroid point and the point of sensitivity = 100% and specificity = 100%) was Cluster D, followed, in decreasing order of accuracy, by Clusters B, A, C, E and F (P < 0.01).

Figure 2. Division of models and model-groups into subgroups by cluster analysis. Models were divided into six clusters (A to F), according to sensitivity and specificity, using the k-means cluster analysis algorithm. The boundaries of each cluster are marked by ellipses, and the centroid of each cluster is marked by cluster name. Numbers within each cluster represent model number (a) or model-group number (b) for each of the models. The number adjacent to each cluster represents the relative accuracy of each cluster (#1 being most accurate), as determined by the distance between the centroid point of each cluster and the [sensitivity = 100%, specificity = 100%] point (#1 represents the shortest distance).

Models based on three or four biometric indices (Models 14–21 or model-groups 4–6; Table 1) were over-represented in the two most accurate clusters (Clusters D and B) compared with models based on only two indices or on AC as a single measure (75% vs. 29%, P = 0.03). This was most notable for model-group 4 (AC, FL and BPD; Table 1), which was over-represented in the most accurate cluster (Cluster D, 100% vs. 22%, P = 0.01) (Figure 2b). In contrast, model-group 3 (AC, HC or AC, HC, BPD) was over-represented in the least accurate cluster (Cluster F, 67% vs. 0%, P < 0.001) (Figure 2b).

Discussion

This study sought to compare the accuracy of different sonographic fetal weight-estimation models for the prediction of fetal macrosomia using either fixed or optimal model-specific thresholds. Our main findings were as follows. First, the use of a fixed threshold of estimated weight > 4000 g is associated with considerable variation among the different models in the predictive accuracy for macrosomia. Second, the AUC, which is a threshold-independent measure of accuracy, is relatively similar for all models. Third, the use of the model-specific threshold derived from the inflexion point of the ROC curve of each individual model decreases the intermodel variation to a minimum. Finally, even when this optimal model-specific threshold is applied, models that are based on three or four biometric indices appear to perform better than do models that are based on only two indices or on AC as a single measure.

At present, the optimal sonographic model for the prediction of fetal macrosomia is unclear, and discrepant findings have been reported regarding the accuracy of AC as a single measure compared with models that incorporate multiple biometric indices4–7, 27. In a recent systematic review4, AC as a single measure was found to have a similar area under the summary ROC curve and a higher pooled LR+ compared with sonographic fetal weight-estimation models (0.85 vs. 0.87, P = 0.9; and 6.9 vs. 5.7, respectively). However, because the summary ROC curve calculated for the sonographic fetal weight-estimation models incorporated the results of multiple different sonographic models, the authors were actually comparing the performance of AC with an ‘average’ of the different sonographic models rather than with the best-performing sonographic models. Furthermore, the comparison of the pooled LR+ between AC and the different sonographic models was limited by the fact that the individual LR+ values in each of the specific studies included in the review were derived from different thresholds. Nahum and Stanislow5, in a study of 82 nondiabetic women who underwent sonographic evaluation within 3 weeks of delivery, also concluded that AC as a single measure was as accurate as sonographic fetal weight-estimation models that are based on multiple biometric indices. By contrast, Combs et al.6 compared 31 sonographic fetal weight-estimation models for the prediction of fetal macrosomia within 2 weeks of delivery in a group of 165 diabetic women. After the models were rank-ordered by systematic error, absolute error and AUC, the authors found that the model of Ott et al.28 yielded the best results, and that 28 of the 31 models evaluated predicted macrosomia better than did AC as a single measure. Similarly, Hart et al.27 compared the accuracy of a new sonographic model intended for macrosomic fetuses with that of seven commonly used models. 
While the new model provided the best results, the model of Campbell et al.11, which is based on AC as a single measure, was the least accurate.

These apparently conflicting results may be explained by the methodological limitations of the studies described above, such as a small sample size5, 6, large differences in the interval between sonographic evaluation and delivery5–7, comparison of a small number of models and different study populations (e.g. diabetic6 vs. nondiabetic5 women). To overcome these limitations, we compared a large number of sonographic fetal weight-estimation models using a large, unselected cohort of women who underwent sonographic examination within 3 days prior to delivery.

Another major problem is that almost all studies conducted to date used a fixed threshold (i.e. estimated fetal weight > 4000 g or 4500 g) to compare the accuracy of different models for the detection of macrosomia4–6, 8–10, a threshold which, as described earlier, does not necessarily represent the optimal threshold for this purpose. Indeed, O'Reilly-Green and Divon7 found that the optimal threshold for the detection of macrosomia using the model of Hadlock et al.14 was 3711 g rather than 4000 g. As a result, comparison of models using such a fixed threshold (rather than a model-specific optimal threshold) may not reflect the true relative accuracy of the different models. To control for this factor, in the present study we compared the models using two alternative, model-specific, optimal thresholds. Indeed, we found that the variation in sensitivity and specificity among the models was much lower when the threshold associated with the inflexion point of the ROC curve was used (compared with the fixed threshold of 4000 g), and that the use of this threshold enabled a more balanced comparison of the performance of the different models.

Another factor that complicates the interpretation of available studies is the difficulty in statistically assessing the differences in performance of each of the models. Some studies merely presented the results of the different models without direct comparison of their performance5. Others used semiquantitative methods, such as ranking the models according to one or more measures of accuracy, without determining whether the differences in the performance of models at different ranks are statistically significant6. In the current study we used a novel approach of cluster analysis to divide the models into homogeneous groups based on their sensitivity and specificity, and the differences in sensitivity and specificity of the models representing each of the clusters were tested for statistical significance. Using this strategy, we found that models based on three or four biometric indices were significantly over-represented in the most accurate cluster.

In conclusion, models based on three or four biometric indices appear to be more accurate for the diagnosis of fetal macrosomia than models based on only two indices or on AC as a single measure. Furthermore, it appears that the type of threshold used (i.e. a model-specific threshold that is optimized for the diagnosis of fetal macrosomia vs. a fixed threshold) is at least as important as the type of model used. Further prospective studies are needed to establish the observations made in the current study.
