The contribution of R. Romero to this article was prepared as part of his official duties as a United States Federal Government employee.
This article was published online on 1 September 2009. An error was subsequently identified. This notice is included in the online and print versions to indicate that both have been corrected (22 September 2009).
The main goal of this study was to determine the accuracy and precision of new fetal weight estimation models, based on fractional limb volume and conventional two-dimensional (2D) sonographic measurements during the second and third trimesters of pregnancy.
A prospective cross-sectional study of 271 fetuses was performed using three-dimensional ultrasonography to extract standard measurements, namely biparietal diameter (BPD), abdominal circumference (AC) and femoral diaphysis length (FDL), plus fractional arm volume (AVol) and fractional thigh volume (TVol), within 4 days of delivery. Weighted multiple linear regression analysis was used to develop ‘modified Hadlock’ models and new models using transformed predictors that included soft tissue parameters for estimating birth weight. Estimated and observed birth weights were compared using mean percent difference (systematic weight estimation error) and the SD of the percent differences (random weight estimation error). The proportions of newborns with estimated birth weight within 5% or 10% of actual birth weight were compared using McNemar's test.
Birth weights in the study group ranged from 235 to 5790 g, with equal proportions of male and female infants. Six new fetal weight estimation models were compared with the results for modified Hadlock models with sample-specific coefficients. All the new models were very accurate, with mean percent differences that were not significantly different from zero. Model 3 (which used the natural logarithms of BPD, AC and AVol) and Model 6 (which used the natural logarithms of BPD, AC and TVol) provided the most precise weight estimations (random error = 6.6% of actual birth weight) as compared with 8.5% for the best original Hadlock model and 7.6% for a modified Hadlock model using sample-specific coefficients. Model 5 (which used the natural logarithms of AC and TVol) classified an additional 9.1% and 8.3% of the fetuses within 5% and 10% of actual birth weight and Model 6 classified an additional 7.3% and 4.1% of infants within 5% and 10% of actual birth weight.
The nutritional status of newborn infants is routinely assessed using birth weight and population-based standards. In 2004 the World Health Organization estimated that more than 20 million infants were born with a birth weight of less than 2500 g, the majority in Asia and Africa1. In the US alone, the National Center for Health Statistics reported that 8.2% of over 4.1 million newborns were delivered with low birth weight2. At the other end of the spectrum, 9.1% of American newborns were delivered with birth weights of at least 4000 g. Both of these extreme conditions represent public health issues of major importance. Low birth weight is an important determinant of infant mortality and is associated with an increased risk of hypertension, diabetes and obesity in adult life3. Macrosomic infants have an increased likelihood of operative delivery, shoulder dystocia, brachial plexus injury, anal lacerations and postpartum infection4. Thus the routine practice of estimating fetal weight is supported by a clinical need to detect and monitor abnormal growth.
Estimated fetal weight (EFW) has been used to identify growth abnormalities for more than three decades. A sonographic measurement of abdominal circumference (AC) is usually combined with other growth predictors, such as head circumference (HC) and/or femoral diaphysis length (FDL), for the prediction of EFW prior to delivery. In a systematic review of eleven prediction models, Dudley5 concluded that there was no preferred method for the ultrasonographic estimation of fetal weight because the magnitude of random errors resulting from these predictions was a major obstacle to their confident use in clinical practice. This review concluded that 95% confidence intervals exceeded 14% of birth weight in all studies. Clearly, these important observations indicate that we must improve the precision of EFW calculations.
Despite the widespread application of fetal-weight-estimation models in obstetrical practice, relatively few groups have included soft tissue assessment for this purpose. Several investigators have proposed soft tissue evaluation of the fetal thigh thickness or circumference6, 7, cheek-to-cheek diameter8, 9, abdominal subcutaneous tissue thickness10–14, and appearance of the fetal buttocks15. Practical implementation of these soft tissue predictors is limited by a paucity of validation data regarding the reproducibility of these measurements between examiners throughout pregnancy.
Three-dimensional ultrasonography (3D-US) provides a method for limb volume measurements and subsequent calculation of EFW. Chang et al.16 and Liang et al.17 initially described the use of arm or thigh volumes for estimating weight during the third trimester. Their measuring procedure took approximately 10–15 min to complete for each limb. Schild et al.18 have described a combination of two-dimensional (2D) and 3D sonographic parameters for calculating EFW. Volume predictors included fetal thigh, upper arm and abdomen. Superior fetal weight estimation was achieved by including these soft-tissue predictors, and they concluded that the extra time spent on measuring volumes was justified in cases where accurate weight estimation was important. Improved results were also obtained using multiple parameters in a subsequent study of fetuses weighing 1600 g or less19. Unfortunately, these volume measurements alone took an average of 10 to 15 min for each fetus.
The clinical application of volume measurements for weight estimation is limited by the extra time that is required to manually trace soft tissue borders along the entire limb. Acoustic shadowing also makes it difficult to confidently trace soft tissue borders near the limb joints. The concept of fractional limb volume was introduced in order to address these technical limitations20. This soft-tissue parameter is derived from a central portion of the limb diaphysis because transverse slices of the mid-limb are more likely to display the sharpest soft-tissue borders. Measuring times are substantially reduced because only five equidistant slices are traced within the partial limb volume and areas of acoustic shadowing are less likely to occur. Fractional limb volume measurements are also reproducible between blinded examiners, and technical factors affecting their acquisition have already been described21.
We now examine the accuracy and precision of new fetal weight estimation models that combine fractional limb volume with conventional 2D sonographic measurements—biparietal diameter (BPD), AC and FDL—during the second and third trimesters of pregnancy.
This was a prospective, cross-sectional study of pregnant women conducted at William Beaumont Hospital, Wayne State University and the Eunice Kennedy Shriver National Institute of Child Health and Human Development. The women were invited to participate under informed consent, and the study was approved by the appropriate institutional review boards. The inclusion criterion was that the women be in their second or third trimester of pregnancy. The protocol excluded multiple gestations, fetuses with structural anomalies and fetuses with poorly visualized limbs owing to technical factors. Subjects were primarily from uncomplicated pregnancies, although some had known risk factors that included suspected fetal growth restriction (n = 42), hypertension (n = 17) and diabetes (n = 13). Maternal age, gravidity, menstrual age at time of scan, fetal gender, ethnicity and presence of obstetrical complications were documented. Some of the research subjects took part in a previous study that was reported in our publication that described the relationship of birth weight to fractional limb volume in late third-trimester fetuses (n = 87)20. Eighty-eight additional subjects were reported in two other articles that were unrelated to the derivation of new fetal weight estimation models21, 22.
Fetal age was based on the first day of the last normal menstrual period and confirmed by either first- or early second-trimester ultrasound scan. A normal last menstrual period was defined as regular cyclic menses without antecedent oral contraceptive use. Age estimates in the first trimester were based on crown–rump length measurements23. Age estimates in the second trimester were determined from measurements of BPD, HC, AC and FDL24–27. Sonographic age was used to adjust menstrual age if there was a discrepancy of more than 1 week between menstrual dating and sonographic assessment.
Three-dimensional (3D) volume data sets were acquired from the head, trunk and thigh as previously described21. Each pregnancy was scanned only once within 4 days of delivery. The data were acquired using hybrid mechanical and curved array abdominal ultrasonic transducers (RAB 4-8P, RAB 2-5P; Voluson 730 and Voluson Expert, GE Healthcare, Milwaukee, WI, USA). Image depth and magnification were adjusted for a volume of interest to fill at least two-thirds of the video display screen. For fractional limb volume, the acoustic focal zone was adjusted near the long-bone diaphysis and the system gain was optimized. Each volume acquisition, lasting approximately 10 s, was taken from a sagittal sweep of the limb diaphysis. Image data were archived on digital media for subsequent off-line analysis.
Sonographic measurements of the AC and FDL were extracted from the acquired volume data sets; routine head volume acquisitions were added in 2001 to allow inclusion of BPD. Two commonly used fetal weight estimation models from Hadlock et al.28, 29 (one using AC and FDL, the other BPD, AC and FDL) were used as a basis of comparison. Fractional limb volumes were calculated using commercially available software (4D View, GE Healthcare). Volume measurements were based on either 50% of humeral diaphysis (AVol) or 50% of femoral diaphysis length (TVol). Each partial volume was subdivided into five equidistant slices centered along the mid-arm or mid-thigh21, 22. Images were again magnified to fill at least two-thirds of the display. Soft tissue borders were enhanced by the use of a color filter (sepia) with additional gamma curve adjustments for brightness and contrast. The fractional limb volume was automatically calculated after each of the five slices was manually traced from a transverse view of the extremity.
All continuous variables were first assessed using numerical and graphical techniques, including scatter plots, to determine if they met the distributional assumptions of the statistical tests being used to analyze them. All scatter plots (anatomic parameters vs. birth weight) revealed curvilinear relationships and the presence of heteroscedasticity. Natural logarithmic transformations from the Box–Cox family of transformations were applied to all growth parameters and birth weight.
Weighted regression analysis
Weighted regression analysis was performed with transformed data on all the models to address heteroscedasticity. These weights were computed as the reciprocal of model-specific variance because they represent the best linear unbiased estimates of true birth weights. Professor Altman has previously reviewed key statistical concepts for this procedure30–32. The weights are multiplied by √(π/2), or 1.253, using a half standard normal distribution30. Since all the anatomic parameters change as a function of pregnancy age, the residuals should have a normal distribution at each value of the parameter, and the absolute values of the residuals should have a half normal distribution. It follows that the mean of the absolute residuals multiplied by √(π/2) is an estimate of the SD of the residuals. If the SD is not fairly constant for each parameter, their predicted values from regressing absolute residuals against the predictors multiplied by √(π/2) will provide parameter-specific estimates of the SD of the signed residuals, and hence of birth weight. For each model, studentized residuals from the weighted regression analysis were assessed for normality using a normal probability plot.
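The weighting procedure described above can be illustrated with a minimal sketch. It assumes a single predictor for brevity (the study models used several), and the function names are hypothetical, not taken from the study software. Absolute residuals from an unweighted fit are regressed on the predictor, the fitted values are scaled by √(π/2) to estimate a predictor-specific SD, and the reciprocal of the squared SD is used as the weight.

```python
import math

def fit_line(x, y, w=None):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    if w is None:
        w = [1.0] * len(x)
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b

def half_normal_weights(x, y):
    """Weights = 1 / SD(x)^2, with SD(x) estimated from scaled absolute residuals."""
    a, b = fit_line(x, y)                        # step 1: unweighted fit
    abs_res = [abs(yi - (a + b * xi)) for xi, yi in zip(x, y)]
    c, d = fit_line(x, abs_res)                  # step 2: regress |residual| on x
    k = math.sqrt(math.pi / 2)                   # about 1.253 (half-normal scaling)
    sd = [k * (c + d * xi) for xi in x]          # step 3: predictor-specific SD
    if min(sd) <= 0:
        raise ValueError("estimated SD must be positive for every observation")
    return [1.0 / s ** 2 for s in sd]            # step 4: reciprocal of the variance

def weighted_fit(x, y):
    """Refit the line using the half-normal weights; returns (a, b)."""
    return fit_line(x, y, half_normal_weights(x, y))
```

In this sketch, observations at predictor values where the residual scatter is larger receive smaller weights, which is the intended response to heteroscedasticity.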
Main effects polynomial regression models were fitted to the data to capture the curvature in the data. A quadratic or cubic term for each predictor in the model typically provided a good fit to the data as evidenced by random patterns in the residual analyses and excellent coefficients of multiple determination. Hence, interaction terms were not required for any of the new weight estimation models.
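The general form of such a model can be sketched as follows. This is an illustration of the structure only: it assumes that the outcome is the natural logarithm of birth weight, that each term is a power of a log-transformed measurement, and that the estimate is returned to grams by simple exponentiation. The function name is hypothetical, and no coefficient shown here comes from the study.

```python
import math

def predict_birth_weight(measurements, intercept, terms):
    """Evaluate ln BW = intercept + sum(coef * ln(x)**power), then back-transform.

    measurements: dict mapping a parameter name (e.g. "AC") to its value
    terms: list of (name, power, coefficient) tuples, supplied by the caller
    """
    ln_bw = intercept
    for name, power, coef in terms:
        ln_bw += coef * math.log(measurements[name]) ** power
    return math.exp(ln_bw)  # back to the original weight scale

# Placeholder coefficients for illustration only, NOT values from the study.
example = predict_birth_weight(
    {"AC": math.e},
    intercept=1.0,
    terms=[("AC", 1, 0.5), ("AC", 2, 0.1)],
)
```

A quadratic term is expressed here as a second entry for the same parameter with power 2, which mirrors the main-effects structure without interaction terms.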
For each model, variance inflation factors (VIFs) were examined to assess the magnitude and severity of multicollinearity33. Severe multicollinearity, typically indicated by a VIF > 10, does not usually influence the ability of a fitted model for making inferences about mean responses or for making predictions, provided that the predicted values for which inferences are to be made follow the same multicollinearity pattern as the data on which the regression model is based33. Since our main goals were to develop robust weight estimation models and to make inferences using predicted values that follow the same pattern of multicollinearity as fetal weight, there was no need to incorporate corrective measures such as Ridge Regression to address this matter33–36.
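As a brief illustration of the VIF, the factor for predictor j is 1/(1 − R²), where R² comes from regressing that predictor on the remaining predictors. The sketch below assumes the simplest case of exactly two predictors, where R² reduces to the squared Pearson correlation between them. The function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)
```

Uncorrelated predictors give a VIF of 1, and values above 10 correspond to the threshold for severe multicollinearity mentioned above.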
Systematic and random weight estimation errors
For each regression model the mean square error (MSE) was computed as an unbiased estimator of the population variance – a required mathematical property of estimates that is statistically referred to as random error. The positive square root of MSE was used to estimate the population standard deviation. Systematic errors were also calculated as the mean percent differences in birth weight using the following formula: Percent difference = [(estimated birth weight − actual birth weight)/(actual birth weight)] × 100. Random errors were expressed by the SD of these percentage differences. Systematic errors of the proposed best models and of Hadlock's original (OH1 and OH2) and modified (MH1 and MH2) models were compared using either the sign test or Student's t-test. Random errors were compared using the Pitman test for correlated variances37, 38.
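The error definitions above can be sketched in a few lines. The sketch assumes the sample SD (n − 1 denominator) for the random error and the usual Pitman-Morgan formulation for paired variances, in which the sums and differences of the paired values are correlated. The function names are hypothetical, and only the test statistic and its degrees of freedom are returned.

```python
import math
import statistics

def percent_differences(estimated, actual):
    """Percent difference = (estimated - actual) / actual * 100 for each infant."""
    return [(e - a) / a * 100.0 for e, a in zip(estimated, actual)]

def weight_estimation_errors(estimated, actual):
    """Return (systematic error, random error) as mean and SD of percent differences."""
    diffs = percent_differences(estimated, actual)
    return statistics.mean(diffs), statistics.stdev(diffs)

def _pearson_r(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pitman_morgan_t(diffs_a, diffs_b):
    """t statistic and degrees of freedom for equality of two correlated variances.

    diffs_a, diffs_b: percent differences from two models on the same infants.
    A positive t indicates larger variance for model A.
    """
    sums = [a + b for a, b in zip(diffs_a, diffs_b)]
    gaps = [a - b for a, b in zip(diffs_a, diffs_b)]
    r = _pearson_r(sums, gaps)      # cov(a + b, a - b) = var(a) - var(b)
    df = len(diffs_a) - 2
    return r * math.sqrt(df) / math.sqrt(1.0 - r * r), df
```

The t statistic would be referred to a Student's t distribution with n − 2 degrees of freedom to obtain a P-value.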
Effect of adding fractional limb volume to conventional weight estimation models
New fetal weight estimation models with the best performances were compared to comparable models using the partial F tests that were based on the sequential and extra sums of squares. This technique is routinely used for the comparisons of fitted regression models33.
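The partial F statistic for two nested models can be written compactly. In this sketch, the inputs are the residual sums of squares and residual degrees of freedom of the reduced model (without the added term) and the full model (with it). The function name is hypothetical, and the P-value lookup is left to a statistical package.

```python
def partial_f(sse_reduced, df_reduced, sse_full, df_full):
    """Partial F statistic from the extra sum of squares of nested models.

    sse_*: residual (error) sum of squares of each model
    df_*:  residual degrees of freedom of each model
    Returns (F, numerator df, denominator df).
    """
    extra_ss = sse_reduced - sse_full      # variation explained by the added terms
    extra_df = df_reduced - df_full        # number of added terms
    if extra_df <= 0 or df_full <= 0:
        raise ValueError("the full model must contain the reduced model plus extra terms")
    f_stat = (extra_ss / extra_df) / (sse_full / df_full)
    return f_stat, extra_df, df_full
```

For example, adding one term that lowers the residual sum of squares from 10.0 to 8.0 with 97 residual degrees of freedom in the full model gives F = (2.0/1)/(8.0/97) = 24.25.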
The performance of each model was also compared by examining the proportion of estimated weights that were correctly classified to within ± 5% and ± 10% of birth weight. This was accomplished by analyzing 2 × 2 contingency tables using McNemar's test for paired observations, with the Hochberg adjustment for multiple comparisons. This test depends on the number of disagreements between two models, namely the number of false positives and false negatives. When the number of false negatives is much lower than the number of false positives, this strongly indicates a drop in agreement, with correspondingly significant P-values.
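A minimal sketch of this comparison follows. It assumes the exact (binomial) form of McNemar's test, since the text does not state whether the exact or chi-square version was used, and it implements the Hochberg step-up adjustment in its standard form. The function names are hypothetical.

```python
import math

def within_tolerance(estimated, actual, tol_percent):
    """True if the estimate lies within +/- tol_percent of the actual weight."""
    return abs(estimated - actual) / actual * 100.0 <= tol_percent

def discordant_counts(est_a, est_b, actual, tol_percent):
    """Count infants classified within tolerance by only one of two models."""
    only_a = only_b = 0
    for ea, eb, bw in zip(est_a, est_b, actual):
        a_ok = within_tolerance(ea, bw, tol_percent)
        b_ok = within_tolerance(eb, bw, tol_percent)
        if a_ok and not b_ok:
            only_a += 1
        elif b_ok and not a_ok:
            only_b += 1
    return only_a, only_b

def mcnemar_exact_p(only_a, only_b):
    """Two-sided exact McNemar P-value based on the discordant pairs only."""
    n = only_a + only_b
    if n == 0:
        return 1.0
    k = min(only_a, only_b)
    tail = sum(math.comb(n, i) for i in range(k + 1)) * 0.5 ** n
    return min(1.0, 2.0 * tail)

def hochberg_adjust(pvalues):
    """Hochberg step-up adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank, i in enumerate(order):       # largest P-value has multiplier 1
        running_min = min(running_min, (rank + 1) * pvalues[i])
        adjusted[i] = running_min
    return adjusted
```

Concordant pairs (both models within tolerance, or both outside it) do not enter the test, so the result depends only on how unevenly the disagreements are split between the two models.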
Sample size calculations and power analysis
Calculations were performed using PASS 2005 software38, 39. P < 0.05 (alpha, the probability of type I error) was considered statistically significant. Statistical analysis was performed using the SAS System for Windows (version 9.2, SAS Institute, Cary, NC, USA).
Two hundred and seventy-one pregnancies were prospectively scanned within 4 days of delivery during the study period from June 1998 to September 2008, the majority of which—207 (76.4%)—were scanned between 1998 and 2002. Examinations were performed at between 18.4 and 42.1 weeks' gestation (based on menstrual dates). The mean maternal age was 30.6 ± 5.9 years with an average gravidity of 2.9 pregnancies. Ethnicities included 63.5% White, 27.3% Black, 4.4% Asian, 1.5% Hispanic, and 1.5% Native American Indian subjects. Newborn infants comprised 50.2% females and 49.8% males. Birth weights were non-normally distributed and ranged from 235 to 5790 g (Shapiro–Wilk's test, 0.965, P < 0.0001) (Figure 1).
Curvilinear relationships of sonographic parameters to birth weight are shown in Figure 2. Transformed parameters and their relationships to the natural logarithm of birth weight demonstrated increased linearity and reduced heteroscedasticity (Figure 3).
The original Hadlock weight estimation functions (OH1 and OH2) were based on a Houston population sample23, 24 (Table 1). When these published models were applied to our Michigan sample, their mean systematic weight estimation errors indicated an overestimation that ranged between 7.7 and 8.8% of actual birth weight. The best precision (random weight estimation error = 8.5% of birth weight) was obtained using a three-parameter model that included BPD², AC and FDL. Modified Hadlock weight estimation models (MH1 and MH2), using the same anatomic parameters as OH1 and OH2, were also developed for our study sample. The modified Hadlock models (MH1 and MH2) were very accurate, with mean systematic errors that were not statistically different from zero (P = 0.5069 and 0.654, respectively; Student's t-test) and random errors that were similar to the performance of the original Hadlock models, ranging from 7.6 to 8.3%.
Table 1. Original and modified Hadlock fetal weight estimation models with their systematic and random errors for our population sample
Columns: original and modified weight estimation function; mean percent difference; SD percent difference.
The original Hadlock models are expressed with their published r² values28 and the modified Hadlock models are expressed with their adjusted r² values. Mean percent difference = (estimated weight − birth weight)/birth weight × 100. AC, abdominal circumference; BPD, biparietal diameter; BW, birth weight; FDL, femoral diaphysis length; OH1, original Hadlock model using AC and FDL; OH2, original Hadlock model using AC, BPD and FDL; MH1, modified Hadlock model using AC and FDL; MH2, modified Hadlock model using AC, BPD and FDL; MSE, mean squared error.
Table 2 summarizes six optimized multiple regression models with their coefficients and y-intercepts. Soft-tissue parameters were combined with conventional 2D measurements for estimating the natural logarithm of birth weight as the outcome variable. Models 2 and 5 made use of two parameters that included the fetal trunk and limb, while Models 3 and 6 made use of three parameters that included the fetal head, trunk and limb. The systematic and random errors of these new models, using fractional limb volume, are also summarized. All of them had high adjusted r² values. Mean squared errors were lowest for Models 2, 3, 5 and 6. Models 3 and 6 had the lowest mean percent differences or systematic errors, ranging from 0.12 to 0.18%, which were not significantly different from zero (P < 0.0001, Student's t-test). Models 3 and 6 demonstrated the most precise weight predictions (random error = 6.6% of birth weight). The standard deviations of the percent differences for the two-parameter models (Models 2 and 5) or three-parameter models (Models 3 and 6) were significantly lower when compared to their corresponding modified Hadlock models (P < 0.05).
Table 2. New multiple regression models for fetal weight estimation with their systematic and random errors for a local population sample
Figure 4 compares the proportion of newborn infants with EFW results that were correctly classified as being within 5 or 10% of birth weight.
Comparison of new models with original Hadlock models
First, two-parameter models of the trunk and limb were compared. The original Hadlock model OH1 correctly classified 30.5% and 53.1% of newborns within 5% and 10% of birth weight, respectively. The corresponding new two-parameter models correctly classified a significantly greater proportion of infants within 5% (Model 2, 50.8%; Model 5, 56.4%) or 10% (Model 2, 84.8%; Model 5, 84.9%) (P < 0.0001). Next, the three-parameter models of the head, trunk, and limb were compared. The original Hadlock model OH2 correctly classified 35.7% and 63.6% of newborns within 5% and 10% of birth weight, respectively. The corresponding new three-parameter models classified a significantly greater proportion of infants within 5% (Model 3, 50.4%; Model 6, 57.3%) or 10% of birth weight (Model 3, 89.8%; Model 6, 84.1%) (P < 0.0001).
Comparison of new models with modified Hadlock models
The two-parameter modified Hadlock model (MH1) classified 47.3% and 76.6% of newborns within 5% and 10% of birth weight, respectively. By comparison, the three-parameter modified Hadlock model (MH2) classified 50.0% and 80.0% of newborns within 5% and 10% of birth weight, respectively. Based on EFW comparisons within 5% shown in Figure 4, no significant differences were found between Models 2 and 3 when compared to their corresponding modified Hadlock models. Compared with MH1, Model 5 classified an additional 9.1% and 8.3% of fetuses within 5% and 10% of birth weight, respectively. Similarly, compared with MH2, Model 6 classified an additional 7.3% and 4.1% of infants within 5% and 10% of birth weight, respectively.
Figure 5 shows the relationship between systematic and random errors for all fetal weight estimation models. Models 6 and 3 demonstrated the best overall accuracy and precision when compared to the modified Hadlock models (MH1 and MH2).
In order to determine the effect of adding a soft-tissue parameter to AC alone, we compared the following weight estimation models that differed by only one term—the presence or absence of fractional limb volume.
Effect of adding ln AVol to ln AC
The addition of ln AVol to a model already containing ln AC and (ln AC)² improved EFW by explaining an additional 2.0% of the total variance in ln BW (P < 0.0001).
Effect of adding ln AC to ln TVol
A model that was based on ln TVol alone explained 96.1% of the total variation in ln BW. The addition of ln AC accounted for an additional 1.9% of the total variance in ln BW (P < 0.0001).
Sample size calculations were based on the two optimal models that included ln AVol or ln TVol parameters. A sample of 138 patients achieved nearly 100% power to detect their respective r² values of 0.9897 and 0.9873 attributed to four independent variables. This is based on an F-test with a significance level (alpha) of 0.05.
Intrauterine malnutrition, as a result of protein and/or micronutrient deficiencies, is a commonly suspected cause of poor fetal growth40. Although this condition cannot be precisely established during fetal life, it is biologically plausible that a malnourished fetus would manifest insufficient or excessive soft tissue development. Indeed, there is mounting epidemiological and clinical evidence for an association between fetal programming of body composition and musculoskeletal development41. For example, birth weight and poor prenatal nutrition are associated with altered fat distribution42, 43, reduced muscle mass44 and low bone mineral density45, 46, all of which have the potential for affecting cell numbers, altering stem-cell function, and resetting regulatory hormones during later adult life. Ay et al.47 recently described an association between fetal weight changes during late pregnancy and postnatal ‘catch-up’ growth within 6 weeks after birth. Their investigation used dual energy x-ray absorptiometry scans in the same infants at 6 months to demonstrate that these fetal and postnatal growth patterns were significantly correlated with body composition into early childhood. A related longitudinal investigation of 1012 children from the ‘Generation R’ project also found that subcutaneous fat mass tends to track in the first 2 years after birth48. The aforementioned studies underscore the importance of fetal nutritional assessment and its potential impact on the continuum of health and disease during adult life.
Fractional limb volume measurement has been proposed for the detection and monitoring of malnourished fetuses21. This concept is supported by an anthropometric study of neonatal body composition that estimated lean body and fat mass in 188 newborn infants within 24 hours of birth. Although neonatal fat mass constituted only 14% of total birth weight, it explained 46% of its variance49. However, widely accepted weight-estimation models do not usually consider the clinical significance of fetal soft tissue in routine obstetrical practice. This practical limitation is partially explained by technical challenges related to the reproducibility of fetal soft-tissue measurements.
At least three investigations have suggested that conventional 2D sonographic measurements do not accurately predict adiposity of newborns. For example, we have previously described the relationship between 2D sonographic parameters, fractional limb volume, EFW, and birth weight to neonatal percent body fat using air displacement plethysmography. Fractional thigh volume had the greatest correlation to percent body fat in third-trimester newborn infants22. Similar to actual birth weight, the TVol predictor explained 46.1% of the variability in percent body fat. Abdominal circumference and EFW accounted for only 24.8% and 30.4% of the variance in percent body fat, respectively. Another study also used air displacement plethysmography to demonstrate that 2D fetal sonographic measurements do not provide a reliable assessment of percent body fat in term infants50. Khoury et al.51 also correlated 2D sonographic parameters, fractional thigh volume, and birth weight with neonatal skin fold measurements. They concluded that fractional thigh volume reflects neonatal fat mass and is better correlated with birth weight than are conventional 2D measurements.
The widely accepted Hadlock weight estimation models (based on data collected in Houston, TX) were used as an initial basis for comparison28, 29. In the present study, one of the original Hadlock models (using BPD, AC and FDL) was associated with an 8% systematic error for our study sample. This overestimation may be related to multicollinearity from the interaction between two or more highly correlated predictor variables. Multicollinearity can cause relatively large standard errors of model coefficients and increased variability in weight estimates when these formulae are applied to different populations. To minimize this effect ‘modified Hadlock’ functions and model coefficients were developed using the same sample from which the new prediction formulae were derived. A more objective comparison of model performance was achieved by substituting a 2D limb parameter, such as ln FDL, with a corresponding limb volume parameter such as ln TVol. Relatively few sonographic studies have correlated anatomic parameters with birth weight using sample-specific model coefficients19, 20.
Lindell and Marsal52 recently reported fetal weight estimation using fractional thigh volume for a Swedish population. They studied 176 pregnant women at ≥ 287 days of gestation within 4 days of delivery. Results obtained using the formula of Persson and Weldner (using BPD, abdominal diameter and FDL)53 were compared to those obtained using preliminary weight estimation models (using BPD, AC and TVol) that were previously reported by our research group20, 54. A new formula employing HC, abdominal diameter, abdominal volume and TVol was developed for their population sample. Both the Persson and Weldner model53 and our preliminary fetal weight estimation model using TVol55 yielded the smallest random weight estimation errors of 6.3%, although the latter led to underestimated mean percent differences of 6.0%. For 63 subjects, their new volume-based model (using HC, abdominal diameter, abdominal volume and TVol) resulted in a mean percent difference of 0.3 ± 5.6%.
A more appropriate and objective comparison of fetal-weight-estimation models can be made if the model coefficients (for both new models and those from the literature) are derived from the same local study population. Our data underscore the importance of using sample-specific model coefficients in comparing the performance of new weight estimation models with published methods (Table 1). Studies that do not compare models with sample-specific coefficients are subject to weight estimation errors due to differences in sample characteristics, which appear to affect systematic errors more than random errors. In this context, Siemer et al.55 retrospectively compared 3975 pregnancies with commonly used weight estimation models derived from regression analysis56–59. The Hadlock model (based on BPD, AC and FDL) had the lowest systematic weight estimation error, of −0.28%28, 29. Seven other models had systematic errors ranging from −8.84% to +5.28%. The best random weight estimation error, of 9.49% (SD of percent differences), resulted from using Dudley's model (EFW = (0.32 × AC² × FDL) + (0.053 × HC² × FDL))59. By comparison, the Hadlock model had a random weight estimation error of 10.0% in the Siemer study55. Many of these discrepancies may have resulted from comparing models that were derived from different populations. In our investigation, the original Hadlock model (OH2) had a mean error of 7.7 ± 8.5%; the systematic weight estimation error was improved by applying a modified Hadlock MH2 model (0.29 ± 7.6%), using specific coefficients from our patient sample.
Our results indicate that fractional limb volume can be combined with 2D sonographic measurements of the head and trunk to improve the precision of EFW. This approach may provide a novel assessment of fetal soft-tissue development as part of the weight-estimation procedure. Several statistical modeling techniques were used to develop optimal fetal-weight-estimation models that included soft-tissue parameters. The substitution of fractional limb volume for long bone length, use of natural logarithmic transformations with weighted regression analysis, and selective application of squared transformed parameter terms essentially reduced the random error to 6.6%. A validation study from an independent sample is currently under way to examine the performance of these new fetal-weight-estimation models in all weight groups, including macrosomic fetuses, before they can be confidently adopted for routine obstetrical care.
The authors wish to acknowledge the technical assistance of Melissa Powell, RDMS and Beverley McNie, BS, CCRP. This research was supported (in part) by the Perinatology Research Branch, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, NIH, DHHS.