To develop and validate new birth-weight prediction models in Chinese pregnant women using fractional thigh volume.
Healthy late third-trimester fetuses within 5 days of delivery were prospectively examined using two- (2D) and three- (3D) dimensional ultrasonography. Measurements were performed using 2D ultrasound for standard fetal biometry and 3D ultrasound for fractional thigh volume (TVol) and middle thigh circumference. The intraclass correlation coefficient (ICC) was used to analyze the inter- and intraobserver reliability of the 3D ultrasound measurements of 40 fetuses. Five birth-weight prediction models were developed using linear regression analysis, and these were compared with previously published models in a validation group.
Of the 290 fetuses studied, 100 were used in the development of prediction models and 190 in the validation of prediction models. The inter- and intraobserver variability for TVol and middle thigh circumference measurements was small (all ICCs ≥ 0.95). The prediction model using TVol, femur length (FL), abdominal circumference (AC) and biparietal diameter (BPD) provided the most precise birth-weight estimation, with a random error of 4.68% and R² of 0.825. It correctly predicted 69.5 and 95.3% of birth weights to within 5 and 10% of actual birth weight, respectively. By comparison, the Hadlock model with standard fetal biometry (BPD, head circumference, AC and FL) gave a random error of 6.41%, and predicted 46.3 and 82.6% of birth weights to within 5 and 10% of actual birth weight, respectively.
Consistent with studies on Caucasian populations, a new birth-weight prediction model based on fractional thigh volume, BPD, AC and FL, is reliable during the late third trimester in a Chinese population, and allows better prediction than does the Hadlock model. Copyright © 2011 ISUOG. Published by John Wiley & Sons, Ltd.
Prediction of fetal birth weight is important in obstetric management. Poor outcome can result from delivering macrosomic neonates with shoulder dystocia or growth-restricted neonates with perinatal asphyxia. The most widely used models based on ultrasonographic fetal biometric measurement to estimate fetal size are not highly accurate. Melamed et al.1 demonstrated that the highest percentage of birth-weight prediction within 10% of actual birth weight for 26 formulae was 80.2%. Most of the currently available fetal weight formulae were established over two decades ago and few2–5 consider soft-tissue thickness.
Traditionally, fetal weight was assessed by measurements of fetal biometry using two-dimensional (2D) ultrasound. In 1987, Vintzileos et al.6 showed that the addition of thigh circumference to measurements of the head, abdomen and femur length improves the accuracy of fetal weight estimates. However, soft tissue is poorly characterized by 2D imaging because of its irregular shape. Three-dimensional (3D) ultrasonography can provide more accurate and precise volume measurement of small, irregular objects compared with 2D ultrasonography7–9. Recently, 3D sonography has been used to predict birth weight using fractional thigh volume (TVol)10–15, which includes only the middle 50% of the whole thigh, eliminating the need to analyze the proximal or distal ends of the diaphysis, where soft tissue boundaries are often poorly visualized10. Adding TVol to conventional 2D biometry can improve the prediction of birth weight in a Caucasian population10. Whether the same is true in a Chinese population is not known. It is known that ethnicity and population birth weight, with changes over decades, can have a significant influence on fetal biometry16, 17. Femur length is greater in Caucasian fetuses than in Chinese fetuses18. We postulate that adding TVol to conventional 2D biometry may improve birth-weight prediction in a Chinese population as well.
The objective of our study was to establish a birth-weight prediction model in Chinese pregnant women using TVol, and to investigate its reproducibility and accuracy in term pregnancy.
This was a prospective, cross-sectional study. Between September 2009 and June 2010, a total of 290 Hong Kong Chinese women with a singleton pregnancy who were within 5 days of delivery at 37–42 weeks of gestation were included. We split the study population into two groups: 100 cases were used to develop the models and 190 cases to validate them. Exclusion criteria were multiple pregnancies, women who delivered more than 5 days after their ultrasound examination, and infants with major structural or chromosomal anomalies. Gestational age was calculated from the first day of the last normal menstrual period. This information was confirmed by either a first-trimester or early second-trimester dating scan in all cases. A normal last menstrual period was defined as regular cyclic menses without antecedent oral contraceptive use. Gestational age confirmation in the first trimester was based on crown–rump length measurements19. Gestational age confirmation in the second trimester was determined by measurements of biparietal diameter (BPD), head circumference (HC), abdominal circumference (AC) and femur length (FL)20–23. Maternal age, gravidity, gestational age at the time of the scan, body weight/BMI and the presence of obstetric complications were documented. All participants gave written informed consent and were enrolled under protocols approved by the Institutional Review Board of the University of Hong Kong/Hospital Authority Hong Kong West Cluster.
All 2D and 3D ultrasonography was performed by F.Y. who had 2 years of 2D and 3D ultrasound training experience. 2D ultrasonography included the measurement of standard fetal biometric parameters, BPD20, HC21, AC22 and FL23. TVol was acquired from a sagittal sweep that dynamically displayed both ends of the diaphysis by 3D ultrasonography. All 2D examinations and 3D volume acquisitions were performed on the same Voluson 730 Pro (GE Medical Systems, Milwaukee, WI, USA) ultrasound machine equipped with a 4–8-MHz transducer.
After volume acquisition, 3D multiplanar imaging was used to identify the midpoint of the thigh. The volume acquired was equally distributed around the middle 50% of the bone length, as measured from each diaphysis. Fractional limb volumes were measured offline using 4D View v.5.0 software (GE Medical Systems) (Figure 1) according to the method described by Lee10–13. Images were magnified to fill at least two-thirds of the display. TVol values were calculated automatically after manually tracing soft tissue borders of five slices from a transverse view of the femur.
Middle thigh circumference was also measured using 3D multiplanar analysis of the thigh volume. While FL was displayed in the A-plane, a corresponding axial view that allowed manual tracing of the middle thigh circumference was displayed in the B-plane (Figure 2).
To calculate the intra- and interobserver reliability of measurements on stored 3D volumes, 40 cases in the development group were selected using random numbers. The TVol and middle thigh circumference of these 40 cases were also measured by Y.Y. who had 5 years of 2D ultrasound experience and 1 year of 3D ultrasound experience. Each 3D parameter was measured twice by each observer to assess intraobserver reliability. We used the mean of two measurements by each observer to calculate the interobserver reliability.
Statistical analysis was performed using the Statistical Package for the Social Sciences (SPSS 16.0, SPSS Inc., Chicago, IL, USA). Independent sample t-test and chi-square test were used to compare the demographic characteristics of two groups.
The intra- and interobserver reliability of TVol and middle thigh circumference measurements were evaluated for a single examiner and between two different examiners by intraclass correlation coefficients (ICCs) for 40 randomly selected cases. ICCs represent the proportion of total variance in measurements due to variation between subjects. An ICC of 1 indicates that all of the observed variance is due not to the variation between or within observers, but to variation between the subjects, whereas an ICC of 0 indicates that all of the observed variance can be attributed to variation between or within observers24. An ICC > 0.7 is commonly used to indicate sufficient reliability25–27. The intra- and interobserver reliabilities and limits of agreement (mean percent difference ± 2 SD) were calculated as described by Bland and Altman28.
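The reliability statistics described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say which ICC form was used, so the sketch assumes a two-way random-effects, absolute-agreement, single-measure ICC, and it assumes that each percent difference is taken relative to the mean of the two measurements. The function names are hypothetical.

```python
import statistics


def icc_2_1(ratings):
    """ICC(2,1), assumed form: two-way random effects, absolute agreement.

    ratings: list of rows, one row per fetus, one column per measurement.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Sums of squares for subjects (rows), observers (columns) and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def percent_limits_of_agreement(first, second):
    """Mean percent difference and limits of agreement (mean +/- 2 SD).

    Assumption: percent difference = (first - second) / pair mean * 100.
    """
    pct = [(a - b) / ((a + b) / 2.0) * 100.0 for a, b in zip(first, second)]
    mean = statistics.fmean(pct)
    sd = statistics.stdev(pct)
    return mean, mean - 2.0 * sd, mean + 2.0 * sd
```

With this form of the ICC, a constant offset between two observers lowers the coefficient even when the ranking of fetuses is identical, which is why it is a common choice for agreement studies.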
We assessed the normality of distribution of all the variables, including birth weight, 2D and 3D parameters, using the Kolmogorov–Smirnov test. Systematic bias was determined by calculating the 95% CI for the mean difference (mean ± 2 SE). If 0 lay within this interval, no bias was assumed.
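The bias check (mean ± 2 SE, with no bias assumed when the interval contains 0) can be illustrated as follows. The function name is hypothetical, and the standard error is assumed to be the sample SD divided by the square root of the number of differences.

```python
import math
import statistics


def bias_interval(differences):
    """Approximate 95% CI for the mean difference as mean +/- 2 SE.

    Returns (lower, upper, no_bias), where no_bias is True when 0 lies
    inside the interval.
    """
    n = len(differences)
    mean = statistics.fmean(differences)
    se = statistics.stdev(differences) / math.sqrt(n)  # assumed SE definition
    lower = mean - 2.0 * se
    upper = mean + 2.0 * se
    return lower, upper, lower <= 0.0 <= upper
```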
Multiple linear regression was used to develop several weight estimation functions that included either TVol alone or a combination of several sonographic parameters. The coefficient of determination (R²) was used to indicate the degree of birth-weight prediction variability that could be explained by each model. Statistical significance was set at P < 0.05.
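As a minimal sketch of this kind of model fitting, the code below estimates ordinary least-squares coefficients by solving the normal equations and then computes R². It is an illustration only: the study used SPSS, and the helper names here are hypothetical. Solving the normal equations directly is adequate for a handful of predictors, though dedicated statistical software is numerically more robust.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(predictors, weights):
    """Fit weight = b0 + b1*x1 + ... by least squares.

    predictors: list of rows of sonographic measurements.
    weights: list of birth weights.
    Returns (coefficients with the intercept first, R squared).
    """
    rows = [[1.0] + [float(v) for v in row] for row in predictors]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * w for r, w in zip(rows, weights)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_w = sum(weights) / len(weights)
    ss_res = sum((w - f) ** 2 for w, f in zip(weights, fitted))
    ss_tot = sum((w - mean_w) ** 2 for w in weights)
    return beta, 1.0 - ss_res / ss_tot
```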
For each model, the variance inflation factor (VIF) was examined to assess the severity of multicollinearity. Severe multicollinearity is typically defined by a VIF greater than 10 (ref. 29).
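For a given predictor, the VIF equals 1/(1 − R²), where R² comes from regressing that predictor on the remaining predictors. The sketch below covers only the simplest case of two predictors, where that R² reduces to the squared Pearson correlation between them; models with more predictors need a full auxiliary regression. The function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def vif_two_predictors(x1, x2):
    """VIF for either predictor in a two-predictor model: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)
```

A value of 1 means the two predictors are uncorrelated, and the value grows without bound as the correlation approaches ±1.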
Each model's predictive performance was assessed for systematic error and random error. Systematic error was evaluated by the mean percent error (MPE), where percent error = [(predicted birth weight − actual birth weight)/actual birth weight] × 100. Random error, or precision, was evaluated by the SD of signed percent errors. The systematic errors of each function were compared to zero using a one-sample t-test. A multiple comparison test (least significant difference (LSD) t-test) was used for systematic error between models. The random errors were compared using the correlated variance test30. The relative effect of adding a soft tissue parameter (TVol) to this weight estimation procedure was evaluated. Linear regression methods were used to analyze the relationship of the MPE with actual birth weight.
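The two error measures defined above translate directly into code. The following sketch uses hypothetical function names and assumes the sample SD (n − 1 denominator) for the random error.

```python
import statistics


def percent_errors(predicted, actual):
    """Signed percent error for each fetus: (predicted - actual) / actual * 100."""
    return [(p - a) / a * 100.0 for p, a in zip(predicted, actual)]


def systematic_and_random_error(predicted, actual):
    """Return (MPE, SD of signed percent errors) for one prediction model."""
    errors = percent_errors(predicted, actual)
    return statistics.fmean(errors), statistics.stdev(errors)
```

A positive MPE indicates overestimation on average and a negative MPE indicates underestimation, while the SD describes how widely individual errors scatter around that mean.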
New prediction models were prospectively compared with the formulae of Hadlock et al.31 and Woo et al.32. The Hadlock model has been widely used in clinical practice since 1985. Woo et al.'s model has been shown to give the least mean difference from actual birth weight for Hong Kong Chinese women with singleton pregnancy33. Model coefficients for the Hadlock formula were also derived from multiple regression analysis of our own data.
A prospective study of the new birth-weight prediction models was subsequently conducted with a validation group consisting of 190 late-third-trimester fetuses not previously used to derive these models. Prospective results were evaluated by MPE and SD. The systematic errors of each function were compared to zero using one-sample t-test. A multiple comparison test (LSD t-test) was used for MPE between models. Random errors were compared using the correlated variance test30. The percentage predictions within 5 and 10% were compared using the McNemar test with Bonferroni adjustment for multiple comparisons.
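The comparison of prediction rates can be sketched as below. The text does not state which form of the McNemar test was used, so this example assumes the exact (binomial) two-sided version, and it applies the Bonferroni adjustment by multiplying the P-value by the number of comparisons. All function names are hypothetical.

```python
from math import comb


def within_tolerance(predicted, actual, tol_percent):
    """Flag, per fetus, whether the prediction is within tol_percent of actual."""
    return [abs(p - a) / a * 100.0 <= tol_percent for p, a in zip(predicted, actual)]


def mcnemar_exact(hits_a, hits_b):
    """Two-sided exact McNemar P-value for paired hit/miss flags (assumed form)."""
    # Only discordant pairs carry information about a difference between models.
    only_a = sum(1 for x, y in zip(hits_a, hits_b) if x and not y)
    only_b = sum(1 for x, y in zip(hits_a, hits_b) if y and not x)
    n = only_a + only_b
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(only_a, only_b) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def bonferroni(p_value, n_comparisons):
    """Bonferroni-adjusted P-value, capped at 1."""
    return min(1.0, p_value * n_comparisons)
```

The test is paired because both models are applied to the same fetuses, so only cases where one model hits the tolerance and the other misses contribute to the P-value.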
Of a total of 305 fetuses studied, 15 (4.9%) were excluded because thigh volume could not be measured because of shadowing in eight fetuses, movement in four fetuses and inadequate volume angle in three fetuses. Of the remaining 290 fetuses, 100 were used for development of prediction models and 190 for validation of prediction models. There were no differences in the demographic characteristics between these two groups of fetuses except the mean birth weight (Table 1). Because more macrosomic fetuses were recruited in the validation group, the mean birth weight of the fetuses in the validation group was greater than in the development group (P < 0.05).
| Characteristic | Development group | Validation group | P |
| --- | --- | --- | --- |
| Mean gestational age (weeks) | 38.7 ± 3.1 | 38.9 ± 3.1 | 0.212* |
| Mean maternal age (years) | 32.3 ± 4.1 | 32.6 ± 4.7 | 0.640* |
| Mean maternal body mass index (kg/m²) | 21.8 ± 2.9 | 21.9 ± 3.3 | 0.656* |
| Mean birth weight (g) | 3202 ± 360 | 3346 ± 432 | 0.003* |
| Interval between scan and delivery (days) | 1.6 ± 1.3 | 1.3 ± 1.0 | 0.063* |
Of 100 fetuses examined at a mean gestational age of 38.7 weeks, 40 were studied for the intra- and interobserver reliability of measurements of thigh volume and middle thigh circumference. All of the ICCs were high (≥ 0.95) (Table 2). Bland–Altman plots displaying the intra- and interobserver differences are shown in Figure 3. The intra- and interobserver mean percent differences were all non-significant (Table 2).
| Comparison | Parameter | Mean % difference (95% CI) | Limits of agreement (mean ± 2 SD) | ICC (95% CI) | P* |
| --- | --- | --- | --- | --- | --- |
| Intraobserver | TVol (mL) | 0.34 (−0.59 to 1.28) | −5.52 to 6.30 | 0.990 (0.981–0.995) | 0.462 |
| Intraobserver | Middle thigh circumference (cm) | 0.40 (−0.33 to 1.13) | −4.18 to 4.98 | 0.980 (0.961–0.989) | 0.273 |
| Intraobserver | TVol (mL) | 0.32 (−0.43 to 1.06) | −4.32 to 4.96 | 0.989 (0.980–0.994) | 0.391 |
| Intraobserver | Middle thigh circumference (cm) | 0.42 (−0.09 to 0.93) | −2.75 to 3.58 | 0.988 (0.978–0.994) | 0.100 |
| Interobserver (Observers 1 and 2) | TVol (mL) | −1.29 (−3.33 to 0.74) | −13.99 to 11.41 | 0.962 (0.928–0.980) | 0.205 |
| Interobserver (Observers 1 and 2) | Middle thigh circumference (cm) | −0.83 (−1.98 to 0.31) | −7.97 to 6.31 | 0.950 (0.905–0.973) | 0.147 |
Stepwise regression analysis led to birth-weight prediction Models 3 (including BPD, AC, FL and TVol) and 7 (BPD, AC, FL and middle thigh circumference) (Table 3). Simplified prediction models (Models 4, 5 and 6) were subsequently developed, particularly for situations in which BPD or AC measurement might be difficult. Selection of sonographic parameters was based on the assumption that TVol was superior to middle thigh circumference measurements. Model 8 was generated by Lee in a multiethnic population in 2009 (ref. 13). Model 9 was a modified version of the Hadlock model derived from our own data. Model 10 was Woo's model based on a Hong Kong Chinese population32.
| Birth-weight prediction model | Equation | MPE (%) | SD (%) | R² |
| --- | --- | --- | --- | --- |
| Model 1: AC and FL (Hadlock 1985, ref. 31) | log10(M1) = 1.304 + 0.05281 × AC + 0.1938 × FL − 0.004 × AC × FL | −2.93* | 7.03 | 0.653 |
| Model 2: BPD, HC, AC and FL (Hadlock 1985, ref. 31) | log10(M2) = 1.3596 − 0.00386 × AC × FL + 0.0064 × HC + 0.00061 × BPD × AC + 0.0424 × AC + 0.174 × FL | −3.53* | 6.41 | 0.711 |
| Model 3: BPD, AC, FL and TVol | M3 = −2797.107 + 188.708 × BPD + 176.42 × FL + 13.906 × TVol + 57.152 × AC | 0.23 | 4.68 | 0.825 |
| Model 4: TVol, AC and FL | M4 = −1692.564 + 15.992 × TVol + 197.714 × FL + 66.705 × AC | 0.26 | 5.04 | 0.797 |
| Model 5: TVol and AC | M5 = 18.440 × TVol + 68.117 × AC − 533.715 | 0.29 | 5.29 | 0.778 |
| Model 6: TVol | M6 = 1205.016 + 24.981 × TVol | 0.38 | 6.08 | 0.717 |
| Model 7: BPD, AC, FL and thigh circ | M7 = −5114.176 + 79.919 × AC + 317.733 × FL + 270.925 × BPD + 51.696 × thigh circ | 0.31 | 5.67 | 0.754 |
| Model 8: BPD, AC and TVol (Lee, ref. 13) | ln(M8) = −0.8297 + 4.0344 × (ln BPD) − 0.7820 × (ln BPD)² + 0.7853 × (ln AC) + 0.0528 × (ln TVol)² | −3.29* | 4.93 | 0.808 |
| Model 9: BPD, HC, AC and FL (modified Hadlock) | log10(M9) = 2.293 + 0.030 × BPD + 0.004 × HC + 0.013 × AC + 0.050 × FL | −3.71* | 5.67 | 0.736 |
| Model 10: BPD, AC and FL (Woo, ref. 32) | log10(M10) = 1.13705 + 0.15549 × BPD + 0.04864 × AC − 2.79682 × 10⁻³ × BPD × AC + 0.037769 × FL − 4.94529 × 10⁻⁴ × FL × AC | −4.95* | 6.53 | 0.702 |
Among all birth-weight prediction models, the two highest R² values were obtained from Models 3 (0.825) and 8 (0.808) (Table 3).
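To make the equations in Table 3 concrete, the sketch below codes a few of them as Python functions. The coefficients are copied from Table 3. The units are an assumption not stated in the table: BPD, HC, AC and FL are taken to be in centimeters, TVol in milliliters and birth weight in grams. The function names and the example measurements used for checking are hypothetical.

```python
def model_3(bpd_cm, ac_cm, fl_cm, tvol_ml):
    """Model 3 (BPD, AC, FL and TVol); returns estimated birth weight in grams."""
    return (-2797.107 + 188.708 * bpd_cm + 176.42 * fl_cm
            + 13.906 * tvol_ml + 57.152 * ac_cm)


def model_4(tvol_ml, ac_cm, fl_cm):
    """Model 4 (TVol, AC and FL)."""
    return -1692.564 + 15.992 * tvol_ml + 197.714 * fl_cm + 66.705 * ac_cm


def model_6(tvol_ml):
    """Model 6 (TVol alone)."""
    return 1205.016 + 24.981 * tvol_ml


def hadlock_model_2(bpd_cm, hc_cm, ac_cm, fl_cm):
    """Model 2 (Hadlock: BPD, HC, AC and FL); the equation is on a log10 scale."""
    log10_weight = (1.3596 - 0.00386 * ac_cm * fl_cm + 0.0064 * hc_cm
                    + 0.00061 * bpd_cm * ac_cm + 0.0424 * ac_cm + 0.174 * fl_cm)
    return 10.0 ** log10_weight
```

Models 3, 4 and 6 are linear in the measurements, whereas the Hadlock equation predicts the base-10 logarithm of weight and must be back-transformed.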
Models 3 and 4 had the lowest mean percent errors (0.23% and 0.26%, respectively), neither of which was significantly different from zero (t-test, P = 0.630 and 0.609, respectively). The MPE of Models 5, 6 and 7 were also not different from zero (t-test, P = 0.584; 0.538; 0.586, respectively). The MPE of the rest of the models were different from zero, with P-values all < 0.001.
The MPE and random errors of each model were compared with each other (Table S1). The MPE of Models 3, 4, 5, 6 and 7 were different from those of Models 1 and 2, the traditional Hadlock models31. They were also smaller than that of Model 8 (P < 0.001). The random errors of Models 3, 4 and 8 were smaller than those of Models 1 and 2 (P < 0.01). Table S1 also showed that the differences in random error among Models 3, 4 and 8 were not significant.
Models 3 and 4 included both TVol and FL. Because the TVol measurement was derived in part from the femur length, we checked for the presence of multicollinearity by examining the VIF, which was 1.401. Because this value was close to 1, it could be concluded that the presence of multicollinearity was unlikely.
The systematic errors for Models 1, 2 and 8, as shown by MPE, were not significantly different from zero (t-test, P = 0.560; 0.138; 0.220, respectively) (Table 4). The mean percent errors for the rest of the models were different from zero, with P-values < 0.05 (Table 4). Models 3, 4, 5, 6 and 8 had a low random error (SD%) of < 6% compared with other models. These five models predicted 90% or more of birth weights to within 10% of actual values (Table 4). For the prediction of birth weight within 5% of actual values, Models 3, 4, 5, 6 and 8 predicted 69.5, 68.4, 66.8, 61.6 and 74.2%, respectively.
| Birth-weight prediction model | MPE (%) | SD (%) | Percentage prediction† within 5% | Percentage prediction† within 10% |
| --- | --- | --- | --- | --- |
| Model 1: AC and FL (Hadlock 1985, ref. 31) | −0.36 | 8.40 | 38.9 | 76.3 |
| Model 2: BPD, HC, AC and FL (Hadlock 1985, ref. 31) | −0.83 | 7.69 | 46.3 | 82.6 |
| Model 3: BPD, AC, FL and TVol | 1.92* | 4.75 | 69.5 | 95.3 |
| Model 4: TVol, AC and FL | 1.56* | 4.92 | 68.4 | 94.7 |
| Model 5: TVol and AC | 1.83* | 4.89 | 66.8 | 94.7 |
| Model 6: TVol | 0.95* | 5.55 | 61.6 | 93.7 |
| Model 7: BPD, AC, FL and middle thigh circumference | 2.46* | 6.75 | 51.6 | 84.2 |
| Model 8: BPD, AC and TVol (Lee, ref. 13) | −0.43 | 4.84 | 74.2 | 95.8 |
| Model 9: BPD, HC, AC and FL (modified Hadlock) | −2.52* | 6.89 | 46.8 | 83.7 |
| Model 10: BPD, AC and FL (Woo, ref. 32) | −1.51* | 7.81 | 51.1 | 81.1 |
The systematic error and random error for each model in the validation group were compared (Table S2). The LSD t-test showed that the MPE values of Models 1 and 2 were smaller than those of Models 3 and 4 (P < 0.05). The random errors of Models 3, 4, 5, 6, 7, 8 and 9 were lower than those of the Hadlock models (Models 1 and 2) (P < 0.01).
The prediction percentages within 5 and 10% of actual birth weight in Models 3, 4, 5 and 8 were greater than those in Models 1 and 2 (Tables 4 and S3). The addition of BPD in Model 3 did not improve the prediction rates within ± 5% over those already obtained with the simpler Model 4 (69.5 vs. 68.4%, P = 0.855, McNemar test with Bonferroni adjustment) (Tables 4 and S3). The prediction rate within 10% of accuracy was also not different between these two models (95.3 vs. 94.7%, P = 1.000) (Tables 4 and S3).
The prediction rates within 5 and 10% of actual birth weight of Model 2 were similar to those of Model 9 and Model 10 (46.3 vs. 46.8%, P = 1.000; 46.3 vs. 51.1%, P = 0.211; 82.6 vs. 83.7%, P = 0.804; 82.6 vs. 81.1%, P = 0.581, McNemar test with Bonferroni adjustment) (Tables 4 and S3).
The prediction rates within 5 and 10% of accuracy between Model 3 and Model 8 were not different (69.5 vs. 74.2%, P = 0.243; 95.3 vs. 95.8%, P = 1.000, McNemar test with Bonferroni adjustment). The corresponding prediction rates for Model 8 were higher than those from the traditional Hadlock Model 2 (74.2 vs. 46.3%, P < 0.001; 95.8 vs. 82.6%, P < 0.001) (Tables 4 and S3).
This study showed that for our Chinese population, Models 3 and 4, based on TVol and standard 2D biometry, predicted a higher proportion of birth weights to within 5% of actual birth weight when compared with the conventional Hadlock Models 1 and 2 which were based on standard 2D biometry alone (around 70 vs. 40%). The results are consistent with those of other studies13, 34; the birth-weight model based on BPD, TVol and AC was previously found to predict 57.3% of birth weights to within 5% of actual birth weight13, and birth weight and neonatal fat mass have been found to be more highly correlated with the sonographically measured thigh volume than with 2D parameters (BPD, HC, AC, FL)12, 34.
For the larger prospective cohort of 190 fetuses from the validation group (Table 4), the lowest systematic error was found using Model 8 (BPD, AC, TVol). This value (−0.43%) was not significantly different from zero, unlike the systematic errors from Models 3 (1.92%) and 4 (1.56%), which were associated with a slight overestimation. Furthermore, Model 8 had a random error (4.84%) that was among the lowest values, similar to that from Models 3 and 4. In addition, the prediction rates within 5 and 10% of accuracy by Model 8 (74.2%, 95.8%) were similar to those from Model 3 (69.5%, 95.3%), but larger than those from the traditional Hadlock Model 2 (46.3%, 82.6%) (Table 4).
In Model 3, addition of BPD did not improve the prediction rate over what was already obtained with the simpler Model 4. Many studies have found that formulae that incorporate head measurements have a lower percentage of good predictions (i.e. within ± 5% or ± 10% of birth weight), despite the addition of another biometric parameter to the formula31, 35–38. At term, the fetal head is usually deep into the pelvis and its measurements cannot be taken accurately. However, some researchers have shown that birth-weight estimation improved when BPD is added13. Also some studies found that models based on AC, FL and HC were more accurate than those based on AC, FL and BPD at term1, 38. Whether BPD measurement can improve birth-weight model performance needs further investigation.
The prediction rates of birth weight using Model 9 (modified Hadlock model using our own data) and Model 10 (Woo's model from local population32) were similar to those of Model 2 (original Hadlock model). It seems that there is no additional benefit of modifying the formula of the original Hadlock model even though some studies have suggested that birth-weight prediction formulae should be customized to each specific population39, 40.
One study reported that neonatal mid-thigh circumference explained 21.8% of the variability in neonatal percent body fat in a Caucasian population41. A similar study also correlated sonographic parameters to newborn adiposity and found that mid-thigh circumference explained 43.9% of the variance in percent body fat12. Our results showed that the correlation with actual birth weight was weaker by fetal middle thigh circumference than by TVol. Although mid-thigh circumference can be used for weight estimation, more precise results were obtained in this Chinese population when the TVol parameter was used.
We also found that Model 8, the updated 3D model of Lee et al.13 including BPD, AC and TVol, performs very well, with a low systematic error, a low random error and a high percentage prediction rate within 5 and 10% of actual birth weight. In contrast to the earlier model reported in 2001 (ref. 10), the updated model relates the parameters to the natural logarithm of birth weight. The transformation increased linearity and reduced heteroscedasticity13. Further modification of our current 3D TVol models is planned in a future study.
The addition of TVol appears to improve the precision of fetal weight estimates, which may potentially predict subsequent development of adult diseases (e.g. hypertension, diabetes, coronary heart disease) based on the Barker42 hypothesis. This is the process by which fetal malnutrition leads to permanent changes in the body's structure and function in ways that lead to chronic disease in later life43. Fetal soft tissue assessment, either through the use of direct measurements or as part of the weight estimation procedure, may provide new insight into the early detection and monitoring of malnourished fetuses.
In clinical practice, if a prediction rate of 80% of birth weights to within 10% of actual birth weight is acceptable, use of standard biometric measurements including BPD, HC, FL and AC will be sufficient. If a prediction rate of 70% of birth weights to within 5% of actual birth weight is the target, we recommend using Model 4 which is based on TVol, AC and FL. If AC is affected by a disorder like diaphragmatic hernia or abdominal wall defect, we recommend Model 6 which is based on TVol alone. However, measurement of TVol may not be possible in ∼ 5% of cases, and offline measurement is currently required, with each measurement taking ∼ 1–2 minutes.
This study has two main strengths. First, it was a prospective study that used 100 cases to establish the models and another 190 cases to test them. Second, all the fetuses were delivered within 5 days of the scan. The median scan to delivery interval was 1 day, resulting in a more accurate estimated fetal weight44. The study also has limitations. First, only 21 macrosomic fetuses were involved in the validation of the models. We did not perform statistical analysis in the group of macrosomic fetuses due to limited sample size. Further study on the prediction of macrosomia is ongoing. Second, our models only examined term fetuses. Whether these models are reliable in preterm birth weight prediction is unknown.
In conclusion, the precision of birth-weight prediction to within 5 and 10% of actual birth weight in a Chinese population at term gestation can be improved by adding 3D thigh volume to conventional 2D fetal biometric measurements.
SUPPORTING INFORMATION ON THE INTERNET
The following supporting information may be found in the online version of this article.
Table S1 Comparison of models for mean percentage error and for random error in development group at term
Table S2 Comparison of models for mean percentage error and for random error in validation group at term
Table S3 Comparison of prediction accuracy within 5% and within 10% of actual birth weight in validation group at term