Correspondence to: Dr G. Lindell, Department of Obstetrics and Gynecology, Skåne University Hospital Lund, S-22185 Lund, Sweden (e-mail: gun.lindell@med.lu.se)

ABSTRACT

Objectives

To evaluate the prediction of large-for-gestational age (LGA) term neonates using the routine third-trimester ultrasound examination and to investigate whether the prediction could be further improved by adding information on maternal characteristics.

Methods

Information on 56 792 singleton term pregnancies with a routine ultrasound examination at 32–34 weeks' gestation was retrieved from a population-based perinatal register. Estimated fetal weights (FW) were expressed as gestational age-specific standard deviation scores (Z-scores). The prediction of LGA was assessed by receiver–operating characteristics (ROC) curves, with LGA defined as birth weight Z-score > + 2. The data set with complete clinical information (n = 48 809) was divided into a development set and a validation set. Using the development set, multiple logistic regression analysis was performed to identify maternal characteristics associated with LGA. The odds ratios obtained were converted into likelihood ratios. These were then applied to the validation set, and the probability of LGA for each infant was estimated using Bayes' theorem.

Results

The FW Z-score showed a high predictive ability for LGA (area under the ROC curve (AUC) 0.89 (95% CI, 0.89–0.90)). Prediction was further improved by using the model that included both FW Z-scores and maternal variables (AUC 0.91 (95% CI, 0.90–0.92)) (P for difference < 10^{–6}). The corresponding AUC for a model including maternal characteristics only was 0.74 (95% CI, 0.73–0.76).

The proportion of newborns with birth weight > 4000 g has increased during the past two decades, in parallel with an increasing prevalence of maternal body mass index (BMI) ≥ 25 kg/m^{2} and with an increase in maternal age[1, 2]. The fetal genome is the central controller of growth in an uncomplicated pregnancy[3]. However, it is well known that maternal clinical characteristics and fetal gender are associated with fetal growth[2, 4-6].

Giving birth to a macrosomic neonate (birth weight ≥ 4500 g) is strongly associated with increased perinatal morbidity and mortality for both mother and baby[7-11]. Both transient and permanent fetal and maternal injuries are seen as a consequence of delivering a large fetus, and for the neonate it might result in impairment to health later in life[10]. Antenatal detection of large fetuses makes it possible to intervene by induction of labor or Cesarean section, thereby preventing the birth of macrosomic newborns, or minimizing the risks associated with vaginal birth complicated by fetopelvic disproportion.

Pregnant women in the southern region of Sweden are offered a routine ultrasound examination, including dating of pregnancy, in the first half of gestation, and a second routine ultrasound examination in the third trimester of pregnancy for fetal growth control[12]. The fetal weight (FW) estimation formula of Persson and Weldner[13] is used, based on measurements of fetal head, abdomen and femur length.

The antenatal detection of a fetus at risk for macrosomia represents a clinical challenge, as it has been reported that the ultrasound formulae available tend to underestimate the weight of large fetuses[14, 15]. The aims of this study were to investigate the accuracy of the detection of large-for-gestational age (LGA) term fetuses at various Z-score cut-offs for FW as estimated at a routine fetal ultrasound examination at 32–34 weeks' gestation, and to investigate whether the prediction of LGA fetuses could be further improved by also including maternal characteristics using a Bayesian model.

METHODS

Population-based information was retrieved from the regional perinatal database, Perinatal Revision South (PRS), which was established in 1995 for quality assurance of perinatal care in the southern region of Sweden[16]. The PRS database contains obstetric and neonatal data from all delivery and neonatal units in the region (approximately 17 000 deliveries annually). The study population from the database included women who lived within the catchment areas of the hospitals of Lund, Malmö and Trelleborg, and who gave birth to a term singleton infant between 1995 and 2009. Inclusion criteria were singleton pregnancy, estimated date of delivery determined by fetal ultrasound measurement in the second trimester (at 17–19 postmenstrual weeks)[17], a second routine ultrasound examination for evaluation of fetal growth performed at 32–34 completed weeks of pregnancy[13] and delivery at ≥ 37 weeks' gestation (total data set, n = 56 792). The collection process and the data set are described in detail elsewhere[6]. The subset of women with complete information on maternal BMI, smoking habits, age, parity, height and diabetes status (n = 48 809) was divided into two parts: a development sample (women born on odd-numbered dates) and a validation sample (women born on even-numbered dates).

The ultrasound-estimated FW and birth weight were expressed in standard deviation scores (Z-scores) above or below the expected weight for gestational age according to the Swedish standard for intrauterine growth[5], which is used clinically for evaluation of both FW and birth weight. Infants with a birth weight Z-score of > + 2 were considered LGA. With the definitions used in the current study, the cut-off limit for LGA at 40 + 0 weeks' gestation was 4433 g for boys and 4296 g for girls.

Statistical analysis

For the total data set, a receiver–operating characteristics (ROC) curve was created to illustrate the sensitivity and specificity for LGA detection by different FW Z-score cut-offs. For each cut-off step, corresponding to 0.5 FW Z-score, the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and Youden index (= sensitivity + specificity − 1; equivalent to the height over the line of unity) were estimated. For continuous variables, the Mann–Whitney U-test was used to compare demographic characteristics between subjects in the development sample and those in the validation sample, and for categorical variables the chi-square test was used.
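The cut-off metrics defined above can be sketched in a few lines. This is a minimal illustration, not the authors' code (the analysis was run in Gauss); the function name `cutoff_metrics` is hypothetical, and the counts are those reported in Table 1 for the +2.0 FW Z-score cut-off.

```python
def cutoff_metrics(tp, fn, fp, tn):
    """Screening metrics for one FW Z-score cut-off, from a 2x2 table.

    tp/fn: LGA infants above/below the cut-off
    fp/tn: non-LGA infants above/below the cut-off
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "fpr": 1 - specificity,          # false-positive rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "youden": sensitivity + specificity - 1,
    }


# Counts reported in Table 1 for the +2.0 cut-off
metrics = cutoff_metrics(tp=720, fn=1932, fp=734, tn=53405)
print({name: round(value, 3) for name, value in metrics.items()})
```

Rounded to three decimals, these values agree with the corresponding row of Table 1.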

For the subset with clinical data available, associations between LGA and maternal age (continuous variable), parity (class variables), BMI (continuous), height (continuous), smoking (semi-continuous; 1 = non-smoker, 2 = smoker (1–9 cigarettes per day), 3 = smoker (≥ 10 cigarettes per day)), diabetes mellitus (yes/no), gestational diabetes (yes/no) and FW Z-score (continuous), were first evaluated using univariate logistic regression analysis. For the continuous variables (maternal age and BMI), second-degree models were also tested, but as they did not improve the goodness of fit they were not considered further. Variables with P < 0.2 were entered into a multiple logistic regression analysis, and variables with P < 0.2 in the primary multiple model were entered into the final multiple logistic regression analysis. For comparison, one extra analysis was performed, which included maternal characteristics but not FW Z-score. Statistical analysis was performed using Gauss software (Gauss^{TM}, Aptech Systems Inc., Maple Valley, WA, USA, http://www.aptech.com).

The outputs of the final fitted multiple logistic regression model were used as previously described in detail by Smith et al.[18] in their prediction model for Cesarean section risk. Thus, the outputs were converted into likelihood ratios (LRs). The method can be summarized briefly in the following steps: (1) an optimal replacement constant was estimated for each independent variable (x_{1}, x_{2}, etc.) included in the logistic regression model, to be used when information on that variable was lacking; (2) the LRs were adjusted by assessing the difference between the replacement constant and the actual ‘overall log odds’ for the outcome in the development sample; (3) the above process was repeated for each independent variable included, and the output was used to calculate an adjusted LR for LGA for every woman in the development sample. The predictive ability for LGA of the final model was validated in the validation sample by calculating the area under the ROC curve (AUC) with 95% CI when the adjusted LRs for LGA were applied.

For comparison, the AUC was also computed in the validation sample using the LR estimates obtained from a logistic regression analysis in which the maternal characteristics were not included (crude model). The AUCs (final model vs crude model) were compared using the method suggested by DeLong et al.[19]. Finally, the numbers of observed and predicted cases of LGA in the validation sample were calculated for various ‘predicted risk strata’. The predicted risk for each woman in the validation sample was calculated using Bayes' theorem:

predicted probability = posterior odds / (1 + posterior odds);

posterior odds = background odds × individual LR,

with the individual LRs derived by applying the LRs obtained from the multiple logistic regression analysis on each record in the validation sample.
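Written out in symbols, the two relations above take the following form. The notation is an assumption introduced here for illustration: O_0 is the background odds, LR_k is the likelihood ratio for the k-th characteristic, and the individual LR is taken to be the product of the per-characteristic LRs, as in the general model of Appendix 1. The last line simply restates the first on the log scale.

```latex
\begin{aligned}
O_{\text{post}} &= O_{0} \times \prod_{k} \mathrm{LR}_{k}, \\
p &= \frac{O_{\text{post}}}{1 + O_{\text{post}}}, \\
\ln O_{\text{post}} &= \ln O_{0} + \sum_{k} \ln \mathrm{LR}_{k}.
\end{aligned}
```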

RESULTS

Figure 1 shows the ROC curve based on the total data set, describing the overall capability of the FW Z-score estimated from routine ultrasound examination at 32–34 weeks' gestation to predict LGA neonates at term; the AUC was 0.89 (95% CI, 0.89–0.90). The median and mean time between the ultrasound examination and delivery were 7.1 (range 2.1–11.4) weeks and 7.1 ± 1.4 weeks, respectively. Table 1 shows, for each FW Z-score cut-off value, the corresponding numbers, sensitivity, specificity, PPV and NPV. At the cut-off value +2 FW Z-score, the sensitivity was 27%, false-positive rate (1 – specificity) was 1.4% and PPV was 50%. The corresponding numbers for an FW Z-score cut-off of +1.5 were 47%, 4% and 35%, respectively, and at the cut-off value of FW Z-score +1.0, sensitivity was 72%, false-positive rate 13% and PPV 21%. The ‘optimal cut-off value’ (the estimate with the largest height over the line of unity, sensitivity = (1 – specificity)) was +0.5 FW Z-score (sensitivity 88%, false-positive rate 27% and PPV 14%).

Table 1. Numbers of large-for-gestational age (LGA) and non-LGA infants according to fetal weight Z-score (FW Z-score) cut-off step at routine ultrasound examination at 32–34 weeks' gestation in a large cohort of women showing sensitivity, 1 − specificity, positive predictive value (PPV), negative predictive value (NPV) and Youden index^{a} for LGA at birth for each step

| FW Z-score cut-off | LGA, above cut-off (n) | LGA, below cut-off (n) | Non-LGA, above cut-off (n) | Non-LGA, below cut-off (n) | Sensitivity | FPR (1 − specificity) | PPV | NPV | Youden index^{a} |
|---|---|---|---|---|---|---|---|---|---|
| 3.0 | 142 | 2510 | 34 | 54 105 | 0.054 | 0.001 | 0.807 | 0.956 | 0.053 |
| 2.5 | 306 | 2346 | 157 | 53 982 | 0.115 | 0.003 | 0.661 | 0.958 | 0.112 |
| 2.0 | 720 | 1932 | 734 | 53 405 | 0.271 | 0.014 | 0.495 | 0.965 | 0.258 |
| 1.5 | 1257 | 1395 | 2379 | 51 760 | 0.474 | 0.044 | 0.346 | 0.974 | 0.430 |
| 1.0 | 1919 | 733 | 7131 | 47 008 | 0.724 | 0.132 | 0.212 | 0.985 | 0.592 |
| 0.5 | 2341 | 311 | 14 731 | 39 408 | 0.883 | 0.272 | 0.137 | 0.992 | 0.611 |
| 0.0 | 2577 | 75 | 27 576 | 26 563 | 0.972 | 0.509 | 0.085 | 0.997 | 0.462 |
| −0.5 | 2641 | 11 | 39 073 | 15 066 | 0.996 | 0.722 | 0.063 | 0.999 | 0.274 |
| −1.0 | 2649 | 3 | 48 045 | 6094 | 0.999 | 0.887 | 0.052 | 1.000 | 0.111 |
| −1.5 | 2652 | 0 | 52 014 | 2125 | 1.000 | 0.961 | 0.049 | 1.000 | 0.039 |
| −2.0 | 2652 | 0 | 53 684 | 455 | 1.000 | 0.992 | 0.047 | 1.000 | 0.008 |
| −2.5 | 2652 | 0 | 54 044 | 95 | 1.000 | 0.998 | 0.047 | 1.000 | 0.002 |
| −3.0 | 2652 | 0 | 54 124 | 15 | 1.000 | 1.000 | 0.047 | 1.000 | 0.000 |

Area under receiver–operating characteristics curve = 0.89 (95% CI, 0.89–0.90) (see Figure 1).

^{a} Youden index = sensitivity + specificity − 1 (equivalent to magnitude of height over line of unity). FPR, false-positive rate.
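As a rough cross-check of Table 1, the ‘optimal’ cut-off and an approximate AUC can be recovered from the tabulated operating points alone. The sketch below is an assumption-laden illustration, not the authors' method: it joins the 13 reported (false-positive rate, sensitivity) pairs with straight lines, so the trapezoidal area slightly understates the AUC of the full ROC curve.

```python
# (cut-off, sensitivity, false-positive rate) as reported in Table 1
rows = [
    (3.0, 0.054, 0.001), (2.5, 0.115, 0.003), (2.0, 0.271, 0.014),
    (1.5, 0.474, 0.044), (1.0, 0.724, 0.132), (0.5, 0.883, 0.272),
    (0.0, 0.972, 0.509), (-0.5, 0.996, 0.722), (-1.0, 0.999, 0.887),
    (-1.5, 1.000, 0.961), (-2.0, 1.000, 0.992), (-2.5, 1.000, 0.998),
    (-3.0, 1.000, 1.000),
]

# Youden index = sensitivity - FPR; its maximum marks the 'optimal' cut-off
best_cutoff = max(rows, key=lambda r: r[1] - r[2])[0]

# Trapezoidal area under the ROC points, anchored at (0, 0) and (1, 1)
points = [(0.0, 0.0)] + sorted((fpr, sens) for _, sens, fpr in rows) + [(1.0, 1.0)]
auc = sum(
    (x1 - x0) * (y0 + y1) / 2
    for (x0, y0), (x1, y1) in zip(points, points[1:])
)

print(f"optimal cut-off: {best_cutoff:+.1f}, approximate AUC: {auc:.2f}")
```

The maximum Youden index falls at the +0.5 cut-off, and the approximate area is close to the reported AUC of 0.89.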

The maternal, fetal and neonatal characteristics of the development and validation groups are shown in Table 2. The demographic characteristics of the groups were similar, but the proportion of women with BMI ≥ 30 kg/m^{2} was significantly higher in the development group than in the validation group.

Table 2. Demographic characteristics of development sample and validation sample groups in a large cohort of women

| Characteristic | Development sample (n = 25 261) | Validation sample (n = 23 548) | P |
|---|---|---|---|

Both groups contain only records with all information available. Data shown as n (%) or mean ± SD.

^{a} Chi-square test.

^{b} Mann–Whitney U-test.

^{c} FW Z-scores and BW Z-scores based on Swedish standard for intrauterine growth[5]. BMI, body mass index; BW, birth weight; FW, fetal weight; LGA, large-for-gestational age; SGA, small-for-gestational age.

Table 3 shows the odds ratios for LGA obtained from univariate and multiple logistic regression analyses based on the development sample. In the univariate analysis, all evaluated factors except parity 3+ were significantly associated with LGA. In the first multiple model (including all significant variables), all variables except maternal age and gestational diabetes remained significant. In the final multiple model (including variables with P < 0.20 in the primary multiple model), maternal age was not included. The factor most strongly associated with LGA was FW Z-score, accounting for 35% of the variance in the univariate setting (R^{2} = 0.35).

Table 3. Risk factors for large-for-gestational-age term infants in development sample of a large cohort of women, using univariate and multiple logistic regression analysis

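| Variable | Univariate OR | P | First multiple model OR | P | Final multiple model OR (95% CI) | P |
|---|---|---|---|---|---|---|
| FW Z-score at ultrasound^{d} (per 1-unit increase in FW Z-score) | 6.905 | < 10^{–6} | 6.475 | < 10^{–6} | 6.450 (5.893–7.061) | < 10^{–6} |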

As described in the Methods section, the results from the final multiple logistic regression analysis were used to calculate LRs (Table 4); an example of how the LRs can be used to estimate the individual risk for having an LGA infant is shown in Appendix 1.

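Table 4. Adjusted likelihood ratios (LR) for maternal characteristics and estimated fetal weight (FW) Z-score, derived from logistic regression analysis of development sample

| Height (cm) | LR | BMI (kg/m^{2}) | LR | Categorical characteristic | LR | FW Z-score | LR |
|---|---|---|---|---|---|---|---|
| 150 | 0.3 | 15 | 0.6 | Smoker: no | 1.1 | −2.00 | 0.0058 |
| 151 | 0.3 | 16 | 0.6 | Smoker: < 10 cigarettes per day | 0.7 | −1.75 | 0.0092 |
| 152 | 0.3 | 17 | 0.6 | Smoker: ≥ 10 cigarettes per day | 0.4 | −1.50 | 0.015 |
| 153 | 0.4 | 18 | 0.7 | | | −1.25 | 0.023 |
| 154 | 0.4 | 19 | 0.7 | | | −1.00 | −1.00 |
| 155 | 0.4 | 20 | 0.7 | Parity: nulliparous | 0.6 | −0.75 | 0.059 |
| 156 | 0.4 | 21 | 0.8 | Parity: parous | 1.4 | −0.50 | 0.094 |
| 157 | 0.5 | 22 | 0.8 | | | −0.25 | 0.15 |
| 158 | 0.5 | 23 | 0.9 | | | 0.00 | 0.20 |
| 159 | 0.6 | 24 | 0.9 | Pre-existing diabetes: yes | 3.4 | 0.25 | 0.40 |
| 160 | 0.6 | 25 | 1.0 | Pre-existing diabetes: no | 1.0 | 0.50 | 0.60 |
| 161 | 0.6 | 26 | 1.0 | | | 0.75 | 1.0 |
| 162 | 0.7 | 27 | 1.1 | | | 1.00 | 1.5 |
| 163 | 0.7 | 28 | 1.2 | Gestational diabetes: yes | 1.4 | 1.25 | 2.5 |
| 164 | 0.8 | 29 | 1.2 | Gestational diabetes: no | 1.0 | 1.50 | 3.9 |
| 165 | 0.8 | 30 | 1.3 | | | 1.75 | 6.2 |
| 166 | 0.9 | 31 | 1.4 | | | 2.00 | 10.0 |
| 167 | 1.0 | 32 | 1.4 | | | 2.25 | 15.9 |
| 168 | 1.0 | 33 | 1.5 | | | 2.50 | 25.3 |
| 169 | 1.1 | 34 | 1.6 | | | 2.75 | 40.3 |
| 170 | 1.2 | 35 | 1.7 | | | 3.00 | 64.2 |
| 171 | 1.3 | 36 | 1.8 | | | 3.25 | 102.3 |
| 172 | 1.4 | 37 | 1.9 | | | 3.50 | 163.1 |
| 173 | 1.5 | 38 | 2.0 | | | | |
| 174 | 1.6 | 39 | 2.2 | | | | |
| 175 | 1.7 | 40 | 2.3 | | | | |
| 176 | 1.8 | | | | | | |
| 177 | 1.9 | | | | | | |
| 178 | 2.1 | | | | | | |
| 179 | 2.2 | | | | | | |
| 180 | 2.4 | | | | | | |
| 181 | 2.6 | | | | | | |
| 182 | 2.7 | | | | | | |
| 183 | 2.9 | | | | | | |
| 184 | 3.2 | | | | | | |
| 185 | 3.4 | | | | | | |

BMI, body mass index.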
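The FW Z-score LRs in Table 4 can be related to the odds ratio in Table 3 with a simple consistency check. The sketch below assumes, as a logistic model implies, that the LR for a continuous variable changes by a constant factor per step; it is not the authors' derivation (which follows Smith et al.[18]), and the variable names are illustrative only.

```python
odds_ratio_per_unit = 6.450  # FW Z-score, final multiple model (Table 3)
step = 0.25                  # spacing between FW Z-score rows in Table 4

# Factor by which the LR should grow for each 0.25 increase in FW Z-score
expected_ratio = odds_ratio_per_unit ** step

# LRs listed in Table 4 for FW Z-scores 2.00, 2.25, ..., 3.50
table_lrs = [10.0, 15.9, 25.3, 40.3, 64.2, 102.3, 163.1]
observed_ratios = [b / a for a, b in zip(table_lrs, table_lrs[1:])]

print(f"expected ratio per step: {expected_ratio:.3f}")
print("observed ratios:", [round(r, 3) for r in observed_ratios])
```

The observed step-to-step ratios sit within rounding error of the expected factor of about 1.59, which suggests that the tabulated LRs follow the fitted odds ratio on a multiplicative scale.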

For each woman in the validation sample, the probability of having an LGA infant was predicted by applying the LRs shown in Table 4 to her personal characteristics and clinical data. Table 5 shows the numbers of observed and predicted LGA cases by predicted probability stratum. Among women with a predicted probability of 90% or more of having an LGA infant, the observed and predicted proportions of LGA were similar (97% and 95%, respectively). In the 50–59% stratum, the predicted and observed proportions were identical (55%), whereas in the strata between 60% and 89%, the predicted numbers of LGA cases were higher than those observed.

Table 5. Observed and predicted numbers of large-for-gestational-age (LGA) infants in validation sample of a large cohort of women in relation to predicted probability using Bayesian model

| Predicted probability (%) | Individual LR | Women (n) | Observed LGA, n (%) | Predicted LGA, n (%) |
|---|---|---|---|---|
| 0–9 | < 2.4 | 20 975 | 359 (1.7) | 345 (1.6) |
| 10–19 | 2.4–5.2 | 1255 | 183 (14.6) | 178 (14.2) |
| 20–29 | 5.3–9.0 | 491 | 127 (25.9) | 120 (24.4) |
| 30–39 | 9.1–14.0 | 295 | 99 (33.6) | 102 (34.6) |
| 40–49 | 14.1–21.1 | 153 | 58 (37.9) | 68 (44.4) |
| 50–59 | 21.2–31.7 | 148 | 81 (54.7) | 81 (54.7) |
| 60–69 | 31.8–49.3 | 97 | 55 (56.7) | 63 (64.9) |
| 70–79 | 49.4–84.6 | 49 | 29 (59.2) | 37 (75.5) |
| 80–89 | 84.7–190.5 | 47 | 34 (72.3) | 40 (85.1) |
| 90–99 | ≥ 190.6 | 38 | 37 (97.4) | 36 (94.7) |

Predicted probabilities derived from likelihood ratios (LR) of development sample, using Bayes' theorem. Predicted probability = posterior odds/(1 + posterior odds). Posterior odds = background odds × individual LR. Background odds = 0.0472. Individual LR obtained by applying the LRs displayed in Table 4 on each record in the validation sample.
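The LR boundaries of the probability strata in Table 5 follow from inverting the Bayes relation, and the overall calibration can be summarized by totalling the two count columns. The sketch below is illustrative only; `lr_threshold` is a hypothetical helper, and the background odds of 0.0472 is the value given in the table footnote.

```python
BACKGROUND_ODDS = 0.0472  # reported in the Table 5 footnote


def lr_threshold(probability, background_odds=BACKGROUND_ODDS):
    """Individual LR at which the predicted probability equals `probability`."""
    posterior_odds = probability / (1 - probability)
    return posterior_odds / background_odds


for p in (0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"{p:.0%}: individual LR = {lr_threshold(p):.1f}")

# Observed and predicted LGA counts per stratum, as listed in Table 5
observed = [359, 183, 127, 99, 58, 81, 55, 29, 34, 37]
predicted = [345, 178, 120, 102, 68, 81, 63, 37, 40, 36]
print(f"observed total: {sum(observed)}, predicted total: {sum(predicted)}")
```

The computed thresholds match the lower bounds of the corresponding LR ranges in Table 5 after rounding to one decimal, and the totals (1062 observed vs 1070 predicted cases) are close, although individual strata differ more.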

ROC curves were created to describe the overall ability of the LRs to predict LGA in the validation group (Figure 2). Using the LRs retrieved from the final multiple model to predict LGA, the AUC was 0.91 (95% CI, 0.90–0.92), which was significantly larger than the AUC retrieved from a model including only FW Z-score (0.89 (95% CI, 0.88–0.90)) (P for difference between areas < 10^{–6}). The AUC retrieved from a model including maternal characteristics only (model details not shown) was considerably smaller (0.74 (95% CI, 0.73–0.76)) than that for the models with FW Z-score included (P < 10^{–6}) (Figure 2).

DISCUSSION

In this population-based study, including 56 792 singleton pregnancies with routine third-trimester ultrasound FW estimation, the ROC curve obtained using FW Z-scores indicated a high accuracy in the detection of LGA neonates (AUC, 0.89). At the theoretical ‘optimal cut-off value’ (+0.5 FW Z-score), the sensitivity and false-positive rate were satisfactory (88 and 27%, respectively), but the corresponding low PPV (14%) indicates that for clinical practice a higher cut-off-value (e.g., +1.0 or +1.5 FW Z-score) might be a better choice. The decision regarding the cut-off to be used will be guided by the clinical protocol for the risk group and resources available. When maternal characteristics were added to the estimated FW Z-score for calculating the probability of an LGA term neonate, the AUC for a new ROC curve was slightly, but significantly, increased (P < 10^{–6}) compared to the AUC for the ROC curve constructed using the ultrasonically estimated FW Z-score alone.

Several studies have been performed aiming to validate the accuracy of birth-weight prediction of macrosomic fetuses[14, 15, 20-22]. In those studies, most of the FW estimates were performed close to birth. In the current study, the aim was not to validate the accuracy of birth-weight prediction, but to investigate the possibility of predicting LGA term neonates by using a routine ultrasound FW estimation at 32–34 weeks' gestation. Not many studies have been published that calculated the individual risk for LGA at birth from data obtained on routine ultrasound examination. The existing studies are all small, and their purpose was not identical to the aim of this study. Mazouni et al.[23] published a nomogram for individual prediction of macrosomia (birth weight > 4000 g) based on maternal characteristics and the presence or absence of an estimated FW ≥ 4000 g at ultrasound examination performed within 1 week of delivery. The accuracy of predicting fetal macrosomia was compared between the nomogram and sonographically estimated FW[24], the nomogram being found to have a significantly better accuracy (AUC 0.85 and 0.74 for the nomogram and ultrasound FW estimation, respectively). Thus, the results of the study of Mazouni et al.[23] are in accord with those of the current study, although their results yielded considerably lower overall prediction for LGA than the results reported here. One should, however, keep in mind that the definition of LGA differed considerably between these two studies. To use 4000 g as the cut-off for macrosomia would not be appropriate for the Swedish population, as, at 40 + 0 weeks, 4000 g among boys represents a Z-score of +0.92. This means that with a cut-off of 4000 g, 18% of all Swedish boys born at 40 weeks or more would fall above the limit. The use of Z-scores makes the results generally applicable, as they can be applied to any population for which a growth curve is available. 
Furthermore, the +2 Z-score cut-off used in the current study is close to the 97^{th} percentile, which is commonly used to define excessive growth[25].
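The percentages quoted above can be checked against the standard normal distribution. The sketch below assumes that birth weight Z-scores are normally distributed, which the Z-score standard implies but which is stated here as an assumption; `normal_cdf` is an illustrative helper built on the Python standard library.

```python
import math


def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Share of boys at 40 + 0 weeks above 4000 g, i.e. above Z = +0.92
share_above_4000g = 1 - normal_cdf(0.92)

# Percentile corresponding to the LGA definition, Z = +2
percentile_at_plus_2 = 100 * normal_cdf(2.0)

print(f"above Z = +0.92: {share_above_4000g:.1%}")
print(f"Z = +2 corresponds to percentile {percentile_at_plus_2:.1f}")
```

Under this assumption, about 18% of values lie above Z = +0.92, and Z = +2 sits just above the 97^{th} percentile, in line with the figures in the text.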

Nahum and Stanislaw[26] investigated the possibility of improving the prediction of fetal macrosomia (birth weight > 4000 g) up to 11 weeks before delivery by combining fetal ultrasound biometry with maternal and pregnancy-specific information. Their formula was found to be superior to the four Hadlock formulae for estimated FW. In contrast to the results of Nahum and Stanislaw, Ben-Haroush et al.[27] did not find any evidence that the prediction of LGA at birth could be improved by adding clinical information to the ultrasonically estimated FW.

Balsyte et al.[28] compared the predictive value for the detection of fetal macrosomia using the formulae of Hadlock et al.[24] and Nahum and Stanislaw[26] and the nomogram of Mazouni et al.[23]. The prediction of macrosomia using ultrasound alone was significantly superior to the combined method of Mazouni et al., whereas the prediction using Nahum and Stanislaw's equation was similar to that of ultrasound alone. However, it should be noted that, with the short time elapsed from the ultrasound examination to birth in the study of Balsyte et al. (< 1 week), no strong influence of maternal factors could be expected. The longer the period between ultrasound examination and delivery, the more time there is for maternal factors to influence fetal growth. In the present study, the ultrasound examination-to-delivery interval was on average 7 weeks.

The strengths of the present study are the early identification of the group at risk for LGA, the large size of the dataset, which comprised an unselected population with deliveries at ≥ 37 weeks, and the efficient methodology used to develop a prediction model for LGA based on Bayes' theorem. A possible weakness of the study is the lack of information regarding parental ethnicity. It is possible that the prediction model would have been even more powerful if maternal and paternal birth weight, the birth weight of previous siblings and information on paternal characteristics had been considered.

In light of the worldwide obesity epidemic, perinatal complications caused by fetal macrosomia will be an increasing phenomenon in the near future. Early identification of fetuses at risk for macrosomia will therefore be an issue of increasing importance in obstetrics. The prediction model developed in the current study might be used with high accuracy for the identification of women at increased risk of giving birth to a macrosomic infant. These women could then be offered extra ultrasound examinations enabling fetal growth follow-up, possibly employing more sophisticated techniques such as three-dimensional ultrasound, and individual planning for time and mode of delivery. Such a clinical protocol might prevent perinatal complications due to fetal macrosomia, and thus benefit both the mother and the neonate.

APPENDIX 1

Calculation of the individual probability of a large-for-gestational-age (LGA) neonate using the prediction model.

General model

Posterior odds = background odds × individual LR = 0.0472 × (LR-BMI × LR-height × LR-smoking habits × LR-parity × LR-DM/GDM × LR-FW Z-score), where LR is likelihood ratio, BMI is body mass index, DM is diabetes mellitus, GDM is gestational diabetes and FW Z-score is Z-score for fetal weight estimated by ultrasound at 32–34 weeks' gestation.

By applying the LR for each maternal characteristic in Table 4, the posterior odds can be calculated.

Predicted probability = posterior odds/(1 + posterior odds).

Case example

Calculation of the predicted probability of giving birth to an LGA neonate for a pregnant woman with BMI 30.0 kg/m^{2}, height 170.0 cm, smoking habit of ≥ 10 cigarettes/day, nulliparous, no DM/GDM and FW Z-score of 1.75 at third-trimester ultrasound examination:
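A minimal sketch of this calculation is given below. It is an illustration rather than the authors' own worked figures: the LRs are read from the rounded values in Table 4, the "no DM/GDM" condition is entered as two separate LRs of 1.0, and the dictionary keys are labels chosen here for readability. The result may therefore differ slightly from a calculation based on unrounded LRs.

```python
import math

BACKGROUND_ODDS = 0.0472  # stated in the general model above

# LRs read from Table 4 for the woman described in the case example
case_lrs = {
    "BMI 30 kg/m^2": 1.3,
    "height 170 cm": 1.2,
    "smoker, >= 10 cigarettes/day": 0.4,
    "nulliparous": 0.6,
    "no pre-existing diabetes": 1.0,
    "no gestational diabetes": 1.0,
    "FW Z-score 1.75": 6.2,
}

# Individual LR is the product of the per-characteristic LRs
individual_lr = math.prod(case_lrs.values())
posterior_odds = BACKGROUND_ODDS * individual_lr
probability = posterior_odds / (1 + posterior_odds)

print(f"individual LR = {individual_lr:.2f}")
print(f"posterior odds = {posterior_odds:.3f}")
print(f"predicted probability = {probability:.1%}")
```

With these rounded inputs the individual LR is about 2.3 and the predicted probability of an LGA neonate is roughly 10%. The high FW Z-score raises the risk, while heavy smoking and nulliparity lower it.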