Use of Z-scores to select a fetal biometric reference curve

Authors

N. Sananes,

Department of Ultrasound and Fetal Medicine, Centre Médico-Chirurgical et Obstétrical—Syndicat Inter-Hospitalier de la Communauté Urbaine de Strasbourg, Strasbourg, France

Department of Ultrasound and Fetal Medicine, 19 Rue Louis Pasteur, Centre Médico-Chirurgical et Obstétrical—Syndicat Inter-Hospitalier de la Communauté Urbaine de Strasbourg, Strasbourg, France

Fetal biometric data are a major part of prenatal ultrasound screening in the general population. The aim of this study was to analyze the effect of choice of reference curve on the quality of screening for growth abnormalities, using a statistical tool based on Z-scores.

Methods

The biparietal diameter (BPD), head circumference (HC), abdominal circumference (AC) and femur length (FL) were measured in 9699 ultrasound scans during the second trimester (20–24 weeks of gestation) and 8100 scans during the third trimester (30–34 weeks of gestation). These biometric data were all transformed retrospectively into Z-scores, calculated using five reference curves: those published by Snijders and Nicolaides (1994), Chitty et al. (1994), Kurmanavicius et al. (1999) and Salomon et al. (2006), and curves used at our ultrasound unit generated from a sample of the local population. The Z-score distribution was compared with the expected normal distribution by calculation of the mean and SD, and using the Kolmogorov–Smirnov test. The sensitivity and specificity of each reference curve were calculated to determine the capacity of these curves to identify fetuses with measurements < 5^{th} percentile or > 95^{th} percentile for each parameter.

Results

Most of the distribution curves determined from the Z-scores of the measurements taken differed significantly from a non-skewed standard normal curve (mean of 0 and SD of 1). In our population, the Chitty reference curves gave the best results for identifying fetuses with abnormal (< 5^{th} percentile or > 95^{th} percentile) BPD (sensitivity, 100%; specificity, 97.24%), HC (sensitivity, 96.07%; specificity, 98.89%) and FL (sensitivity, 96.46%; specificity, 98.80%). The best reference for AC was the Salomon curve (sensitivity, 72.25%; specificity, 99.64%).

The collection of fetal biometric data represents an important part of ultrasound screening in the general population. Biometric measurements are compared with reference curves, and are considered to be normal, too low (below the 5^{th} percentile) or too high (above the 95^{th} percentile). They can thus be used as a screening test to identify fetuses at high risk of abnormalities in growth, morphological features or karyotype.

Several different approaches to screening using fetal biometric data have been described: the measured values can be evaluated in isolation in comparison with a customized reference curve1, 2, they can be integrated into a formula (e.g. for estimation of fetal weight3), multiple parameters can be assessed in comparison with one another (proportionality index)4–6, or parameters can be measured and assessed longitudinally (speed or rate of growth)7, 8. However, the use of cross-sectional growth curves remains the most widely used method of screening for fetal growth abnormalities.

The reference curve that is used must be appropriate for the population studied. Salomon et al. developed a method based on Z-scores for evaluation of the effect of the reference curve chosen on the quality of screening for growth abnormalities9, 10. The advantage of the Z-score is that it integrates the measurement itself, the mean and the SD into a single value. The World Health Organization recommends the expression of measurements as Z-scores in order to allow relevant statistical analyses11.

The aim of this study was to analyze the quality of screening for growth abnormalities by transforming a series of fetal biometric measurements into Z-scores using different reference equations. This made it possible to evaluate the effects of the choice of reference curve on the interpretation of biometric data and to identify the most appropriate curve or curves.

METHODS

This study included patients attending the Department of Ultrasound and Fetal Medicine at Centre Médico-Chirurgical et Obstétrical—Syndicat Inter-Hospitalier de la Communauté Urbaine de Strasbourg (CMCO-SIHCUS) for an ultrasound scan during the second (20–24 weeks of gestation) or third (30–34 weeks of gestation) trimester, between January 1998 and December 2007. Exclusion criteria were: abnormal karyotype or congenital malformation, pregnancy loss (fetal death), and an absence of pregnancy dating based on measurement of crown–rump length during the first trimester12.

All biometric measurements were taken to the nearest millimeter, without time constraints, by three ultrasound specialists and seven midwives qualified to national inter-university diploma level. These measurements were carried out in accordance with the methods described in the studies corresponding to the reference curves13–20. Biparietal diameter (BPD) and head circumference (HC) were measured on transverse sections of the fetal head, in a plane in which the cavity defined by the septum pellucidum crossed the median line in its anterior third. BPD was measured using external landmarks, whereas HC was calculated from the ellipse defined by the occipitofrontal diameter and BPD. Abdominal circumference (AC) was measured on circular transverse sections of the fetal abdomen, above the site of insertion of the umbilical cord, using the formula for an ellipse. Femur length (FL) was measured on sections showing the whole diaphysis, in which the two extremities were clearly identifiable and perpendicular to the diaphysis. At 30–34 weeks of gestation, the epiphyses were not included in the measurement.

The BPD, HC, AC and FL measurements from 9699 ultrasound scans during the second trimester and 8100 scans during the third trimester were retained. All the biometric data obtained were then transformed into Z-scores, calculated using five series of reference equations, taken from Snijders and Nicolaides13, Chitty et al.14–17, Kurmanavicius et al.18, 19, Salomon et al.20 and curves routinely used in our ultrasound unit (CMCO) generated from a sample of the local population. The equations used are shown in Table S1. In all cases, Z-scores according to gestational age were calculated using the following formula: Z-score = (biometric measurement − expected biometric mean)/SD. Z-scores should follow a non-skewed standard normal distribution (Gaussian distribution with a mean of 0 and SD of 1) if the measurements taken are consistent with the reference equations used to calculate them. In a standard normal distribution, the −1 SD to +1 SD interval includes approximately 68% of the population and the −2 SD to +2 SD interval includes approximately 95% of the population, with the 5^{th} percentile corresponding to −1.645 SD and the 95^{th} percentile corresponding to +1.645 SD.
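As an illustration of the formula above, the following minimal Python sketch computes a Z-score and recovers the ±1.645 cut-offs from the standard normal distribution. The function name `z_score` and the example values are hypothetical and are not taken from any of the reference equations in Table S1; in practice the expected mean and SD would come from the chosen reference equation at the relevant gestational age.

```python
from statistics import NormalDist


def z_score(measurement: float, expected_mean: float, sd: float) -> float:
    """Z-score = (measurement - expected mean) / SD at a given gestational age."""
    if sd <= 0:
        raise ValueError("SD must be positive")
    return (measurement - expected_mean) / sd


# Standard normal distribution (mean 0, SD 1) that the Z-scores should follow.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

# Cut-offs for the 5th and 95th percentiles, approximately -1.645 and +1.645.
Z_P5 = STANDARD_NORMAL.inv_cdf(0.05)
Z_P95 = STANDARD_NORMAL.inv_cdf(0.95)

# Hypothetical values in mm, for illustration only (not from a published curve).
example = z_score(measurement=52.0, expected_mean=50.0, sd=2.5)
is_abnormal = example < Z_P5 or example > Z_P95
```

With these hypothetical inputs the measurement lies less than one SD above the expected mean, so it falls within the 5^{th} to 95^{th} percentile range.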

Each Z-score distribution was compared to the standard normal distribution using the Kolmogorov–Smirnov test21, 22.
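The comparison can be sketched in Python as follows. This is an illustrative one-sample Kolmogorov–Smirnov calculation against the standard normal distribution, not the software used in the study. It assumes that the reported "Z-values" correspond to √n × D, where D is the largest distance between the empirical and theoretical cumulative distributions, and it uses the asymptotic Kolmogorov series for the two-sided P-value; both are assumptions, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def ks_statistic(z_scores):
    """Largest gap D between the empirical CDF and the standard normal CDF."""
    xs = sorted(z_scores)
    n = len(xs)
    if n == 0:
        raise ValueError("at least one Z-score is required")
    phi = NormalDist().cdf
    d = 0.0
    for i, x in enumerate(xs):
        cdf = phi(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


def ks_z(z_scores):
    """Kolmogorov-Smirnov Z, assumed here to be sqrt(n) * D."""
    return math.sqrt(len(z_scores)) * ks_statistic(z_scores)


def ks_asymptotic_p(z, terms=100):
    """Two-sided asymptotic P-value; reliable for large samples and z not near 0."""
    if z <= 0:
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * z * z)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))
```

Under these assumptions, `ks_asymptotic_p(1.350)` gives approximately 0.052 and `ks_asymptotic_p(1.003)` approximately 0.267, which is consistent with the P-values reported in the Results for those Z-values.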

The mean Z-score should be close to 0, and Student's t-test was used to compare the mean obtained with the theoretically expected value of 0. Finally, the SD of each Z-score distribution was compared with the expected value of 1, using a test based on the Chi-square distribution.
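The two test statistics can be sketched as below. The text does not specify the exact form of the Chi-square-based test, so the standard one-sample variance statistic, (n − 1)s²/σ₀², is assumed here; the function names are hypothetical. P-values are not computed, because they require the t and Chi-square distribution functions, which a statistics library would normally supply.

```python
import math
from statistics import mean, variance


def one_sample_t(z_scores, mu0=0.0):
    """t statistic for H0: mean of the Z-scores equals mu0 (n - 1 degrees of freedom)."""
    n = len(z_scores)
    standard_error = math.sqrt(variance(z_scores) / n)
    return (mean(z_scores) - mu0) / standard_error


def chi_square_sd(z_scores, sigma0=1.0):
    """Chi-square statistic for H0: SD of the Z-scores equals sigma0 (n - 1 df)."""
    n = len(z_scores)
    return (n - 1) * variance(z_scores) / sigma0 ** 2


# Tiny hypothetical sample, for illustration only.
sample = [-1.0, 0.0, 1.0, 2.0]
t_value = one_sample_t(sample)
chi2_value = chi_square_sd(sample)
```

With samples as large as those in this study (several thousand scans per trimester), even small departures of the mean from 0 or of the SD from 1 produce large test statistics, which may explain why nearly all comparisons reached statistical significance.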

The 5^{th} and 95^{th} percentiles of the Z-score distributions for a biometric parameter measured in our population should match the 5^{th} and 95^{th} percentiles of the reference equation used to calculate them, i.e. −1.645 and +1.645, and so the observed 5^{th} and 95^{th} percentiles of the Z-scores were calculated for each biometric parameter using each of the reference equations. The fetuses that would have been considered abnormal (Z-score < −1.645 or > +1.645) using each reference equation were identified for each parameter, and the sensitivity and specificity of each equation for identifying fetuses truly < 5^{th} or > 95^{th} percentile (based on the observed distribution of Z-scores) were calculated (contingency tables provided online; Tables S2 and S3).
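The classification described above can be sketched for the lower cut-off as follows. This is a minimal illustration rather than the authors' implementation: the percentile is computed by linear interpolation between order statistics, which is an assumption (the text does not state the method used), and the names `percentile` and `screen_low` are hypothetical. The upper cut-off would be handled symmetrically with the observed 95^{th} percentile and +1.645.

```python
from statistics import NormalDist

# Expected 5th percentile of a standard normal distribution (about -1.645).
CUTOFF_LOW = NormalDist().inv_cdf(0.05)


def percentile(values, p):
    """p-th percentile by linear interpolation between order statistics."""
    xs = sorted(values)
    rank = (len(xs) - 1) * p / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (rank - lo)


def screen_low(z_scores, cutoff=CUTOFF_LOW):
    """Contingency counts for flagging Z < cutoff against the observed 5th percentile."""
    observed_p5 = percentile(z_scores, 5)
    tp = fp = tn = fn = 0
    for z in z_scores:
        truly_low = z < observed_p5   # truly below the observed 5th percentile
        flagged = z < cutoff          # considered abnormal by the reference
        if flagged and truly_low:
            tp += 1
        elif flagged:
            fp += 1
        elif truly_low:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn,
            "sensitivity": sensitivity, "specificity": specificity}


# Hypothetical, evenly spaced Z-scores from -2.5 to 2.5, for illustration only.
demo = screen_low([(i - 50) / 20 for i in range(101)])
```

In this hypothetical sample the observed 5^{th} percentile lies below −1.645, so the fixed cut-off flags every truly low value but also some that are not, giving full sensitivity at the cost of reduced specificity.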

The results were finally expressed in the form of a graphical representation, allowing the effect of the choice of reference curve on the quality of screening for growth abnormalities to be visualized. To this end, the histogram of Z-score distributions was superimposed on the non-skewed standard normal curve for each biometric measurement and each reference equation. The difference between the expected distribution and the observed distribution could then be observed.
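The numbers behind such a graph can be sketched without a plotting library: for each histogram bin, the observed proportion of Z-scores is set beside the proportion expected under the standard normal distribution. The bin range, bin count and function name below are arbitrary illustrative choices, not those used for the published figures.

```python
from statistics import NormalDist


def observed_vs_expected(z_scores, lo=-4.0, hi=4.0, bins=16):
    """Return (left, right, observed fraction, expected fraction) for each bin."""
    n = len(z_scores)
    if n == 0:
        raise ValueError("at least one Z-score is required")
    width = (hi - lo) / bins
    counts = [0] * bins
    for z in z_scores:
        if lo <= z < hi:
            # min() guards against rounding pushing a value into a bin past the last.
            counts[min(int((z - lo) / width), bins - 1)] += 1
    phi = NormalDist().cdf
    rows = []
    for b in range(bins):
        left = lo + b * width
        right = left + width
        rows.append((left, right, counts[b] / n, phi(right) - phi(left)))
    return rows


# Tiny hypothetical sample with 1-SD-wide bins, for illustration only.
rows = observed_vs_expected([0.1, 0.2, -0.3, 1.7], lo=-4.0, hi=4.0, bins=8)
```

A histogram shifted to the left or right of the normal curve would show up here as observed fractions that exceed the expected fractions on one side of zero and fall short on the other.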

RESULTS

Most of the Z-score distribution curves of the measurements obtained appeared to be normal, but few matched the expected standard normal distribution. For measurements obtained from second-trimester scans, the Z-values calculated with the Kolmogorov–Smirnov test ranged between 1.350 and 3.129. Only the Z-score distribution calculated with the Chitty equation for BPD did not significantly differ from the standard normal distribution (Z = 1.350, P = 0.052).

For third-trimester measurements, the Z-values calculated with the Kolmogorov–Smirnov test were between 1.003 and 2.943. Only the Z-score distributions for the HC measurements did not significantly differ from the standard normal curve, regardless of the reference equation used (Snijders: Z = 1.003, P = 0.267; Chitty: Z = 1.046, P = 0.224; Kurmanavicius: Z = 1.057, P = 0.213; Salomon: Z = 1.046, P = 0.224; CMCO: Z = 1.055, P = 0.216).

For measurements obtained during the second trimester, Z-score mean values were between −0.506 and 0.738 and all means were significantly different from zero. During the third trimester, Z-score means were between −1.141 and 0.656. All means were significantly different from zero, with the exception of the mean of Z-scores obtained for FL measurements using the Kurmanavicius equation (mean, 0.003; t = −0.275, P = 0.783). The SDs for the second- and third-trimester measurements were between 0.663 and 1.155, and between 0.877 and 1.256, respectively. All were significantly different from 1. The results are shown in full in Tables S4 and S5.

For values obtained during the second trimester, the observed 5^{th} percentiles of the Z-score distributions were between −2.221 and −0.511. The total number of fetuses considered to have abnormally low Z-scores, with respect to the expected 5^{th} percentile (−1.645), was between 22 and 1312. Between four and 827 fetuses were considered wrongly classified for each parameter using each reference equation. The sensitivity of screening varied between 4.54% and 100%, and the specificity varied between 91.02% and 100% (Table S6).

The Z-score distributions obtained for second-trimester measurements had observed 95^{th} percentile values between 1.182 and 2.525. The total number of fetuses considered to have abnormally high Z-scores, using the expected 95^{th} percentile (+1.645) as the cut-off, was between 198 and 1944. Between 14 and 1459 fetuses were wrongly classified for each parameter using each reference equation. The sensitivity of screening varied between 40.82% and 100%, and the specificity varied between 84.17% and 100% (Table S7).

During the third trimester, the Z-score distributions had observed 5^{th} percentile values between −2.284 and −1.054. The total number of fetuses considered to have abnormally low Z-score values, taking the expected 5^{th} percentile (−1.645) as the cut-off point, was between 115 and 1123. Between six and 718 fetuses were considered wrongly classified for each parameter using each reference equation. The sensitivity of screening varied between 28.40% and 100%, and the specificity between 90.67% and 100% (Table S8).

The Z-score distributions obtained for third-trimester measurements had observed 95^{th} percentile values between 1.331 and 2.425. The total number of fetuses considered to have abnormally high Z-scores, with respect to the expected 95^{th} percentile (+1.645), was between 234 and 1465. Between two and 1060 fetuses were considered wrongly classified for each parameter using each reference equation. The sensitivity of screening varied between 57.78% and 100%, and the specificity between 86.22% and 100% (Table S9).

The overall results for classification of the fetuses using the 5^{th} and 95^{th} percentiles from each of the five reference curves for each parameter, using measurements obtained during both the second- and third-trimesters, are shown in Table 1. The number of fetuses wrongly classified varied between 430 and 2690, depending on the reference equation used. For our population, the reference equations of Chitty et al.14, 16 gave the best results for the evaluation of BPD (932 fetuses wrongly classified; sensitivity, 100%; specificity, 97.24%; Youden index, 0.97; positive predictive value (PPV), 65.63%; negative predictive value (NPV), 100%), HC (447 fetuses wrongly classified; sensitivity, 96.07%; specificity, 98.89%; Youden index, 0.95; PPV, 81.94%; NPV, 99.79%) and FL (469 fetuses wrongly classified; sensitivity, 96.46%; specificity, 98.80%; Youden index, 0.95; PPV, 80.88%; NPV, 99.81%). The best reference equation for the evaluation of AC was that of Salomon et al.20 (615 fetuses wrongly classified; sensitivity, 72.25%; specificity, 99.64%; Youden index, 0.72; PPV, 91.40%; NPV, 98.56%).

Table 1. Overall results for screening fetuses (n = 17 799) during the second and third trimesters for measurements < 5^{th} percentile and > 95^{th} percentile for each parameter using each of the reference equations

Reference | True positive | False positive | True negative | False negative | Wrongly classed (n) | Sensitivity (%) | Specificity (%) | Youden index | PPV (%) | NPV (%)

In our population, the Chitty reference curves gave the best results for identifying fetuses with abnormal (< 5^{th} percentile or > 95^{th} percentile) BPD, HC and FL, whereas the best reference for AC was the Salomon curve; the best fitting curve gave the best sensitivity, an acceptable level of specificity and the best Youden index.

* Equations used in our ultrasound unit (Centre Médico-Chirurgical et Obstétrical, CMCO) derived from a sample of the local population.
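The derived columns of Table 1 follow from the four contingency counts by standard definitions, which can be sketched as below. The function name and the counts in the example are hypothetical and are not values from Table 1; the sketch also assumes that none of the denominators is zero.

```python
def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard screening metrics from contingency counts (all denominators assumed > 0)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "wrongly_classified": fp + fn,
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Youden index = sensitivity + specificity - 1
        "youden_index": sensitivity + specificity - 1.0,
        "ppv": tp / (tp + fp),   # positive predictive value
        "npv": tn / (tn + fn),   # negative predictive value
    }


# Hypothetical counts, for illustration only (not taken from Table 1).
metrics = screening_metrics(tp=90, fp=10, tn=890, fn=10)
```

Because only about 10% of measurements are truly outside the 5^{th} to 95^{th} percentile range, a modest loss of specificity can produce many false positives and therefore a markedly lower PPV, even when sensitivity is 100%.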

The graphical illustration, superimposing the histogram of Z-score distributions on a centered standard normal reference curve, provides a pertinent visual representation of the results. For example, considering the graphs corresponding to the Z-score distributions of BPD measurements obtained during the second trimester (Figure 1): the histogram of the Z-score distribution calculated with the Chitty equations shows a close match when superimposed on the centered standard normal curve, demonstrating the high level of sensitivity and specificity obtained with this equation in the screening of fetuses both < 5^{th} percentile (sensitivity 100%, specificity 98.26%) and > 95^{th} percentile (sensitivity 100%, specificity 97.51%). However, the histogram of Z-scores calculated using the Kurmanavicius equations is clearly skewed to the left. This indicates a high level of sensitivity but poor specificity for screening for fetuses below the 5^{th} percentile (sensitivity 100%, specificity 91.02%), and poor sensitivity for fetuses above the 95^{th} percentile (sensitivity 40.82%). Conversely, the histogram of Z-scores calculated with the equations of Salomon et al. is clearly skewed to the right. This reflects poor sensitivity for the screening of fetuses < 5^{th} percentile (sensitivity 48.04%), but a high level of sensitivity and poor specificity for fetuses > 95^{th} percentile (sensitivity 100%, specificity 88.37%).

DISCUSSION

This study aimed to evaluate the importance of the choice of reference curve for screening and to facilitate the choice of curve most suitable for the study population concerned. We did not aim to demonstrate the benefits of one curve over another, but to compare the effects of their use in common practice on a given population.

Most of the Z-score distributions for the measurements obtained differed significantly from a centered standard normal distribution. These differences would have led to incorrect evaluations of fetal biometric parameters, as shown by the poor sensitivity and specificity obtained in certain cases for identifying those fetuses < 5^{th} or > 95^{th} percentile. The concordance between the Z-score distributions of our measurements and the standard normal curve is, however, difficult to test, as it may vary during pregnancy. For example, for our population, the curve of Kurmanavicius et al. proved excellent for the screening of HC measurements > 95^{th} percentile during the third trimester (sensitivity, 99%, specificity, 100%), but gave poor results during the second trimester (sensitivity, 50%, specificity, 100%). Although most of the Z-score distribution curves deviated significantly from the centered standard normal distribution, the Z-value obtained in the Kolmogorov–Smirnov test and the calculation of the mean and SD made it possible to identify the curve most closely matching the standard normal curve.

The choice of reference curve has a considerable effect on the interpretation of biometric data. For example, the number of fetuses considered to have a BPD < 5^{th} percentile during the second trimester would have varied between 130 and 1312 depending on whether the reference equations of Snijders and Nicolaides or Kurmanavicius et al. had been used. Simply changing the reference curve used can multiply the likelihood of detecting abnormal biometric measurements in a fetus by a factor of 10, potentially leading to further testing with follow-up that is costly, causes anxiety, and is, in most cases, of no benefit. Occasionally, abnormal biometric measurements lead to karyotyping by invasive sampling, potentially risking fetal loss.

In an analysis of the results obtained for both the second and third trimesters, the reference equations developed by Chitty et al. seemed to give the best results for evaluating BPD, HC and FL in our population. The reference curve most appropriate for the evaluation of AC was that developed by Salomon et al., although the results obtained were hardly satisfactory, with a sensitivity for the detection of fetuses truly < 5^{th} percentile or > 95^{th} percentile of only 72%. However, it should be noted that these overall results may not necessarily be relevant. Indeed, one of the strong points of this study is its independent analysis of the effect of the choice of reference curve for defining the 5^{th} and 95^{th} percentiles during the second and third trimesters. For example, the relevance of Salomon's equations for the evaluation of third-trimester AC can be analyzed independently for the 5^{th} and 95^{th} percentiles. As regards screening for intrauterine growth restriction, the results obtained with these equations for identifying AC < 5^{th} percentile were moderate: 64.94% sensitivity, 100% specificity and 142 fetuses wrongly classified. Snijders and Nicolaides' equations allowed improved sensitivity, with an acceptable rate of false positives: 100% sensitivity, 97.92% specificity and 160 fetuses wrongly classified. Thus, the equations of Snijders and Nicolaides give the best results for the screening of suspected hypotrophy, even if the equations of Salomon et al. give the most appropriate results for the evaluation of AC overall. For the screening of suspected fetal macrosomia, with an AC > 95^{th} percentile, the results with the equations of Salomon et al. were barely satisfactory: 80.74% sensitivity, 100% specificity and 78 fetuses wrongly classified. The equations of Kurmanavicius et al. allowed improved sensitivity, but the specificity obtained was not as good: 100% sensitivity, 93% specificity and 537 fetuses wrongly classified. This would lead to frequent incorrect diagnosis of macrosomia, with its implications for the choice of method of childbirth.

The measurement of HC in the third trimester is particularly useful for screening for microcephaly. The best results were obtained with the equations of Kurmanavicius et al. (using a cut-off of < 5^{th} percentile): 100% sensitivity, 99.73% specificity and 21 fetuses wrongly classified. These results are better than those obtained with the equations of Chitty et al.: 100% sensitivity, 96.87% specificity and 241 fetuses wrongly classified. Thus, the use of the latter equations could lead to unnecessary invasive sampling, which could be avoided using the equations of Kurmanavicius et al. Additionally, the use of a reference curve giving good results for fetal screening based on HC below the 5^{th} percentile for the third trimester could be envisaged, even though this reference performs less well for abnormalities of the HC in general.

In conclusion, the expression of biometric measurements as Z-scores and the analysis of their distribution is a relatively simple and effective method for selecting the reference curve that best corresponds to a given population. Indeed, the analysis of fetal biometric data is largely dependent on the reference equations used. Testing for a good level of concordance between the population studied and the reference used is a key initial step in any quality control procedure. The use of Z-scores provides a simple method for evaluating the performance of each reference for a given population, with a view to improving the sensitivity and specificity of screening for fetal growth abnormalities. This is important given that the choice of an unsuitable reference curve can prove to be deleterious in terms of public health.

SUPPORTING INFORMATION ON THE INTERNET

The following supporting information may be found in the online version of this article:

Table S1 Reference equations used for each biometric parameter in this study.

Table S2 Contingency table for calculation of the sensitivity and specificity of identification of fetuses truly below the 5^{th} percentile.

Table S3 Contingency table for calculation of the sensitivity and specificity of identification of fetuses truly above the 95^{th} percentile.

Table S4 Mean and SD of Z-scores calculated using each of the reference equations for each parameter, and the results of the Kolmogorov–Smirnov tests, for measurements obtained in the second trimester.

Table S5 Mean and SD of Z-scores calculated using each of the reference equations for each parameter, and the results of the Kolmogorov–Smirnov tests, for measurements obtained in the third trimester.

Table S6 Results of screening of fetuses in the second trimester for measurements < 5^{th} percentile for each parameter using each of the reference equations.

Table S7 Results of screening of fetuses in the second trimester for measurements > 95^{th} percentile for each parameter using each of the reference equations.

Table S8 Results of screening of fetuses in the third trimester for measurements < 5^{th} percentile for each parameter using each of the reference equations.

Table S9 Results of screening of fetuses in the third trimester for measurements > 95^{th} percentile for each parameter using each of the reference equations.

Acknowledgements

We thank Bruno Clement-Ziza for assisting with the statistical analysis.