The impact of choice of reference charts and equations on the assessment of fetal biometry

Abstract

Objectives

The assessment of fetal biometry is usually based on the comparison of measured values with predicted values derived from reference charts or equations in a normal population. This study was undertaken to assess the impact of the choice of reference charts and to develop a Z-score-based tool that could help sonographers to choose the reference charts that best fit their practice.

Methods

Fetal biparietal diameter, head circumference, abdominal circumference and femur diaphysis length measurements were made at 20–24 and 30–34 weeks' gestation by four experienced sonographers. All measurements were transformed into Z-scores calculated according to three prediction equations (Snijders and Nicolaides, 1994; Chitty et al., 1994 and Kurmanavicius et al., 1999). Distributions of Z-scores were compared to the expected standard normal distribution based on mean, SD and Kolmogorov–Smirnov test. Simulations were made to assess sensitivity (Se), specificity (Sp) and Youden's index (Se + Sp − 1) of each reference equation, reflecting their ability to identify fetuses with abnormal biometry in our population. The reference that best fitted our practice was determined based on these results.

Results

The Z-scores of all biometric parameters were significantly different (P < 0.001) when using any of the three reference equations, and none of the Z-score distributions could be considered similar to the standard normal distribution. The number of measurements that would be considered as abnormal according to these references ranged from 2.6% to 23.6%. Se and Sp ranged from 39.59% to 67.12% and 90.14% to 99.69%, respectively.

Conclusion

Assessment of fetal biometry is largely dependent on the choice of reference charts. We suggest that the choice of reference charts for fetal biometry could be controlled using Z-scores in each institution and that this could be the first step towards any quality assessment policy. The method we describe for the choice of the most appropriate fetal biometry reference chart might be used for all size charts. Copyright © 2005 ISUOG. Published by John Wiley & Sons, Ltd.

Introduction

Fetal biometry is an important part of routine examination in the second and third trimesters of pregnancy. Fetal measurements can be combined in order to estimate fetal weight1 or can be compared to previous measurements in the same fetus in order to evaluate fetal growth longitudinally2, 3. However, most measurements are plotted on reference charts for gestation in order to compare fetal measurements with the normal distribution of the reference population. Measurements are considered either adequate, small (i.e. < 3rd, 5th or 10th centile) or large (i.e. > 90th, 95th or 97th centile) and fetal biometry is therefore used as a screening test to identify fetuses that are below or above cut-off values for normality and thus are at increased risk for biometric or morphological abnormalities4–6. Both customization of fetal size charts7, 8 and the assessment of fetal growth velocity3, 9 have been developed to improve the ability of fetal biometry to detect high-risk fetuses. However, at the screening level the use of cross-sectional reference charts and equations with the closest distribution to that of the screened population remains the gold standard. This study was undertaken to assess the impact of the choice of reference charts and to develop a Z-score-based tool that can help sonographers to choose the reference charts that best fit their practice.

Methods

This study was conducted in a population of pregnancies undergoing routine, second- or third-trimester ultrasound examination at 20–24 and 30–34 weeks' gestation, respectively, as part of routine antenatal care in France between June 2001 and January 2004. All measurements were performed to the nearest millimeter with no time constraints by only four trained sonographers using the same probe and ultrasound machine (3.5–5-MHz curvilinear abdominal transducer, General Electric Voluson 730 Expert, GE Medical System Europe-78, Buc, France) with a cineloop facility. Gestational age (GA), biparietal diameter (BPD), head circumference (HC), abdominal circumference (AC) and femur diaphysis length (FL) measurements were recorded in all cases. Exclusion criteria were: known abnormal karyotype or congenital malformation, multiple pregnancy or absence of first-trimester dating based on crown–rump length (CRL)10. No exclusion was made on the basis of abnormal biometry or birth weight.

All biometric measurements were performed according to the methodology published with the reference charts11–16. BPD and HC were measured on a transverse view of the fetal head in an axial plane at the level where the continuous midline echo is broken by the septum pellucidum in the anterior third as described by Campbell and Thoms4. BPD was measured with the calipers placed outer to outer. HC was derived from the measurements of the occipital–frontal diameter and the BPD using the formula π(d1 + d2)/2. AC was measured on a transverse circular plane of the fetal abdomen, just above the level of the cord insertion as described by Campbell and Wilkin17 and was also derived from the two maximum diameters of the circumference. FL was measured on a plane showing the entire femoral diaphysis, with both ends clearly visible and a <45° angle with the horizontal line. At 30–34 weeks, particular care was taken not to include the epiphysis.
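The two-diameter formula used for HC and AC can be written as a short helper. This is only a sketch: the function name and the diameters in the example are illustrative assumptions, not study data.

```python
import math


def circumference_from_diameters(d1_mm: float, d2_mm: float) -> float:
    """Approximate a circumference from two perpendicular diameters: pi * (d1 + d2) / 2."""
    return math.pi * (d1_mm + d2_mm) / 2.0


# Illustrative values only (e.g. BPD = 50 mm, occipital-frontal diameter = 64 mm).
hc_mm = circumference_from_diameters(50.0, 64.0)
print(round(hc_mm, 1))  # 179.1
```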

All fetal measurements were retrospectively transformed into Z-scores according to three different previously published GA-related size charts and equations: References A11, B12–14, 18 and C15, 16. The equations are shown in Appendix S1. All these references should fit our practice based on demographic variables and measurement methodology described in these papers.

In all cases, Z-scores were calculated using the formula $Z = (X_{GA} - M_{GA})/SD_{GA}$, where $X_{GA}$ is the measured value at a known CRL-based GA, $M_{GA}$ is the mean value according to the reference equation used at this GA and $SD_{GA}$ is the SD associated with the mean value at this GA according to the reference equation. Normality of Z-score distributions was assessed using the Kolmogorov–Smirnov and Shapiro–Wilk W-tests. Due to the large sample sizes, a statistically significant non-normality was accepted unless the normal plot showed a clear deviation from a straight line18.
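The Z-score transformation can be sketched as follows. The published reference equations are given in Appendix S1 and are not reproduced here, so `toy_mean_bpd` and `toy_sd_bpd` are hypothetical placeholders with made-up coefficients; only the `z_score` function reflects the formula in the text.

```python
from typing import Callable


def z_score(measured: float, ga_weeks: float,
            mean_at: Callable[[float], float],
            sd_at: Callable[[float], float]) -> float:
    """Z = (measured value - reference mean at GA) / reference SD at GA."""
    return (measured - mean_at(ga_weeks)) / sd_at(ga_weeks)


# HYPOTHETICAL reference equations: the coefficients are invented for
# illustration and do not correspond to References A, B or C.
def toy_mean_bpd(ga_weeks: float) -> float:
    return -10.0 + 2.8 * ga_weeks


def toy_sd_bpd(ga_weeks: float) -> float:
    return 1.5 + 0.05 * ga_weeks


# A 52 mm measurement at 22 weeks, scored against the toy reference.
z = z_score(52.0, 22.0, toy_mean_bpd, toy_sd_bpd)
print(round(z, 3))
```

Passing the mean and SD equations as functions lets the same measurement be scored against several references by swapping the two callables.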

The Z-score distribution of measurements should follow a standard normal distribution when fetal measurements performed in our population perfectly fit those of the reference population. Therefore, Z-score distributions at 20–24 and 30–34 weeks were analyzed for each of the three reference equations (Reference A, B and C). In all cases, mean and SD values of the Z-score distribution were computed. Means of Z-scores obtained with each reference were compared to each other using a non-parametric Friedman ANOVA. Means of Z-score distributions were tested against the theoretical expected value of 0 using a one-sample t-test, SDs were tested against the theoretical expected value of 1 based on the Chi-squared distribution, and the difference between the sample cumulative distribution and the hypothesized standard normal cumulative distribution was assessed using a continuous Kolmogorov–Smirnov one-sample test (Kolmogorov–Smirnov d value)19, 20.
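The three checks against the standard normal distribution can be sketched with the Python standard library. This is a minimal illustration that returns the test statistics only (no P-values); the function name and the simulated sample are assumptions, not the software or data used in the study.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def summarize_z_scores(z_scores):
    """Compare a sample of Z-scores with the standard normal distribution."""
    n = len(z_scores)
    m = mean(z_scores)
    s = stdev(z_scores)  # sample SD (n - 1 denominator)
    # One-sample t statistic for H0: mean = 0.
    t_stat = m / (s / math.sqrt(n))
    # Chi-squared statistic for H0: variance = 1 (n - 1 degrees of freedom).
    chi2_stat = (n - 1) * s ** 2
    # Kolmogorov-Smirnov d: largest gap between the empirical CDF
    # and the standard normal CDF.
    std_normal = NormalDist(0.0, 1.0)
    d = 0.0
    for i, value in enumerate(sorted(z_scores), start=1):
        cdf = std_normal.cdf(value)
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return {"n": n, "mean": m, "sd": s, "t": t_stat, "chi2": chi2_stat, "ks_d": d}


# Simulated Z-scores (NOT study data): shifted left and too narrow.
rng = random.Random(0)
simulated = [rng.gauss(-0.5, 0.8) for _ in range(2000)]
summary = summarize_z_scores(simulated)
print(summary)
```

A well-fitting reference would give a mean near 0, an SD near 1 and a small d value; the simulated sample above is deliberately misfitted on all three.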

The measured 5th and 95th centile of the Z-score distributions for a given measurement in our population should also fit with the expected 5th and 95th centile values (i.e. −1.645 and +1.645, respectively). Therefore, the actual 5th and 95th centile of the Z-score distributions were computed for each type of measurement in order to identify fetuses that were in the tails of our distribution. Simulations were made to compare the number of measurements that would be classified as abnormal as compared to the reference (i.e. Z-score <−1.645 or >1.645).
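The comparison between expected and measured cut-offs can be sketched as below. The simulated Z-scores are an assumption chosen to resemble a left-shifted distribution; they are not the study measurements, and the helper names are hypothetical.

```python
import random
from statistics import quantiles

EXPECTED_LOW, EXPECTED_HIGH = -1.645, 1.645  # expected 5th and 95th centiles


def fraction_beyond_expected(z_scores):
    """Share of measurements flagged as abnormal by the reference cut-offs."""
    flagged = sum(1 for z in z_scores if z < EXPECTED_LOW or z > EXPECTED_HIGH)
    return flagged / len(z_scores)


def measured_5th_95th(z_scores):
    """Actual 5th and 95th centiles of the observed Z-score distribution."""
    cuts = quantiles(z_scores, n=20)  # 19 cut points at 5%, 10%, ..., 95%
    return cuts[0], cuts[-1]


# Simulated Z-scores (NOT study data): mean -1.0, SD 0.85.
rng = random.Random(1)
simulated = [rng.gauss(-1.0, 0.85) for _ in range(5000)]
low, high = measured_5th_95th(simulated)
print(f"measured 5th/95th centiles: {low:.3f} / {high:.3f}")
print(f"flagged by expected cut-offs: {100 * fraction_beyond_expected(simulated):.1f}%")
```

With a perfectly fitting reference, about 10% of measurements would fall beyond the expected cut-offs; a shifted distribution flags far more on one side.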

The sensitivity (Se) and specificity (Sp) of biometry to detect the actual 5th and 95th centile were calculated based on contingency tables (Table S1). The overall number of misclassified fetuses (false-positive + false-negative) was computed for all measurements and all references, and the overall Se, Sp, Youden's index (Se + Sp − 1), and positive (PPV) and negative predictive value (NPV) were calculated at 20–24 and 30–34 weeks both separately and then globally.
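The screening indices follow directly from the contingency-table counts. The sketch below uses the second-trimester BPD counts for Reference B as reported in Table 2 (TP = 297, FP = 390, FN = 228, n = 5241); the function name is an assumption, and true negatives are taken as the remainder of the sample.

```python
def screening_metrics(tp: int, fp: int, fn: int, n_total: int) -> dict:
    """Se, Sp, Youden's index, PPV and NPV from contingency-table counts.

    'Positive' means beyond the expected 5th/95th centile of the reference;
    'truly abnormal' means beyond the measured 5th/95th centile.
    Assumes tp + fp > 0 and tp + fn > 0.
    """
    tn = n_total - tp - fp - fn
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    return {
        "misclassified": fp + fn,
        "se": se,
        "sp": sp,
        "youden": se + sp - 1.0,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


m = screening_metrics(tp=297, fp=390, fn=228, n_total=5241)
print({k: round(v, 4) for k, v in m.items()})
```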

The reference (Reference A, B or C) for each type of measurement that best fitted our practice was determined as having a Z-score distribution with a mean value and SD closest to 0 and 1, respectively. In addition, it should have the smallest Kolmogorov–Smirnov d value, the lowest number of misclassified fetuses and the highest Youden's index for both second- and third-trimester examination taken globally. Indeed, it would be illogical to switch references between the second and third trimesters.

The pairwise differences between references (A vs. B, A vs. C and B vs. C) for the prediction of the 5th, 50th and 95th centile were computed and plotted in order to illustrate the difference between all three references across gestation. This was done for BPD as an example.

For all the tests used, a value of P < 0.05 was considered statistically significant.

Results

Fetal BPD, HC, AC and FL were measured in 5241 and 4379 ultrasound examinations at 20–24 and 30–34 weeks, respectively, and were included in the analysis. Z-scores were normally distributed as assessed by normality tests and/or normal plot.

At 20–24 weeks' gestation, the mean and SD of the Z-scores ranged from −1.037 to 0.647 and from 0.746 to 0.926, respectively. At 30–34 weeks, these ranged from −0.538 to 0.730 and from 0.762 to 1.001, respectively. Means of Z-scores obtained with each of the three references were statistically different for each type of measurement as assessed by Friedman ANOVA ($P < 10^{-5}$ in all cases). All mean values were statistically different from 0 as assessed by Student's t-test ($P \le 10^{-4}$ in all cases); all SDs were statistically different from 1 based on the Chi-squared distribution ($P \le 10^{-3}$ in all cases) except Z-score distributions of BPD and HC in the third trimester when assessed upon Reference B12 (P = 0.01 and P = 0.57, respectively). Kolmogorov–Smirnov d values obtained from the comparison between the actual Z-score distribution and the expected standard normal distribution were significant in all cases ($P < 10^{-2}$) and ranged from 0.062 to 0.432 and from 0.085 to 0.299 in the second and third trimesters, respectively (Table 1).

Table 1. Mean values and SD of Z-score distributions were tested against 0 and 1, respectively, and the overall distribution was tested against the expected standard normal distribution. Results are shown for the second (n = 5241) and third trimesters (n = 4379)

Second trimester

| Parameter | Reference | Mean | t (t-test) | SD | K–S d value |
|---|---|---|---|---|---|
| BPD | A | −0.090 | −7.270* | 0.892** | 0.062** |
| BPD | B | −0.559 | −43.716* | 0.926** | 0.234** |
| BPD | C | −1.037 | −88.776* | 0.846** | 0.432** |
| HC | A | 0.385 | 35.531* | 0.784** | 0.203** |
| HC | B | −0.218 | −19.531* | 0.808** | 0.130** |
| HC | C | −0.440 | −42.718* | 0.746** | 0.237** |
| AC | A | 0.573 | 54.835* | 0.756** | 0.275** |
| AC | B | 0.579 | 45.822* | 0.915** | 0.232** |
| AC | C | 0.402 | 36.008* | 0.809** | 0.193** |
| FL | A | 0.647 | 57.527* | 0.815** | 0.297** |
| FL | B | 0.161 | 14.676* | 0.794** | 0.118** |
| FL | C | 0.294 | 26.716* | 0.796** | 0.166** |

Third trimester

| Parameter | Reference | Mean | t (t-test) | SD | K–S d value |
|---|---|---|---|---|---|
| BPD | A | −0.447 | −8.7974** | 0.762** | 0.229* |
| BPD | B | −0.194 | −3.1313** | 0.978 | 0.086* |
| BPD | C | −0.538 | −9.9325** | 0.891** | 0.239* |
| HC | A | −0.054 | −0.4229* | 0.801** | 0.073* |
| HC | B | 0.166 | 10.9964** | 1.002 | 0.065* |
| HC | C | 0.351 | 24.5354** | 0.947** | 0.141* |
| AC | A | −0.043 | −0.8938* | 0.732** | 0.089* |
| AC | B | 0.730 | 52.3421** | 0.923** | 0.299* |
| AC | C | 0.677 | 48.8845** | 0.917** | 0.280* |
| FL | A | 0.251 | 18.0148** | 0.923** | 0.118* |
| FL | B | 0.270 | 19.4250** | 0.919** | 0.119* |
| FL | C | 0.281 | 23.3554** | 0.795** | 0.156* |

* $P < 10^{-2}$. ** $P < 10^{-3}$.

Reference A: Snijders and Nicolaides11. Reference B: Chitty et al.12–14. Reference C: Kurmanavicius et al.15, 16. AC, abdominal circumference; BPD, biparietal diameter; FL, femur diaphysis length; HC, head circumference; K–S, Kolmogorov–Smirnov.

At 20–24 weeks, measured 5th and 95th centiles of Z-score distributions in our population ranged from −2.438 to −0.607 and 0.329 to 2.145, respectively. The overall number of fetuses considered to have an abnormal measurement as compared to the expected 5th and 95th centile (Z-score < −1.645 or > 1.645) according to the three reference equations ranged from 174 (3.3%) to 1239 (23.6%). Se, Sp, PPV and NPV therefore ranged from 32.95% to 64.97%, 79.40% to 100%, 21.63% to 100% and 92.88% to 96.20%, respectively. An illustration of the discrepancy between expected (standard normal distribution) and measured distribution, generating false-positive screened fetuses, is shown in Figure 1 using an example of a BPD < 5th centile at 20–24 weeks as assessed with Reference C. The number of misclassified fetuses ranged from 186 (3.5%) to 1231 (23.5%) (Table 2).

Figure 1.

Illustration of discrepancy between expected (standard normal distribution) and measured distribution, generating false-positive screened fetuses. This example is for biparietal diameter below the 5th centile at 20–24 weeks' gestation as assessed with Reference C and could be generalized to all other cases. FP, false-positive; TP, true-positive.

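Table 2. The results of screening for both measurements below the 5th and above the 95th centile in the second (n = 5241) and third trimesters (n = 4379), respectively

Second trimester

| Parameter | Reference | Number beyond measured 5th or 95th centile* | Number beyond expected 5th or 95th centile | TP | FP | FN | Number misclassified | Se (%) | Sp (%) | Youden's index | PPV (%) | NPV (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BPD | A | 531 | 345 | 345 | 0 | 186 | 186 | 64.97 | 100 | 0.650 | 100 | 96.20 |
| BPD | B | 525 | 687 | 297 | 390 | 228 | 618 | 56.57 | 91.73 | 0.483 | 43.23 | 94.99 |
| BPD | C | 528 | 1239 | 268 | 971 | 260 | 1231 | 50.76 | 79.40 | 0.302 | 21.63 | 93.50 |
| HC | A | 527 | 325 | 298 | 27 | 229 | 256 | 56.55 | 99.43 | 0.560 | 91.69 | 95.34 |
| HC | B | 527 | 251 | 251 | 0 | 276 | 276 | 47.63 | 100 | 0.476 | 100 | 94.47 |
| HC | C | 524 | 280 | 280 | 0 | 244 | 244 | 53.44 | 100 | 0.534 | 100 | 95.08 |
| AC | A | 529 | 419 | 269 | 150 | 260 | 410 | 50.85 | 96.82 | 0.477 | 64.20 | 94.61 |
| AC | B | 527 | 653 | 282 | 371 | 245 | 616 | 53.51 | 92.13 | 0.456 | 43.19 | 94.66 |
| AC | C | 528 | 360 | 278 | 82 | 250 | 332 | 52.65 | 98.26 | 0.509 | 77.22 | 94.88 |
| FL | A | 520 | 582 | 268 | 314 | 252 | 566 | 51.54 | 93.35 | 0.449 | 46.05 | 94.59 |
| FL | B | 538 | 225 | 225 | 0 | 313 | 313 | 41.82 | 100 | 0.418 | 100 | 93.76 |
| FL | C | 528 | 174 | 174 | 0 | 361 | 361 | 32.95 | 100 | 0.330 | 100 | 92.88 |

Third trimester

| Parameter | Reference | Number beyond measured 5th or 95th centile* | Number beyond expected 5th or 95th centile | TP | FP | FN | Number misclassified | Se (%) | Sp (%) | Youden's index | PPV (%) | NPV (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BPD | A | 439 | 273 | 237 | 36 | 202 | 238 | 53.99 | 99.09 | 0.531 | 86.81 | 95.08 |
| BPD | B | 439 | 448 | 350 | 98 | 89 | 187 | 79.73 | 97.51 | 0.772 | 78.13 | 97.74 |
| BPD | C | 438 | 525 | 253 | 272 | 185 | 457 | 57.76 | 93.10 | 0.509 | 48.19 | 95.20 |
| HC | A | 438 | 166 | 166 | 0 | 272 | 272 | 37.90 | 100 | 0.379 | 100 | 93.54 |
| HC | B | 434 | 443 | 343 | 100 | 91 | 191 | 79.03 | 97.47 | 0.765 | 77.43 | 97.69 |
| HC | C | 435 | 440 | 273 | 167 | 162 | 329 | 62.76 | 95.77 | 0.585 | 62.05 | 95.89 |
| AC | A | 436 | 113 | 113 | 0 | 323 | 323 | 25.92 | 100 | 0.259 | 100 | 92.43 |
| AC | B | 438 | 717 | 235 | 482 | 203 | 685 | 53.65 | 87.77 | 0.414 | 32.78 | 94.46 |
| AC | C | 436 | 664 | 238 | 426 | 198 | 624 | 54.59 | 89.20 | 0.438 | 35.84 | 94.67 |
| FL | A | 430 | 376 | 289 | 87 | 141 | 228 | 67.21 | 97.80 | 0.650 | 76.86 | 96.48 |
| FL | B | 439 | 385 | 280 | 105 | 159 | 264 | 63.78 | 97.34 | 0.611 | 72.73 | 96.02 |
| FL | C | 440 | 360 | 262 | 98 | 220 | 318 | 59.55 | 97.51 | 0.571 | 72.78 | 94.53 |

* Number of subjects beyond thresholds may vary due to equal results among subjects.

Reference A: Snijders and Nicolaides11. Reference B: Chitty et al.12–14. Reference C: Kurmanavicius et al.15, 16. AC, abdominal circumference; BPD, biparietal diameter; FL, femur diaphysis length; FN, false-negative; FP, false-positive; HC, head circumference; NPV, negative predictive value; PPV, positive predictive value; Se, sensitivity; Sp, specificity; TP, true-positive.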

At 30–34 weeks, measured 5th and 95th centiles of Z-score distributions in our population ranged from −1.997 to −0.771 and 0.8112 to 2.2843, respectively. The overall number of fetuses considered to have an abnormal measurement as compared to the expected 5th and 95th centile (Z-score < −1.645 or > 1.645) according to the three reference equations ranged from 113 (2.6%) to 717 (16.4%). Se, Sp, PPV and NPV therefore ranged from 25.92% to 79.73%, 87.77% to 100%, 32.78% to 100% and 92.43% to 97.74%, respectively. The number of misclassified fetuses ranged from 187 (4.3%) to 685 (15.6%) (Table 2). Detailed results for screening of fetuses below the 5th centile and above the 95th centile at 20–24 and 30–34 weeks are given in Tables S2 and S3.

Considering both examinations, the number of misclassified fetuses at 20–24 and 30–34 weeks ranged from 424 (4.4%) to 1688 (17.6%). Se, Sp, PPV and NPV ranged from 39.59% to 67.12%, 90.14% to 99.69%, 29.54% to 94.5% and 93.61% to 96.26%, respectively. The Youden's index ranged from 0.396 to 0.615. Kolmogorov–Smirnov d values were between 0.045 and 0.335 (Table 3). Based on these results, it appeared that Reference A should be preferred when analyzing BPD measurements and Reference B should be used for HC and FL measurements. For AC measurements, although it appeared that Reference B should not be used, it was not possible to choose between References A and C.

Table 3. Main overall results for the classification of the cases with the three references at 20–24 and 30–34 gestational weeks taken globally (n = 9620)

| Parameter | Reference | Mean* | SD* | Number misclassified | Se (%) | Sp (%) | PPV (%) | NPV (%) | Youden's index | K–S d value |
|---|---|---|---|---|---|---|---|---|---|---|
| BPD | A | −0.25 | 0.85 | 424 | 60.00 | 99.58 | 94.17 | 95.69 | 0.596 | 0.132 |
| BPD | B | −0.39 | 0.97 | 805 | 67.12 | 94.36 | 57 | 96.26 | 0.615 | 0.164 |
| BPD | C | −0.81 | 0.90 | 1688 | 53.93 | 85.64 | 29.54 | 94.34 | 0.396 | 0.335 |
| HC | A | 0.19 | 0.82 | 528 | 48.08 | 99.69 | 94.5 | 94.51 | 0.478 | 0.108 |
| HC | B | −0.04 | 0.92 | 467 | 61.81 | 98.85 | 85.59 | 95.89 | 0.607 | 0.045 |
| HC | C | −0.08 | 0.93 | 573 | 57.66 | 98.07 | 76.81 | 95.44 | 0.557 | 0.067 |
| AC | A | 0.29 | 0.81 | 733 | 39.59 | 98.27 | 71.8 | 93.58 | 0.379 | 0.152 |
| AC | B | 0.65 | 0.92 | 1301 | 53.58 | 90.14 | 37.74 | 94.57 | 0.437 | 0.261 |
| AC | C | 0.53 | 0.87 | 956 | 53.53 | 94.13 | 50.39 | 94.79 | 0.477 | 0.226 |
| FL | A | 0.47 | 0.89 | 794 | 58.63 | 95.37 | 58.14 | 95.46 | 0.54 | 0.210 |
| FL | B | 0.21 | 0.86 | 577 | 51.69 | 98.79 | 82.79 | 94.76 | 0.505 | 0.112 |
| FL | C | 0.31 | 0.88 | 679 | 45.04 | 98.87 | 81.65 | 93.61 | 0.439 | 0.160 |

The reference that best fitted the authors' practice for each measurement is shown in bold.

* Calculated upon distribution of values at second and third trimesters.

Reference A: Snijders and Nicolaides11. Reference B: Chitty et al.12–14. Reference C: Kurmanavicius et al.15, 16. AC, abdominal circumference; BPD, biparietal diameter; FL, femur diaphysis length; HC, head circumference; K–S, Kolmogorov–Smirnov; NPV, negative predictive value; PPV, positive predictive value; Se, sensitivity; Sp, specificity.
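As an illustration of how the selection criteria can be applied, the sketch below tallies which reference wins each criterion for HC, using the overall values reported in Table 3. The text does not specify how the criteria are combined, so the simple vote count is an assumption made for this example rather than the authors' procedure.

```python
from collections import Counter

# Overall HC results for References A, B and C, copied from Table 3.
hc = {
    "A": {"mean": 0.19, "sd": 0.82, "misclassified": 528, "youden": 0.478, "ks_d": 0.108},
    "B": {"mean": -0.04, "sd": 0.92, "misclassified": 467, "youden": 0.607, "ks_d": 0.045},
    "C": {"mean": -0.08, "sd": 0.93, "misclassified": 573, "youden": 0.557, "ks_d": 0.067},
}

# Each criterion maps a row to a score where LOWER is better.
criteria = {
    "mean closest to 0": lambda r: abs(r["mean"]),
    "SD closest to 1": lambda r: abs(r["sd"] - 1.0),
    "fewest misclassified": lambda r: r["misclassified"],
    "highest Youden's index": lambda r: -r["youden"],
    "smallest K-S d": lambda r: r["ks_d"],
}


def criterion_winners(table):
    """Return the best reference for each criterion."""
    return {name: min(table, key=lambda ref: score(table[ref]))
            for name, score in criteria.items()}


winners = criterion_winners(hc)
tally = Counter(winners.values())  # ASSUMPTION: unweighted vote across criteria
print(winners)
print(tally.most_common())
```

For HC, Reference B wins four of the five criteria and Reference C wins only the SD criterion, which agrees with the choice of Reference B stated in the text.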

The differences between each pair of references (A vs. B, A vs. C and B vs. C) for the prediction of the 5th, 50th and 95th centile for BPD were plotted and are shown in Figure 2 as an illustration.

Figure 2.

Difference in prediction of the 5th, 50th and 95th centile for biparietal diameter between (a) References A11 and B12, (b) References A11 and C15 and (c) References B12 and C15.

Discussion

The use of cross-sectional charts remains the first-line screening tool for growth abnormalities. It is well accepted that the chart used for fetal biometry should be adapted to the population studied. However, to our knowledge this is the first study that extensively examines the impact of the choice of reference charts.

This study did not aim to assess the actual ability of these charts to detect truly abnormally grown fetuses21, 22; instead it compared the performance of each reference chart in our practice when measurements were made according to the authors' recommendations. This study also did not aim to assess whether one reference was more valid than another. Indeed, all three references were thoroughly derived, using adequate sample size and methodology and taking into account the increasing variability in measurements with gestation11–16, 18. Furthermore, although our measurements were performed by trained sonographers who had performed more than 2000 examinations per year for the last 5 years, we did not aim to judge the quality of our database nor to compare sonographers with each other. Indeed, all the sonographers' measurements had narrow normal Z-score distributions with SD values below 1 in almost all cases (Table 1). This is likely to reflect the small number (four) of operators involved in the database and their extensive practice, which is likely to reduce the variability. However, paradoxically, this low variability may actually decrease efficacy as measurements are then compared to inappropriate reference charts. Discrepancies in mean values and in the SD between measurements performed in the studied population and the reference charts are illustrated in Figure 3a and 3b, respectively. Figure 3c and 3d give two examples of acceptable concordance between the actual Z-score distribution and the expected distribution.

Figure 3.

Comparison of Z-score distribution with reference distribution. (a) Distribution of biparietal diameter Z-scores in the second trimester when compared with Reference C15. Despite the acceptable SD (0.846), the distribution is largely shifted to the left when compared to the expected distribution (mean = −1.037). (b) Distribution of abdominal circumference Z-scores in the third trimester when compared with Reference A11. Although the mean is close to 0 (−0.043) the distribution is too narrow (SD = 0.732), leading to incongruity between our population and the reference. (c) The distribution of biparietal diameter Z-scores in the second trimester when compared with Reference A11 shows an acceptable agreement between our population and the reference (mean = −0.09, SD = 0.892). (d) The distribution of head circumference Z-scores in the third trimester when compared with Reference B12 shows an acceptable agreement between our population and the reference (mean = 0.166, SD = 1.002).

Discussion of these results can be divided into three main areas as follows:

  • (1) The number of fetuses classified as having a BPD below the expected 5th centile (Z-score < −1.645) at 20–24 weeks ranged from 345 (6.6%) to 1239 (23.6%) when using References A or C, respectively (Table 2). This means that a change in reference chart can lead to a four-fold increase in the risk of being classified as abnormal, leading to mostly unnecessary anxiety and resource allocation for follow-up. A significant proportion of these fetuses may also undergo an invasive procedure and be exposed to a risk of fetal loss. Such false-positive screened fetuses arise from the discrepancy between the measured population and the expected distribution according to the reference and this is illustrated in Figure 1. Although more fetuses were classified as abnormal with Reference C, this did not lead to an increase in Se or Sp. The same applied when comparing References A and B, leading to a small increase in Se for AC measurement at 30–34 weeks at the expense of a six-fold increase in measurements below the 5th centile. Tables S2 and S3 show that discrepancies affected both the 5th and the 95th centile of our population.
  • (2) In our study, Z-score distribution had a mean value and SD that were statistically different in almost all cases from the expected 0 and 1 values, respectively, and Kolmogorov–Smirnov d values were statistically significant in all cases (Table 1). Such unsuitability may lead to inappropriate results in the assessment of fetal size as demonstrated by the poor Se and Sp achieved with some reference charts in our population. This discordance is difficult to control as it may vary throughout gestation, as illustrated by the difference in the results obtained during the second and third trimesters. These variable results throughout gestation are likely to result from differences between the various references themselves throughout pregnancy (Figure 2). Indeed, References B and C are similar throughout pregnancy but are markedly different from Reference A, and this varies with gestation. However, only one reference should be chosen for each measurement in order to make longitudinal follow-up of fetal growth meaningful.
  • (3) This study provides sonographers with useful information on how to choose reference charts and equations that best fit their practice. A preliminary step should consist of transforming all measurements into Z-scores. Z-scores have been increasingly used in recent years and have been designated by the World Health Organization as the recommended system to compare anthropometric measurements to the reference population23. A major advantage of the Z-score system is that a group of Z-scores can be used as an input for summary statistics such as mean and SD, therefore allowing for the comparison between several groups. If there is good agreement between the observed distribution and the reference distribution then the Z-score distribution should become the standard normal distribution. Although this is only theoretical, Z-score distribution should have a mean and SD as close as possible to 0 and 1, respectively, and the maximum difference from the expected distribution should be as small as possible as assessed by the Kolmogorov–Smirnov d value. Since biometry is mainly used as a screening test, Se should be as high as possible together with an acceptable Sp in order to avoid unnecessary worry and follow-up. Based on these recommendations and given our measurement distribution, Reference B best fitted our practice for HC measurements. BPD and FL measurements should be made using References A and B, respectively. None of these references was found to be acceptable for AC measurements as they all showed poor results in our population.

The assessment of fetal biometry depends heavily on the choice of reference charts and equations. Each institution should check how well its fetal measurements agree with the expected values calculated from the reference equations it uses, and this check should be the first step towards any quality control policy. Application of Z-scores allows for more accurate use of reference charts and therefore improved identification of at-risk fetuses, which in turn should facilitate counseling and make better use of resources.
