Multivariable analysis of tests for the diagnosis of intrauterine growth restriction

Abstract

Objectives

To describe how data from antenatal fetal ultrasound biometry, amniotic fluid index and umbilical artery Doppler can be appropriately combined using multivariable models and to investigate how the addition of these ultrasound parameters influences the ability to predict intrauterine growth restriction (IUGR).

Methods

This was a prospective cohort study involving 274 low-risk pregnancies undergoing serial ultrasound examination at predetermined intervals. Standard deviation (Z) scores of the last values for fetal abdominal area (FAA), growth velocity of the FAA, amniotic fluid index (AFI) and umbilical artery Doppler pulsatility index prior to delivery were calculated for 260 fetuses. Customized estimated fetal weight (cEFW) centiles were also calculated using the last EFW before delivery after adjustment for fetal gender, gestational age, birth order and maternal weight, height and ethnic origin. Following delivery the neonatal ponderal index was calculated and centile position obtained. A neonatal ponderal index <25th centile served as the main outcome measure for diagnosis of IUGR. Logistic regression analysis was used to delineate the predictive value of the three fetal growth tests FAA, FAA growth velocity and cEFW and the additional values of AFI and pulsatility index of the umbilical artery.

Results

The areas under the receiver–operating characteristics (ROC) curves (95% confidence interval) for FAA, FAA growth velocity and cEFW alone were 0.819 (0.748–0.891), 0.784 (0.699–0.869) and 0.74 (0.643–0.837), respectively, in the prediction of a neonatal ponderal index <25th centile. The addition of both the AFI and pulsatility index to FAA, FAA growth velocity and cEFW generated small increases in the areas, to 0.831 (0.758–0.904), 0.817 (0.735–0.899) and 0.766 (0.672–0.859), respectively. These improvements in diagnostic prediction were not statistically significant.

Conclusions

The addition of AFI and umbilical artery pulsatility index to the fetal biometry parameters did not significantly increase the ROC areas in the study population. The approach applied in this study is useful in the context of hypothesis generation. Further studies using larger data sets and other predictors should be carried out using the analytical techniques outlined in this paper to determine the contribution of various antenatal tests in the prediction of IUGR. Copyright © 2003 ISUOG. Published by John Wiley & Sons, Ltd.

Introduction

Fetal assessment with ultrasound biometry, amniotic fluid volume assessment and umbilical artery Doppler waveform analysis is frequently practised in modern antenatal care. When the clinician is presented with a number of ultrasound parameters, decision-making is inevitably based upon a clinically intuitive combination of these tests. In the published literature, tests are evaluated for their performance independently of each other but such an approach is associated with bias1.

Fetal biometry expressed either as a single measurement or as change in serial measurements performs with varying degrees of success in the prediction of intrauterine growth restriction (IUGR) in both low- and high-risk pregnancies2–4. To our knowledge there is no published description of an appropriate statistical methodology for combining information from more than one ultrasound parameter in the prediction of infants born with evidence of IUGR. This study describes how several ultrasound fetal variables can be appropriately combined using multivariable models and how the addition of different ultrasound parameters influences test performance. In particular, we wanted to examine if the addition of amniotic fluid index (AFI) and Doppler examination of the umbilical artery to standard and customized fetal biometry improved diagnostic prediction.

Methods

Three hundred and thirteen women attending the antenatal clinic at Ninewells Hospital, Dundee, Scotland were enrolled into a study of fetal growth which has previously been described in detail5. Entry criteria were singleton pregnancy, gestational age <85 days confirmed by crown–rump length (CRL) measurement and the absence of recognized risk factors for accelerated or restricted fetal growth including a history of a previous small-for-gestational age infant, existing medical disorders or heavy smoking (>20 cigarettes per day). All the subjects were scanned for fetal anomaly at 18 weeks and subsequently underwent ultrasound examinations at 4-week intervals before 30 weeks and 2-week intervals thereafter until delivery.

All ultrasound measurements were made by one of the authors (P.O.) using an Aloka SSD-650 (Aloka, Tokyo, Japan) real-time ultrasound scanner with a 3.5-MHz probe. CRL was measured in a standard manner employing a frozen on-screen image and electronic calipers6. Gestational age was calculated with reference to CRL and not menstrual data7. The fetal abdominal area (FAA) was measured at the level of the umbilical vein8 by tracing the outline of the trunk on screen. For each fetus the last value for FAA prior to delivery was converted to a standard deviation (Z-) score using previously published gestational age-specific values for means and standard deviations9. Growth velocity of the FAA was determined from the last and third-last measurements for each fetus; FAA growth velocity was calculated over this interval (mean, 28 days) and expressed as a Z-score using previously published standards5.
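The Z-score conversion and growth velocity calculation described above can be sketched as follows. This is only an illustration: the function names are hypothetical, and the reference means and standard deviations are placeholder numbers, not the published standards cited in the text.

```python
def z_score(value, ref_mean, ref_sd):
    """Standard deviation (Z-) score of a measurement against a
    gestational age-specific reference mean and standard deviation."""
    if ref_sd <= 0:
        raise ValueError("reference SD must be positive")
    return (value - ref_mean) / ref_sd


def growth_velocity(earlier_value, later_value, interval_days):
    """Average change per day between two measurements, e.g. the
    third-last and last FAA measurements of one fetus."""
    if interval_days <= 0:
        raise ValueError("interval must be positive")
    return (later_value - earlier_value) / interval_days


# Placeholder values for illustration only (not the published standards).
faa_third_last_cm2 = 60.0
faa_last_cm2 = 74.0
velocity = growth_velocity(faa_third_last_cm2, faa_last_cm2, interval_days=28)

faa_z = z_score(faa_last_cm2, ref_mean=80.0, ref_sd=6.0)
velocity_z = z_score(velocity, ref_mean=0.6, ref_sd=0.1)
print(faa_z, velocity_z)
```

Expressing both size and velocity as Z-scores puts them on a common, gestational age-adjusted scale before they enter a regression model.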

Estimated fetal weight (EFW) was calculated from the biparietal diameter (BPD), FAA and femur length (FL) measurements using a previously validated formula10, 11. A customized estimated fetal weight (cEFW) centile was calculated for each fetus using the last EFW before delivery after adjustment for fetal gender, gestational age, maternal weight at booking, birth order and maternal ethnic origin; this was facilitated by on-line software12 available at www.wmpi.net. Fetal abdominal area Z-score, FAA growth velocity Z-score and cEFW served as the measures of fetal size and growth.

Amniotic fluid index was measured once at each visit using a standard technique13. The umbilical artery pulsatility index was calculated from the mean of three waveforms obtained by insonating a free-floating segment of cord during fetal quiescence and in the absence of fetal breathing. The last values for AFI and pulsatility index prior to delivery are described as Z-scores13, 14.

Neonatal anthropometric measurements were made on the second or third days of life. The baby's length was measured on a standard neonatal anthropometer; the mean of three measurements was recorded, the ponderal index was calculated and the centile position was obtained15 (ponderal index = weight/crown–heel length³ (g/cm³) × 100). In addition, neonatal skinfold thickness was measured using Holtain calipers (Crymych, UK) by one observer. Three measurements were made at the subscapular and triceps areas on the child, the mean measurement was recorded and a centile position obtained after adjustment for gestational age and sex16.
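As a worked example of the ponderal index formula given above, the following sketch uses a hypothetical infant (3500 g, three length readings averaging 50 cm); the function name and the numbers are illustrative assumptions, not study data.

```python
def ponderal_index(weight_g, crown_heel_length_cm):
    """Ponderal index = weight / (crown-heel length)^3 x 100,
    with weight in grams and length in centimetres."""
    if crown_heel_length_cm <= 0:
        raise ValueError("length must be positive")
    return weight_g / crown_heel_length_cm ** 3 * 100


# Hypothetical infant: the mean of three length measurements is used.
length_readings_cm = [49.9, 50.0, 50.1]
mean_length_cm = sum(length_readings_cm) / len(length_readings_cm)

ponderal = ponderal_index(3500, mean_length_cm)
print(round(ponderal, 2))  # 3500 / 50^3 x 100 is approximately 2.8
```

The centile position would then be read from a reference chart; that lookup is not shown because the reference values are not reproduced in this paper.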

Infants were considered to have IUGR if the ponderal index was <25th centile and/or one or both skinfold thickness measurements were <10th centile. These cut-off values were chosen because they identify only a small proportion of the low-risk population as being growth-restricted and as such are likely to represent true growth restriction. There is no consensus as to which neonatal measurement best represents nutritional status but the ponderal index is more widely reported than skinfold thickness. For this reason, the ponderal index was selected as the main outcome measure (results for skinfold thickness are presented in the Appendix).
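The outcome definitions in this paragraph can be written as simple rules. The sketch below is an assumption-laden illustration: the function names are hypothetical, and treating a missing measurement as "not low" is a choice made here for simplicity, not something stated in the study.

```python
def meets_iugr_definition(ponderal_centile=None, skinfold_centiles=()):
    """Study definition: ponderal index <25th centile and/or one or both
    skinfold thickness centiles (subscapular, triceps) <10th centile.
    Missing values (None or an empty tuple) are treated as not low,
    which is an assumption of this sketch."""
    low_ponderal = ponderal_centile is not None and ponderal_centile < 25
    low_skinfold = any(c < 10 for c in skinfold_centiles)
    return low_ponderal or low_skinfold


def main_outcome(ponderal_centile):
    """Main outcome measure used in the analysis:
    ponderal index <25th centile."""
    return ponderal_centile < 25


# Hypothetical infant: ponderal index on the 30th centile,
# subscapular skinfold on the 8th and triceps on the 40th centile.
print(meets_iugr_definition(30, (8, 40)))  # True (low subscapular skinfold)
print(main_outcome(30))                    # False
```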

Statistical evaluation of combination of diagnostic tests

Logistic regression analysis was used to delineate the predictive value of the three fetal size and growth tests (FAA, FAA growth velocity and cEFW) separately and with the additional values of AFI and umbilical artery pulsatility index Z-scores. For our analysis we used ponderal index <25th centile as the binary dependent (outcome) variable. In the model-building process, univariable analyses were first carried out for each of the three fetal size and growth tests to predict IUGR. Each of the three biometry tests was then combined with AFI and pulsatility index. To assess the stability of our inferences, the analyses were repeated with skinfold thickness <10th centile as the dependent variable. In total, we ran 24 regression models using the SPSS® statistical software package (Version 10.0, SPSS Inc., Chicago, IL, USA, 2001).
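One reading of the model-building scheme that is consistent with the total of 24 models is three biometry tests, each fitted alone, with AFI, with pulsatility index (PI) and with both, for each of the two outcomes. The sketch below only enumerates those specifications; the labels are illustrative and no model is fitted.

```python
from itertools import product

biometry_tests = ["FAA", "FAA growth velocity", "cEFW"]
additions = [(), ("AFI",), ("PI",), ("AFI", "PI")]
outcomes = [
    "ponderal index <25th centile",
    "skinfold thickness <10th centile",
]

# One specification per outcome x biometry test x set of added tests.
models = [
    {"outcome": outcome, "predictors": (test, *extra)}
    for outcome, test, extra in product(outcomes, biometry_tests, additions)
]

print(len(models))  # 24
for spec in models[:4]:
    print(spec["outcome"], "~", " + ".join(spec["predictors"]))
```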

The predictive value of each model was initially summarized by plotting the estimates of sensitivity (true-positive rate) against 1 − specificity (false-positive rate) to develop a receiver–operating characteristics (ROC) curve that characterized the performance of the particular diagnostic model17, 18. This approach enables comparison of the models by analyzing the difference in magnitude of the ROC areas; a difference between areas is then assessed for statistical significance19–22. The area under the ROC curve summarizes test performance across the entire range of test results. A ROC area of 0.5 describes a non-informative test whereas a ROC area of 1.0 represents a fully informative test that discriminates perfectly between disease presence and absence. The ROC curves generated from logistic regression modeling and their respective areas were compared. This approach allowed an assessment of the incremental value of the addition of AFI and umbilical artery pulsatility index over and above that of fetal size and growth alone.
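The ROC area has a direct interpretation that a short sketch can make concrete: it equals the probability that a randomly chosen affected case receives a higher predicted score than a randomly chosen unaffected case, with ties counted as one half. The code below computes the area this way from hypothetical predicted probabilities; it is not the SPSS procedure used in the study, and the scores are invented for illustration.

```python
def roc_area(scores_affected, scores_unaffected):
    """Area under the ROC curve via pairwise comparison
    (equivalent to the normalized Mann-Whitney U statistic)."""
    if not scores_affected or not scores_unaffected:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for a in scores_affected:
        for u in scores_unaffected:
            if a > u:
                wins += 1.0
            elif a == u:
                wins += 0.5  # ties count half
    return wins / (len(scores_affected) * len(scores_unaffected))


# Hypothetical model-predicted probabilities of IUGR.
affected = [0.8, 0.6, 0.4]
unaffected = [0.5, 0.3, 0.2, 0.1]

print(roc_area(affected, unaffected))      # 11 of 12 pairs ordered correctly
print(roc_area([0.9, 0.8], [0.2, 0.1]))    # 1.0, perfect discrimination
print(roc_area([0.5, 0.5], [0.5, 0.5]))    # 0.5, non-informative
```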

Results

Two hundred and seventy-four women continued in the study. A total of 260 (95%) delivered at 37 weeks' gestation or more. Twenty-two (8%) and 11 (4%) had adjusted birth weights below the 10th and 3rd centiles, respectively. There were no cases of structural or chromosomal anomaly. Skinfold thickness and ponderal index were available in 238 and 257 cases, respectively. Twenty-six (10.9%) infants had one or both skinfold thickness measurements <10th centile and 40 (15.6%) had a ponderal index <25th centile.

Two hundred and fifty-eight cases had complete ultrasound and maternal data (FAA, FAA growth velocity, cEFW, AFI and pulsatility index) with one (n = 242) or both (n = 226) ponderal index/skinfold thickness centiles; it is these 258 cases which form the population undergoing further analysis. For these 258 cases with complete ultrasound data and at least one measure of growth achievement, the mean (range) Z-scores were: FAA, −1.8 (−3.9 to 0.44); FAA growth velocity, −0.13 (−3.1 to 2.87); AFI, −0.6 (−2.8 to 2.93); pulsatility index, 0.08 (−1.28 to 2.7). The mean (range) cEFW centile was 39 (0.6 to 99.8).

The distribution of the values for AFI and umbilical artery Doppler pulsatility index for both small (FAA Z-score < −2.5) and appropriately sized (FAA Z-score > −2.5) fetuses is presented in Figure 1 (the FAA Z-score cut-off of −2.5 was chosen after inspection of the ROC curve3). There was a wide range of values for AFI and umbilical artery pulsatility index amongst both small and appropriately sized fetuses.

Figure 1.

Plot of standard deviation (Z-) scores for amniotic fluid index (AFI) and umbilical artery Doppler pulsatility index (PI) for cases with FAA Z-scores > −2.5 (a) and < −2.5 (b).

Tests for predicting ponderal index <25th centile showed that the areas under the ROC curves for FAA, FAA growth velocity and cEFW alone were 0.819 (0.748–0.891), 0.784 (0.699–0.869) and 0.74 (0.643–0.837), respectively. The stepwise addition of information from the AFI increased the area only slightly to 0.82 (0.747–0.894), 0.787 (0.702–0.871) and 0.75 (0.652–0.847), respectively. Combining fetal size and growth measures with the pulsatility index resulted in areas of 0.829 (0.758–0.901), 0.817 (0.734–0.901) and 0.762 (0.669–0.856), respectively. The full model consisting of one of the fetal size and growth tests (FAA, FAA growth velocity, cEFW), AFI and pulsatility index increased the area to 0.831 (0.758–0.904), 0.817 (0.735–0.899), and 0.766 (0.672–0.859), respectively (Table 1). The results for the combination of tests in the prediction of skinfold thickness <10th centile showed a pattern similar to that observed above for the ponderal index (Appendix).

Table 1. With ponderal index as a measure of neonatal outcome, accuracy of three sonographic fetal biometric tests in the prediction of intrauterine growth restriction and the additional benefit of measuring amniotic fluid index (AFI) and pulsatility index (PI) of the umbilical artery

Columns give, for each sonographic fetal biometric test, the area under ROC curve (95% CI) for the regression model estimating accuracy of the combination of tests*.

Regression model*                 Fetal abdominal area    Growth velocity of fetal abdominal area    Customized estimated fetal weight
Fetal biometry test alone         0.819 (0.748–0.891)     0.784 (0.699–0.869)                        0.74 (0.643–0.837)
Fetal biometry test + AFI         0.82 (0.747–0.894)      0.787 (0.702–0.871)                        0.75 (0.652–0.847)
Fetal biometry test + PI          0.829 (0.758–0.901)     0.817 (0.734–0.901)                        0.762 (0.669–0.856)
Fetal biometry test + AFI + PI    0.831 (0.758–0.904)     0.817 (0.735–0.899)                        0.766 (0.672–0.859)

*Accuracy estimation based on logistic regression models using neonatal ponderal index <25th centile as the outcome variable. Predictor variables included one of the fetal biometry tests with or without AFI and/or umbilical artery PI. ROC, receiver–operating characteristics.
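The size of the incremental gain can be read off Table 1 by subtracting the area for each biometry test alone from the area for the full model. The sketch below does that arithmetic using the point estimates reported in the table; the dictionary layout and key names are illustrative choices. It does not test the differences for statistical significance.

```python
# ROC areas (point estimates) from Table 1, ponderal index outcome.
table1 = {
    "FAA": {"alone": 0.819, "+AFI": 0.82, "+PI": 0.829, "+AFI+PI": 0.831},
    "FAA growth velocity": {"alone": 0.784, "+AFI": 0.787, "+PI": 0.817, "+AFI+PI": 0.817},
    "cEFW": {"alone": 0.74, "+AFI": 0.75, "+PI": 0.762, "+AFI+PI": 0.766},
}

# Gain in ROC area from adding both AFI and PI to each biometry test.
gains = {
    test: round(areas["+AFI+PI"] - areas["alone"], 3)
    for test, areas in table1.items()
}

for test, gain in gains.items():
    print(f"{test}: +{gain:.3f}")
# FAA: +0.012
# FAA growth velocity: +0.033
# cEFW: +0.026
```

Each gain is small relative to the width of the corresponding 95% confidence intervals, which is consistent with the reported lack of statistical significance.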

Discussion

This study demonstrates a stepwise, multivariable approach for combining diagnostic tests of fetal growth and fetal well-being in the prediction of infants with anthropometric features of IUGR. We selected neonatal anthropometry as our outcome measure rather than birth weight for gestational age since anthropometric measures correlate more closely with subsequent short- and long-term outcomes for the infant23, 24.

Previously published univariable analyses have outlined the potential and limitations of several of these parameters in the study population and in other populations2–4, 25 but until now there has been no published description of the appropriate methodology for combining test performances in the prediction of IUGR. In this study, multivariable analyses, built to reflect the clinical sequence in which a variety of diagnostic tests might be employed, showed that the addition of information from amniotic fluid volume evaluation and Doppler examination of the umbilical arteries to any of three measures of fetal biometry only marginally increased the ability to correctly identify IUGR.

We chose to add estimates of amniotic fluid volume and umbilical artery Doppler resistance to fetal biometry since they provide information on different aspects of placental function and pathology26, 27. The results of this study should not be interpreted as suggesting that umbilical artery Doppler is not a useful test in the clinical management of the small-for-gestational age or high-risk pregnancy where the end-point is fetal or neonatal hypoxia/acidosis. These outcomes are uncommon in low-risk pregnancies and a much larger population would be required to evaluate the addition of umbilical artery Doppler to biometry using the methodology described here.

Currently, in diagnostic research, tests are often evaluated as if they provide information independently of each other. Estimates of diagnostic accuracy derived in this way ignore information that may have already been obtained from prior testing in the diagnostic process. Such an approach can lead to erroneous inferences and may artificially inflate the value of test combinations.

We further compared the diagnostic probabilities generated by a simple Bayesian approach (assuming complete independence of fetal growth and well-being) with those derived from the multivariable model described here (which takes account of the overlap of information between the tests). Figure 2 illustrates that when employing the FAA Z-score as the measure of fetal size, the simple Bayesian approach overestimates the probability of correctly identifying IUGR by almost 20%. This highlights the potential for bias inherent in simple diagnostic analyses without logistic regression modeling. This overestimation of disease probability can have an adverse impact on clinical decision-making, an issue requiring further investigation.
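The simple Bayesian approach referred to here can be sketched as sequential multiplication of likelihood ratios, which is valid only if the tests are independent given disease status. In the sketch below the pre-test probability is the observed proportion with ponderal index <25th centile (40/257); the sensitivity and specificity values are hypothetical placeholders, not estimates from this study, and the function names are illustrative.

```python
def probability_to_odds(p):
    return p / (1 - p)


def odds_to_probability(odds):
    return odds / (1 + odds)


def positive_likelihood_ratio(sensitivity, specificity):
    """LR+ = sensitivity / (1 - specificity)."""
    return sensitivity / (1 - specificity)


def naive_bayes_posterior(pre_test_probability, likelihood_ratios):
    """Post-test probability assuming the tests are independent:
    post-test odds = pre-test odds x LR1 x LR2 x ..."""
    odds = probability_to_odds(pre_test_probability)
    for lr in likelihood_ratios:
        odds *= lr
    return odds_to_probability(odds)


pre_test = 40 / 257  # proportion with ponderal index <25th centile

# Hypothetical (sensitivity, specificity) pairs for positive FAA, AFI and PI.
hypothetical_tests = [(0.7, 0.8), (0.5, 0.7), (0.4, 0.8)]
lrs = [positive_likelihood_ratio(se, sp) for se, sp in hypothetical_tests]

print(naive_bayes_posterior(pre_test, lrs[:1]))  # FAA positive only
print(naive_bayes_posterior(pre_test, lrs))      # all three positive
```

Because each likelihood ratio is applied in full, any information shared between the tests is counted more than once; a multivariable logistic model estimates the coefficients jointly and so discounts that overlap.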

Figure 2.

Predictions of probabilities for ponderal index <25th centile combining information from estimates of fetal size (FAA), amniotic fluid index (AFI) and umbilical artery Doppler (PI). Comparison of a simple Bayesian approach (—●—) and logistic regression modeling (······▪······) (see text for details). Optimal cut-off values for a positive test result were chosen using receiver–operating characteristic curves (FAA −2.5; AFI −0.8; PI 0.6)3, 25.

Our inferences based on the analysis of ROC areas should be seen in the context of hypothesis generation; this is because a ROC curve is derived using the entire range of possible cut-offs of a positive test result. In clinical practice, tests are usually applied at a specific cut-off for a positive result, so clinical decisions are only made at specific parts of the ROC curve. In addition, comparisons of diagnostic strategies based on the ROC area can become problematic if the curves differ in their shape and especially if their lines cross28.

We believe that this paper provides guidance for future research involving larger data sets but that it cannot be used for defining clinical practice at this initial stage. There is currently no consensus on the most appropriate method of sample size calculation for multivariable analyses but a study should aim to have at least 10 cases with the condition per diagnostic variable analyzed. For example, if a condition has a prevalence of 10% and five variables or indicators are included, at least 50 cases with the condition are needed, which requires a minimum study population of 500 subjects29. To further guide clinical decision-making, evaluations of test performances and the cost-consequences of testing (or not testing) at clinically meaningful treatment thresholds for different diagnostic strategies will be required.
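The rule of thumb in this paragraph amounts to a short calculation: multiply the number of variables by 10 to get the required number of cases with the condition, then divide by the prevalence to get the total number of subjects. The function name and signature below are illustrative.

```python
import math
from fractions import Fraction


def minimum_sample_size(n_variables, prevalence, events_per_variable=10):
    """Return (cases with the condition needed, total subjects needed)
    under the 10-cases-per-variable rule of thumb."""
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be in (0, 1]")
    events_needed = n_variables * events_per_variable
    # Exact fraction arithmetic avoids floating-point rounding surprises.
    subjects_needed = math.ceil(events_needed / Fraction(str(prevalence)))
    return events_needed, subjects_needed


# The example from the text: five variables, 10% prevalence.
print(minimum_sample_size(5, 0.10))  # (50, 500)
```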

Acknowledgements

P. Owen is grateful to Wellbeing, the charitable arm of the Royal College of Obstetricians and Gynaecologists, London, for financial support during the initial collection of fetal biometry.

Appendix

With skinfold thickness as a measure of neonatal outcome, accuracy of three sonographic fetal biometric tests in the prediction of intrauterine growth restriction and the additional benefit of measuring amniotic fluid index (AFI) and pulsatility index (PI) of the umbilical artery.

Columns give, for each sonographic fetal biometric test, the area under ROC curve (95% CI) for the regression model estimating accuracy of the combination of tests*.

Regression model*                 Fetal abdominal area    Growth velocity of fetal abdominal area    Customized estimated fetal weight
Fetal biometry test alone         0.802 (0.719–0.885)     0.742 (0.653–0.832)                        0.806 (0.731–0.88)
Fetal biometry test + AFI         0.83 (0.747–0.912)      0.786 (0.7–0.872)                          0.75 (0.834–0.908)
Fetal biometry test + PI          0.82 (0.745–0.896)      0.791 (0.702–0.88)                         0.824 (0.753–0.895)
Fetal biometry test + AFI + PI    0.843 (0.766–0.921)     0.821 (0.742–0.9)                          0.845 (0.773–0.917)

*Accuracy estimation based on logistic regression models using neonatal skinfold thickness <10th centile as the outcome variable. Predictor variables included one of the fetal biometry tests with or without AFI and umbilical artery PI. ROC, receiver–operating characteristics.
