Measurement error for ultrasound fetal biometry performed by paramedics in rural Bangladesh

Abstract

Objectives

To document the accuracy and precision of sonographic fetal biometry performed by nine paramedics from rural Bangladesh.

Methods

Paramedics underwent intensive training (6 weeks), including hands-on practice, and then completed a series of standardization exercises. Each fetus was measured by a highly trained medical doctor (study supervisor) and by the nine paramedics. Crown–rump length (CRL) in fetuses of less than 10 weeks' gestation, and biparietal diameter (BPD), occipitofrontal diameter, head circumference, abdominal circumference (AC) and femur diaphysis length (FL) were measured twice for each fetus by each paramedic and the medical doctor using standard procedures, with at least 20 min between the two measurements. Precision was quantified using variance components analysis; intraobserver error for each paramedic was calculated by comparing repeat measurements taken on the same participant, and interobserver error by comparing the measurements obtained by each paramedic with those taken by the others. Accuracy was estimated by comparing the mean of the two measurements taken by each paramedic with those taken by the study supervisor using paired t-tests. Bland–Altman plots were used to visually assess the relationship between precision of repeat measurements (intraobserver error) and fetal size.

Results

A total of 180 women, at 7 to 31 weeks' gestation, participated in the study. Intraobserver error of the measurements obtained by the paramedics, expressed as the mean SD, ranged from 0.97 mm for BPD in the first trimester to 7.25 mm for AC in the third trimester, and was larger than the interobserver error (i.e. accounting for a greater proportion of total variance) for most measurements. Interobserver error ranged from 0.00 mm for FL to 3.36 mm for AC, both in the third trimester. For all measurements except CRL, intraobserver error increased with increasing fetal size. The measurements obtained by the paramedics did show some statistically significant differences from those obtained by the study supervisor, but these were relatively small in magnitude.

Conclusions

Both inter- and intraobserver measurement errors were within the range reported in the literature for studies conducted by technical staff and medical doctors. With intense training, paramedics with no prior exposure to ultrasonography can provide accurate and precise measures of fetal biometry. Copyright © 2009 ISUOG. Published by John Wiley & Sons, Ltd.

Introduction

Fetal biometry assessed by ultrasonography is an integral part of obstetric practice in most high-income countries. In the clinic setting, ultrasound estimates of size and growth have been used for years as part of routine practice or through referrals [1, 2]. A similar situation exists in many middle- to high-income urban populations in developing countries [3, 4] and even in a few economically disadvantaged groups [5]. Even in these clinic settings, the accuracy (that the measurement is a reflection of the true value) and precision (reproducibility of the measurement by the same person) of measurements have not been well documented [6], and methodological error may influence the validity of ultrasound fetal biometry [7]. In the clinic context, this could have important implications for the estimation of gestational age and for the detection of fetuses at risk for intrauterine growth faltering. In research, where the hypothesized difference to be detected between intervention groups may be small, the documentation of measurement error is fundamental.

Recently, there has been increasing interest in the use of ultrasonography to assess fetal growth among socio-economically disadvantaged populations in developing countries [8–10]. Bone (femur) growth assessed by ultrasound has also been used as an outcome variable in clinical trials designed to improve fetal growth [11]. In non-hospital settings it may be more feasible and advantageous to contract less specialized medical staff to perform ultrasound examinations. In this context, documentation of the accuracy and precision of ultrasound measures is essential.

We recently conducted ultrasound examinations in four rural districts of Bangladesh as part of a large community-based randomized intervention trial designed to test the impact of food and micronutrient supplementation on fetal growth and other outcomes. Results of that trial are forthcoming. Before beginning data collection, local paramedics from clinics in the rural districts received extensive training and underwent a series of standardization exercises to document intra- and interobserver reliability for a number of fetal ultrasound measurements. The objective of this paper is to document the training process and to present the results of the standardization exercises.

Methods

Setting and study design

Two paramedics from each of four health centers in rural Bangladesh that participated in the randomized intervention study (the MINIMat Study), and one additional paramedic to serve as replacement during sickness and holiday leave, were trained to become ultrasonographers. The paramedics provide routine prenatal care at the rural clinics, and had worked in the clinics for a number of years before this study began. A medical doctor (referred to as the study supervisor) hired to supervise the ultrasound component of the study was trained over a 6-month period at the Atomic Energy Commission Obstetric Hospital in Dhaka, under the leadership of one of the co-authors (R.H., referred to as the study trainer). The course consisted of general ultrasound theory, extensive training in specific technique, and hands-on experience in the clinic setting. The study supervisor participated actively in the training of the paramedics as a means of furthering her own training. The nine paramedics received a 6-week training course at the same institute in Dhaka. The five paramedics who were originally hired were trained during one 6-week period and the additional four were trained approximately 6 months later, using the same protocol. After the training in Dhaka, the paramedics returned to the study health clinics and conducted ultrasound examinations prior to commencement of data collection, under the supervision of the study supervisor. During this period the study trainer made regular visits to the health centers to ensure that examinations were being done correctly.

The MINIMat Study protocol was approved by the Ethical Review Committee of ICDDR,B in Bangladesh and the Research, Biosecurity and Ethics Commissions at the National Institute of Public Health in Mexico.

Data collection for the standardization exercise

A standardization exercise to document the accuracy and precision of the ultrasound data was conducted in June 2001 for the study supervisor, in April 2002 for the first five paramedics and in July 2002 for the remaining four. Following each exercise, additional training was carried out where it was deemed necessary. Three paramedics for whom larger errors were detected underwent an additional restandardization exercise following retraining.

We assumed that the accuracy and precision of ultrasound measures may depend on fetal size. Therefore, 20 women in the first trimester of pregnancy and 10 in the second or early third trimester were recruited for each standardization exercise. During the first trimester, 10 women were included for whom crown–rump length (CRL) could be measured and 10 women for whom measurement of biparietal diameter (BPD), occipitofrontal diameter (OFD), femur diaphysis length (FL), and head and abdominal circumference (HC and AC, respectively) was feasible. Women in the second or early third trimester underwent measurement of the latter group of fetal parameters (i.e. all except CRL). As part of the intervention trial, women were to be measured at < 12 weeks and at approximately 14, 19 and 30 weeks' gestation. For each standardization exercise we prioritized women who were close to these specific weeks of pregnancy.

After receiving a full explanation of the objectives and procedures, pregnant women were asked to give oral consent. Clinic staff who were neither trained to perform ultrasonography nor participating in the standardization exercise recorded the duration of pregnancy based on the maternally recalled date of the last menstrual period (LMP); based on this assessment, the coauthor in charge (Y.W.) advised which measurements to take for each participant. The study trainer, supervisor and paramedics were kept blind to the recalled LMP until after completion of the ultrasound examination to ensure that it would not influence their measurements.

During each standardization exercise, each paramedic and the study supervisor conducted two ultrasound examinations for each participant under the supervision of one of the coauthors (Y.W.). The coauthor did not interfere or correct any measurement, but was present to ensure that the study protocol was followed. The other paramedics and the study supervisor and trainer were outside of the examination room when it was not their turn to measure. In all cases, the two examinations made by each individual were separated by no less than 20 min.

Fetal measurements were made using real-time ultrasound on a portable machine (Toshiba Portable System Model SSA328 Justavision-200, Toshiba, Tokyo, Japan) equipped with a 3.5-MHz transducer with a sound velocity of 1540 m/s. Scanning continued until an adequate image was found; that image was then frozen and measurements were made using the machine's electronic calipers. Repeat measurements were taken from separate scans: three measurements within 3 mm for BPD, OFD and FL, and three measurements within 8 mm for HC and AC. If one of the measurements was outside these limits, a fourth measurement was taken from a fourth scan. The mean of the three measurements (excluding the anomalous value if a fourth was taken) was then used in the analysis. This protocol was followed because it is similar to what would eventually be used in the intervention trial. During the standardization exercise, the paramedics read the value from the screen and the researcher in charge of the exercise noted it on a data collection form. This was done to reduce the possibility of the paramedics remembering, and being influenced by, their own prior measurements.
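The repeat-measurement rule described above can be sketched in a few lines of Python. This is an illustrative reading of the protocol, not the study's own procedure: the function names are hypothetical, "within 3 mm" (or 8 mm) is assumed to mean that the largest and smallest of the three values differ by no more than that limit, and the "anomalous" value among four is assumed to be the one farthest from the median.

```python
from statistics import mean, median

# Agreement limits (mm) from the protocol text; treating "within X mm" as
# max - min <= X is an assumption of this sketch.
TOLERANCE_MM = {"BPD": 3, "OFD": 3, "FL": 3, "HC": 8, "AC": 8}


def needs_fourth_scan(values, parameter):
    """Return True if the three measurements disagree by more than the limit."""
    return max(values) - min(values) > TOLERANCE_MM[parameter]


def protocol_mean(values, parameter):
    """Mean of three measurements, dropping the anomalous one of four."""
    values = list(values)
    if len(values) == 3:
        if needs_fourth_scan(values, parameter):
            raise ValueError("spread exceeds limit; a fourth scan is needed")
        return mean(values)
    if len(values) != 4:
        raise ValueError("expected three or four measurements")
    # Assumption: the anomalous value is the one farthest from the median.
    centre = median(values)
    values.remove(max(values, key=lambda v: abs(v - centre)))
    return mean(values)
```

For example, BPD readings of 50, 51 and 56 mm would trigger a fourth scan; if that scan gave 52 mm, the 56 mm reading would be dropped and the value carried into the analysis would be 51 mm.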

For pregnancies at or before 8 weeks of gestation, CRL was measured. At all other gestational ages, BPD, OFD, HC, AC and FL were measured. Diameters and lengths were assessed using electronic calipers, and circumferences were estimated by electronic ellipse fitting. Biparietal diameter was measured from the outer edge of the proximal parietal bone to the inner edge of the distal parietal bone [12]. FL was measured according to the method suggested by O'Brien and Queenan [13]. Head and abdominal circumferences were measured using a standard technique [12].

Statistical analysis

Precision was estimated as the difference in repeat measurements (mean of the first set of three measurements compared to the mean of the second set of three) taken on the same participant by the same observer (paramedic, supervisor or trainer), referred to as intraobserver error, and by the variation in measurements taken on the same participant by the different paramedics (interobserver error). The magnitude of intra- and interobserver error was estimated separately by trimester for each fetal biometric parameter using components of variance analysis. The pregnant women were treated as fixed effects and the paramedics as random effects. This was done using restricted maximum likelihood (REML) with the Proc VARCOMP command in SAS V9.1 (SAS Institute Inc., Cary, NC, USA). Components of variance analysis provides an estimate on average of the proportion of the total variance that can be attributed to the error in repeated measures by the same observer (intraobserver error) and that which can be attributed to the difference between observers (interobserver error). We report these as point estimates of the mean SD due to each component in mm, the variability around that mean as the square roots of asymptotic SDs of the mean error variance and as the percent of total variance attributable to intraobserver error for each measurement. All the available data were used in the calculations, including those obtained from the secondary restandardization exercises.
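To make the two variance components concrete, the following Python sketch estimates them with a simple method-of-moments calculation for a balanced design with two repeats per observer. It is not the REML fit from SAS PROC VARCOMP used in the study, and its estimates will generally differ somewhat from REML; the function name and data layout are hypothetical. Participants are handled as fixed effects by doing all comparisons within a participant.

```python
from statistics import mean, variance


def variance_components(data):
    """Estimate intra- and interobserver SDs (mm).

    data[participant][observer] = (first, second) repeat measurements.
    Assumes every observer measured every participant exactly twice.
    """
    # Intraobserver: with two repeats, the within-cell variance is d^2 / 2.
    cell_vars = [
        (a - b) ** 2 / 2
        for per_participant in data.values()
        for (a, b) in per_participant.values()
    ]
    intra_var = mean(cell_vars)

    # Interobserver: variance of the observers' means within each participant.
    # That variance also contains repeat error (intra_var / 2 for two
    # repeats), which is subtracted; negative estimates are truncated at 0.
    between = [
        variance([(a + b) / 2 for (a, b) in per_participant.values()])
        for per_participant in data.values()
    ]
    inter_var = max(mean(between) - intra_var / 2, 0.0)

    total = intra_var + inter_var
    return {
        "intra_sd": intra_var ** 0.5,
        "inter_sd": inter_var ** 0.5,
        "pct_intra": 100 * intra_var / total if total > 0 else float("nan"),
    }
```

In this simplified form, any observer-by-participant interaction is absorbed into the interobserver component, and truncation at zero would produce an interobserver SD of exactly 0.00 when the observers' means differ by less than repeat error alone predicts.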

Accuracy was estimated as the difference in the mean of the two measurements taken by each paramedic compared to the study supervisor and by the study supervisor compared to the study trainer. The magnitude of the difference was assessed using paired t-tests, collapsing the data over all of the gestational ages measured. For this analysis, the mean of the two measures from each paramedic was calculated as a way to remove intraobserver error, and subtracted from the mean of the measurements made by the study supervisor. A similar comparison was made for the difference between the study supervisor and the study trainer. For the three paramedics who underwent additional training, we present their values for the standardization exercise conducted after the original training and that conducted after the additional training separately.
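The paired comparison can be illustrated with a minimal Python sketch. The function name is hypothetical, the inputs are each observer's per-participant mean of two measurements, and the sign convention (paramedic minus supervisor) follows the footnote to Table 3. Only the t statistic and degrees of freedom are computed; obtaining a P-value would require a t-distribution routine that is not included here.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(paramedic_means, supervisor_means):
    """Return (mean difference, t statistic, degrees of freedom).

    Each list holds one value per participant: the mean of that observer's
    two measurements (mm). Lists must be in the same participant order.
    """
    diffs = [p - s for p, s in zip(paramedic_means, supervisor_means)]
    n = len(diffs)
    mean_diff = mean(diffs)
    # Standard error of the mean difference; zero if all differences are
    # identical, in which case the t statistic is undefined.
    se = stdev(diffs) / sqrt(n)
    if se == 0:
        raise ValueError("all paired differences are identical")
    return mean_diff, mean_diff / se, n - 1
```

The mean difference returned here corresponds to the kind of value tabulated for each paramedic and parameter in Table 3.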

Finally, we used Bland–Altman plots [14] to provide a visual assessment of whether the difference between the repeat measurements taken by the same observer (intraobserver error) is dependent on fetal size [15]. We plotted the intraobserver difference between the two measurements taken by each observer for each participant against the average of the two, as well as a line representing mean difference and ± 1.96 SDs (referred to as the 95% limits of agreement) [14].
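The quantities behind such a plot are straightforward to compute. The sketch below, with a hypothetical function name and plain lists as input, returns the plotted points (average of the pair on the x-axis, difference on the y-axis) together with the mean difference and the 95% limits of agreement defined above as mean ± 1.96 SD.

```python
from statistics import mean, stdev


def bland_altman(first, second):
    """Points and reference lines for a Bland-Altman plot.

    first, second: repeat measurements (mm) by the same observer on the
    same participants, in matching order.
    """
    diffs = [a - b for a, b in zip(first, second)]
    avgs = [(a + b) / 2 for a, b in zip(first, second)]
    bias = mean(diffs)   # mean difference between repeats
    sd = stdev(diffs)    # SD of the differences
    return {
        "points": list(zip(avgs, diffs)),
        "mean_diff": bias,
        "lower_loa": bias - 1.96 * sd,
        "upper_loa": bias + 1.96 * sd,
    }
```

A widening scatter of points toward larger averages is the visual signature of error that grows with fetal size.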

Results

A total of 180 women (30 in each of three standardization and three restandardization exercises) with singleton pregnancy and a reliable (according to maternal assessment) date of LMP participated in the standardization exercises. The duration of pregnancy at the time of examination according to LMP ranged from 7 to 31 weeks. All the women were from the same geographic area in which the intervention trial took place and many eventually became study participants. None of the measurements taken as part of the standardization exercises, however, was used for the larger study. After an initial training and standardization, three of the nine paramedics underwent further training and a second standardization exercise because of large error for some measurements. To avoid biasing the estimates of error in either direction, we used the data from all exercises here, including the restandardizations. For example 340 CRL measurements from 50 of the participating women were available. For the standardization exercise performed between the supervisor and the trainer, each of 10 participants was measured twice for a total of, for example, 40 CRL measurements.

Variance components analysis showed that intraobserver error (difference in repeat measures taken by the same paramedic) was larger than interobserver error (difference in measurements taken by different paramedics) for all parameters, with the exception of OFD during the first trimester (Table 1). Intraobserver error ranged from 0.97 mm for BPD in the first trimester to 7.25 mm for AC in the third trimester, and interobserver error ranged from 0.00 mm for FL in the third trimester to 3.36 mm for AC in the third trimester. Intraobserver error was larger for measurements taken for women in the later trimesters of pregnancy. Although this tendency can also be observed in interobserver error, the magnitude of the difference is much smaller. As would be expected given the measurement protocol, mean intra- and interobserver errors were within the limits established a priori for this study as permissive error for all fetal biometric parameters during each trimester. The proportion of total error attributable to differences in the repeat measurements taken by the same paramedic (intraobserver) was high for all measures (except OFD in the first trimester), and tended to increase by trimester.

Table 1. Precision, as determined by variance components analysis, of sonographic measurements of fetal biometric parameters by paramedics from four health clinics in rural Bangladesh

| Ultrasound measurement | Intraobserver error* (mm) | Interobserver error* (mm) | Pre-established permissible error (mm) | Proportion of variance attributable to intraobserver error (%) |
| --- | --- | --- | --- | --- |
| First trimester | | | | |
| Crown–rump length | 1.77 (0.63) | 0.83 (0.69) | 3 | 82 |
| Biparietal diameter | 0.97 (0.45) | 0.43 (0.44) | 3 | 84 |
| Occipitofrontal diameter | 1.37 (0.64) | 1.76 (1.47) | 3 | 38 |
| Head circumference | 2.95 (1.39) | 2.28 (2.26) | 6 | 63 |
| Femur diaphysis length | 0.99 (0.46) | 0.60 (0.61) | 3 | 73 |
| Abdominal circumference | 2.60 (1.20) | 1.29 (1.32) | 6 | 80 |
| Second trimester§ | | | | |
| Biparietal diameter | 1.26 (0.46) | 0.43 (0.40) | 3 | 89 |
| Occipitofrontal diameter | 2.31 (0.85) | 0.58 (0.65) | 3 | 94 |
| Head circumference | 4.07 (1.48) | 2.06 (1.70) | 6 | 80 |
| Femur diaphysis length | 1.14 (0.41) | 0.91 (0.70) | 3 | 61 |
| Abdominal circumference | 5.25 (1.91) | 1.52 (1.49) | 6 | 92 |
| Third trimester | | | | |
| Biparietal diameter | 1.34 (0.49) | 1.00 (0.76) | 4 | 65 |
| Occipitofrontal diameter | 2.11 (0.76) | 0.57 (0.56) | 4 | 93 |
| Head circumference | 4.20 (1.53) | 1.47 (1.33) | 8 | 89 |
| Femur diaphysis length | 1.58 (0.57) | 0.00 | 4 | 100 |
| Abdominal circumference | 7.25 (2.63) | 3.36 (2.74) | 8 | 83 |

* Values represent the mean error standard deviation (in mm) attributable to differences in repeat measurements made by the same paramedic on the same participant (intraobserver) and by different paramedics on the same participant (interobserver). Values in brackets are the square roots of asymptotic standard errors of the mean error variance (in mm).

Pre-established permissible error: estimated for this study as limit for acceptable error, based on fetal size and standard deviation.

First trimester: range of gestational age at time of measurement was 11–13 weeks, except for crown–rump length, for which it was 8–12 weeks.

§ Range of gestational age at time of second-trimester measurement was 14–25 weeks.

Third trimester: range of gestational age at time of measurement was 27–37 weeks.
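The final column of Table 1 can be approximately recovered from the two SD columns, because variances (squared SDs) rather than SDs add up to the total. The short Python sketch below uses a hypothetical function name; small discrepancies of about one percentage point from the published column are expected, since the tabulated SDs are rounded.

```python
def pct_intraobserver(intra_sd_mm, inter_sd_mm):
    """Share of total error variance due to intraobserver error (%)."""
    intra_var = intra_sd_mm ** 2
    inter_var = inter_sd_mm ** 2
    return 100 * intra_var / (intra_var + inter_var)
```

For first-trimester head circumference, `pct_intraobserver(2.95, 2.28)` gives roughly 63%, matching the table; an interobserver SD of 0.00 mm gives 100%.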

Intraobserver error for the study supervisor and trainer and the interobserver error between them are shown in Table 2. In general the magnitude of error was smaller than that reported for the paramedics, with the exceptions of intraobserver error for HC and AC in the third trimester (6.44 mm and 14.69 mm, respectively). These high values are due to one very large difference between the repeat measurements in the case of HC and two very large differences in the case of AC in the measurements taken by the study supervisor. Removing these extreme values resulted in intraobserver error estimates comparable to those reported for the paramedics (data not shown). For the study supervisor and trainer, the error attributable to intraobserver differences represented 100% of the total error for most measurements and for all in the third trimester.

Table 2. Precision, as determined by variance components analysis, of sonographic measurements of fetal biometric parameters by the study supervisor as compared to the study trainer

| Ultrasound measurement | Intraobserver error* (mm) | Interobserver error* (mm) | Proportion of variance attributable to intraobserver error (%) |
| --- | --- | --- | --- |
| First trimester | | | |
| Crown–rump length | 1.47 | 0.00 | 100 |
| Second trimester | | | |
| Biparietal diameter | 1.07 | 0.00 | 100 |
| Occipitofrontal diameter | 1.71 | 0.00 | 100 |
| Head circumference | 3.68 | 0.95 | 94 |
| Femur diaphysis length | 1.30 | 0.68 | 79 |
| Abdominal circumference | 4.66 | 1.61 | 89 |
| Third trimester§ | | | |
| Biparietal diameter | 1.43 | 0.00 | 100 |
| Occipitofrontal diameter | 3.37 | 0.00 | 100 |
| Head circumference | 6.44 | 0.00 | 100 |
| Femur diaphysis length | 1.51 | 0.00 | 100 |
| Abdominal circumference | 14.69 | 0.00 | 100 |

* Values represent the mean error standard deviation (in mm) attributable to differences in repeat measurements made by the study supervisor on the same participant (intraobserver) and between the study supervisor and the study trainer (interobserver).

First trimester: range of gestational age at time of measurement was 8–11 weeks.

Second trimester: range of gestational age at time of measurement was 12–26 weeks.

§ Range of gestational age at time of third-trimester measurement was 27–38 weeks.

Accuracy was estimated by testing the paired differences between each individual paramedic and the study supervisor, and between the study supervisor and the study trainer, taking the supervisor as the gold standard for the paramedics and the trainer as the gold standard for the supervisor (Table 3). There were a number of statistically significant differences between the paramedics and the supervisor, but the magnitude was within the pre-established limits mentioned in Table 1 in most cases. For paramedics who underwent additional training and a second standardization exercise (comparison between the first and second rows for paramedics 1, 8 and 9 in Table 3), smaller errors were found during the second exercise for most of the parameters. The relatively even distribution of positive and negative numbers in Table 3 indicates that there was no systematic under- or overestimation of fetal biometric parameters by the paramedics in comparison with the study supervisor. There were no statistically significant differences in paired comparisons between the study supervisor and the study trainer.

Table 3. Accuracy* of the sonographic measurements obtained for each parameter by each paramedic and by the study supervisor

| Paramedic number | CRL | BPD | OFD | HC | FL | AC |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 1.9 | −0.1 | 0.8 | −1.3 | −0.7 | −5.8 |
| 1 (second exercise) | 0.4 | −0.3 | 0.9 | −2.0 | −0.2 | 0.1 |
| 2 | −1.2 | −0.8 | −2.0 | −5.7 | −0.1 | −2.1 |
| 3 | 1.1 | 1.3 | −0.5 | −1.9 | −0.2 | −4.1 |
| 4 | 0.9 | −0.7 | −1.8 | −6.3 | 0.3 | −2.8 |
| 5 | −0.4 | −0.9 | −2.0 | −5.5 | −0.1 | −2.3 |
| 6 | 0.5 | 0.3 | −0.1 | −0.4 | 0.2 | −1.3 |
| 7 | 0.6 | 0.1 | −1.1 | −0.7 | −0.1 | −2.0 |
| 8 | 1.9 | 0.3 | −1.0 | −2.2 | 1.0 | −0.3 |
| 8 (second exercise) | 0.8 | 0.4 | −0.6 | 0.3 | 0.6 | 0.7 |
| 9 | 0.5 | 0.6 | −1.1 | −3.4 | −0.6 | −5.1 |
| 9 (second exercise) | −0.2 | −0.1 | 0.1 | −1.9 | −0.3 | −1.8 |
| Supervisor | 0.2 | −0.1 | 3.5 | 2.0 | −0.3 | −0.6 |

* Mean paired differences (in mm) between each paramedic and the study supervisor, and between the study supervisor and the study trainer. Differences calculated as mean of two measurements taken by paramedic minus the mean of the two measurements taken by the study supervisor, or, in the final row, as mean of two measurements taken by the study supervisor minus the mean of the two measurements taken by the study trainer. Measurements from all trimesters of pregnancy were analyzed together.

For three paramedics (No. 1, 8 and 9) retraining and restandardization was conducted owing to the large error for some measurements during the first exercise; the rows marked "second exercise" represent differences during the second standardization exercise.

Statistical significance of differences between the paramedics' and the supervisor's results was assessed at P < 0.05 (paired t-test). AC, abdominal circumference; BPD, biparietal diameter; CRL, crown–rump length; FL, femur diaphysis length; HC, head circumference; OFD, occipitofrontal diameter.

The Bland–Altman plots presented in Figure 1 confirm that there is a tendency towards greater variability between repeat measurements taken by the same observer when fetal size is larger. This applies to all parameters measured except CRL, where the opposite tendency was observed, i.e. larger variability with smaller CRL (Figure 1a).

Figure 1.

Bland–Altman plots [14] for the comparison of repeat measurements of (a) crown–rump length, (b) biparietal diameter, (c) occipitofrontal diameter, (d) head circumference, (e) femur diaphysis length and (f) abdominal circumference, made by the same observer on the same study participant. Each point represents one pair of repeat measurements. Lines indicate the mean of the difference between the first and second measurement and the 95% limits of agreement.

Discussion

Measurement error has important implications for clinical practice, where it may result in inaccurate estimation of gestational age [6] and inappropriate diagnosis related to fetal growth [16]. When fetal size or growth measured by ultrasound is used as an outcome in intervention studies, the magnitude of the expected impact may be very small. Thus, measurement error must be minimized and the magnitude of expected error known and adequately documented. A number of studies have reported the effect of different technical problems affecting ultrasound measurements [17, 18] and the accuracy of late pregnancy ultrasound measurements compared to postnatal size [17]. Few published studies, however, have provided quantitative estimates of the accuracy and precision of measurements at differing gestational ages. Some studies designed to document the reliability of repeat measurements have not provided a quantitative estimation of the magnitude of error attributable to intra- and interobserver measurement error [6], or have presented error as a percentage of measured size [19] and not as absolute values. Although this is useful for documenting the quality of ultrasound measurements, we were also interested in the absolute size of the errors, as this has important implications for our ability to detect a difference in growth or size related to a specific intervention.

We found that both intra- and interobserver measurement errors from this study were similar to or even somewhat smaller than the errors reported in the literature. Contrary to previous assessments [7], in our study intraobserver error was consistently larger than interobserver error. This may be related to the inexperience of the paramedics at the time of the standardization exercises, but this seems unlikely given that the same pattern was observed for the study supervisor and trainer. Others have found interobserver variability in estimates of head and abdominal circumferences related to differences in images, the method used to identify the images, measurement techniques and the type of machine itself [19]. In the present study all the paramedics were trained to use the same protocol and strict supervision was implemented during training to ensure that standard techniques were acquired and followed. The relatively small interobserver error may be related to consistency in the measurement technique used among the paramedics.

The errors reported here may be smaller than would be expected in a less controlled clinical setting. First, we included few measurements after 30 weeks of gestation, to reflect the larger study protocol. We do not know whether our finding of acceptable error could be extrapolated to later gestational ages. Second, our intensive training and retraining protocols and the error limit set, permitting a fourth measurement when one of the first three was very different, will have reduced error compared to a less controlled setting. Although this may be difficult to implement in a clinic setting, for research this type of protocol is standard procedure and key if small effects of interventions are to be detected.

We did not assess whether maternal characteristics may have influenced error. In a previous study, maternal abdominal wall thickness and parity were found to be associated with lower accuracy and precision [7]. In that same study, gestational age at the time of measurement was also associated with larger errors. Consistent with that report, we found progressively greater intraobserver error from the first to the third trimester. Similarly, intraobserver error was greater in larger fetuses for all biometric parameters except CRL. It is important to note that, although the Bland–Altman plots shown in Figure 1 provide a clear visual demonstration of this tendency, the mean difference estimated from them does not adequately quantify intraobserver error of estimation. In our study, interobserver error was relatively consistent across the trimesters of pregnancy, with the exception of AC during the third trimester. Again, this may be related to the strict measurement protocol followed by all the paramedics.

This standardization exercise was conducted as part of a large intervention trial to determine the impact of nutritional supplementation during early vs. later pregnancy on fetal growth and a number of other outcomes (results forthcoming). The expected effect size for fetal growth measured by ultrasound was small, and it was important to clearly document our ability to detect this difference by quantifying error associated with the measurements. We hypothesized that we would find a 5% difference in some ultrasound measurements between the supplementation groups by mid-gestation. At 30 weeks' gestation, intraobserver error in our sample was approximately 1.8% and interobserver error approximately 1.3% (assuming a mean BPD of 76.0 ± 3.4 mm in this sample). Thus we conclude that, given our ultrasound fetal biometry protocol, it is reasonable to assume that we have the technical ability to detect the hypothesized difference between groups in the intervention trial.
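The percentages quoted above appear to follow from dividing the third-trimester BPD errors reported in Table 1 (1.34 mm intraobserver, 1.00 mm interobserver) by the stated mean BPD of 76.0 mm. This pairing is an inference from the numbers rather than something the text states explicitly:

```latex
\[
\frac{1.34\ \text{mm}}{76.0\ \text{mm}} \times 100 \approx 1.8\%,
\qquad
\frac{1.00\ \text{mm}}{76.0\ \text{mm}} \times 100 \approx 1.3\%
\]
```

Both values are well below the hypothesized 5% difference between supplementation groups.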

The study reported here had a unique design whereby nine local paramedics from four health clinics in rural Bangladesh, with no prior exposure to ultrasonography, were trained to conduct ultrasound examinations for fetal biometry during prenatal visits related to a prenatal intervention trial. In this study, hiring medical doctors would have been unsustainable owing to cost and distance to travel to the clinics over a multi-year project. At the same time, local field staff felt that women would be much more responsive to ultrasound examinations conducted by local paramedics, from whom they routinely receive prenatal care. We have shown that with relatively short-term intensive training (6 weeks) plus ongoing supervision and reinforcement, paramedics can conduct accurate and reliable measurements of fetal biometry during all trimesters of pregnancy.

Acknowledgements

We are grateful to the paramedics at the health clinics for their dedicated work during the intensive training and standardization exercises and to the pregnant women who patiently underwent multiple ultrasound examinations. This work was done as part of the MINIMat Study. The MINIMat research study was funded by the United Nations Children's Fund (UNICEF), Swedish International Development Cooperation Agency (SIDA), UK Medical Research Council, Swedish Research Council, Department for International Development, UK (DFID), International Centre for Diarrhoeal Disease Research, Bangladesh (ICDDR,B), Child Health and Nutrition Research Initiative (CHNRI), Uppsala University and United States Agency for International Development (USAID). The ultrasound component of the MINIMat Study was also supported by the Grant-in-Aid for Scientific Research (A) No. 07013200 of the Japan Society for the Promotion of Science (JSPS). ICDDR,B acknowledges with gratitude the commitment of these donors to the center's research efforts. ICDDR,B also gratefully acknowledges these donors who provide unrestricted support to the center's research efforts: Australian International Development Agency (AusAID), Government of Bangladesh, Canadian International Development Agency (CIDA), Government of Japan, Government of the Netherlands, Swedish International Development Cooperation Agency (SIDA), Swiss Development Cooperation (SDC) and DFID. We gratefully acknowledge the participation of all the pregnant women and their families in Matlab.
