The increasing impact of legal concerns and the pressure for cost-effective decisions have created a need for effective quality-control systems for sonographic examinations in prenatal medicine [1]. Ultrasound fetal biometry is the most frequently used technique for assessing fetal growth and estimating fetal weight [2]. Insufficient quality of ultrasound examinations, including high intra- and interobserver variability [3], strongly affects the accuracy of fetal measurements [4]. Several methods for the quality control of fetal biometry have been proposed, but most of them are impracticable in routine clinical practice. Supervision by an experienced reviewer is limited by time constraints, subjectivity and randomly occurring errors [5, 6]. Quality assurance using image-scoring methods does not provide a longitudinal evaluation of the process. Furthermore, an external audit may result in only the best of the acquired ultrasound images being selected for review.
In the past, the accuracy of sonographic fetal weight estimation has been assessed by calculating the mean signed percentage error (PE) and its absolute value (APE) [7]. The disadvantage of these summary statistics, however, is that they capture neither the temporal dynamics of estimation accuracy nor the experience of the examiner.
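As a minimal sketch of the two conventional measures, assuming the usual definition PE = (EFW − birth weight)/birth weight × 100 (so that negative values indicate underestimation), the following Python computes the mean PE and APE for a series of cases. The case values and function names are hypothetical. Because both summaries are plain averages, reordering the cases leaves them unchanged, which is why they cannot show when during a series the errors occurred.

```python
from statistics import mean


def percentage_error(efw_g: float, birth_weight_g: float) -> float:
    """Signed percentage error (PE); negative values mean underestimation."""
    return (efw_g - birth_weight_g) / birth_weight_g * 100.0


def summarize(cases: list[tuple[float, float]]) -> tuple[float, float]:
    """Return (mean PE, mean APE) for (EFW, birth weight) pairs in grams."""
    pes = [percentage_error(efw, bw) for efw, bw in cases]
    return mean(pes), mean(abs(pe) for pe in pes)


# Hypothetical cases, listed in the order in which they were scanned.
cases = [(3100, 3400), (2900, 3300), (3500, 3450), (3300, 3350)]
print(summarize(cases))
# Reversing the order gives the same summary: the timing of the errors is lost.
print(summarize(cases[::-1]))
```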
Salomon et al. [8] suggested using standardized z-scores to assess the accuracy of sonographic fetal biometry. If fetal measurements are accurate, their z-scores will follow a standard normal distribution [1]. Ideally, in a non-selected population, the 50th percentile of the observed z-scores should correspond to the theoretical 50th percentile, i.e. a z-score of 0, whereas the 5th and 95th percentiles should be close to −1.645 and +1.645, respectively. However, although the z-score method compares the distribution of measured parameters with reference values adjusted for the population at the institution concerned, it cannot identify the point in time at which systematic measurement errors occur.
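The expected percentiles quoted above can be checked directly. The sketch below, in which the function name is illustrative, compares the observed 5th, 50th and 95th percentiles of a set of z-scores with those of the standard normal distribution (about −1.645, 0 and +1.645), using only Python's standard library.

```python
from statistics import NormalDist, quantiles

# Theoretical percentiles of the standard normal distribution.
EXPECTED = {p: NormalDist().inv_cdf(p / 100) for p in (5, 50, 95)}


def percentile_check(z_scores: list[float]) -> dict[int, tuple[float, float]]:
    """Map each percentile (5, 50, 95) to (observed z-score, expected z-score)."""
    cuts = quantiles(z_scores, n=20)  # 19 cut points: 5th, 10th, ..., 95th percentile
    observed = {5: cuts[0], 50: cuts[9], 95: cuts[18]}
    return {p: (observed[p], EXPECTED[p]) for p in EXPECTED}
```

A systematic shift appears as all three observed values moving in the same direction relative to the expected ones; a distribution that is too wide or too narrow appears as the 5th and 95th percentiles moving apart or together.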
Cumulative summation (CUSUM) charts are an established method for quality control in a number of fields of medicine [9–12]. We employed CUSUM, a statistical tool that graphically presents the outcomes of consecutive procedures [13], to identify putative factors that reduce the accuracy of ultrasound fetal biometry and to assess the competence of several examiners over time, focusing on systematic and random errors [14].
Combining the CUSUM technique with the z-score system allows an objective evaluation of fetal biometry measurements. We hypothesized that when CUSUM curves of individual fetal biometric parameters reveal systematic errors, corresponding errors should appear in the CUSUM curve of fetal weight estimation.
Using the CUSUM method to analyze trainees' learning curves for sonographic fetal weight estimation within the last week before delivery could be a valuable way to evaluate their competence, as the actual birth weight soon becomes available for comparison. Furthermore, combining the CUSUM technique with the z-score system could make it possible to monitor the quality of measurements of individual fetal biometric parameters and thus to identify the causes of inaccurate sonographic fetal weight estimates.
Following approval from the institutional review board (Committee on Human Research, University of Zurich), we retrospectively analyzed fetal biometry data from a total of 1298 routine ultrasound scans performed at the University Hospital of Zurich during 2004–2007 by three trainees at the beginning of their ultrasound training (Examiners 1–3) and by one experienced examiner with 10 years' experience of daily ultrasound scanning (Examiner 4). All results from each examiner were recorded sequentially. Data comprised the sonographic estimated fetal weight (EFW), obtained within the last week before delivery, for all live-born singleton term (≥ 37 weeks' gestation) deliveries. Infants with congenital malformations were excluded from the study.
Fetal weight was estimated using the three-parameter formula of Hadlock et al., which includes head circumference (HC), abdominal circumference (AC) and femur length (FL), or, if the fetal head parameters biparietal diameter (BPD) and occipitofrontal diameter (OFD) could not be obtained accurately, their two-parameter formula based on AC and FL [15]. All fetal measurements were made in the planes described by Campbell et al. [16, 17]. BPD was measured from the outer margins of the skull (outer–outer), which is the accepted standard method in Germany and Switzerland. HC was calculated from the linear measurements of BPD and OFD, and AC from the transverse and anteroposterior abdominal diameters, using the ellipse formula. All measurements were obtained with Sonoline Prima and Allegra ultrasound machines (Siemens Medical Systems Inc., Malvern, PA, USA) using a 3.5-MHz transducer. The actual birth weight served as the reference standard for assessing the accuracy of sonographic fetal weight estimation, and the APE and PE were calculated for each case.
The learning curve CUSUM (LC-CUSUM) chart for each examiner was generated from the APE according to the method described by Bolsin and Colson [13], in which the null hypothesis states that the process is out of control and the alternative hypothesis that it is in control [18]. The main principle of CUSUM is that each procedure is assigned a score whose magnitude depends on the chosen standard and whose sign depends on the actual outcome [19]. Each score is added sequentially to the cumulative score and plotted. Consequently, an upward step of the graph records a failure, and a sustained rise indicates that the process is out of control (with significance reached at the upper boundary line), whereas a downward step records a success, and a sustained fall indicates achieved competence (with significance reached at the lower boundary line). When the CUSUM chart oscillates and remains between the two boundary lines, no statistical conclusion can be drawn, indicating that more observations are needed [20]. In our analysis an examination was considered a failure when the APE of the birth weight estimate was ≥ 15%; the acceptable failure rate was set at 5% and the unacceptable failure rate at 15%.
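A minimal sketch of a binary CUSUM of this kind, assuming the sequential-probability-ratio formulation commonly used for Bolsin–Colson-type charts. The acceptable (5%) and unacceptable (15%) failure rates and the APE ≥ 15% failure criterion come from the text; the error rates alpha = beta = 0.1 are assumptions, since the values actually used are given only in Appendix S1. Each failure adds 1 − s and each success subtracts s; h0 and h1 are the decision-interval heights used to place the lower and upper boundary lines. How the boundary lines are laid out on the published charts is not reproduced here, and all function names are hypothetical.

```python
import math


def cusum_parameters(p0: float = 0.05, p1: float = 0.15,
                     alpha: float = 0.1, beta: float = 0.1) -> tuple[float, float, float]:
    """Return (s, h0, h1) for a binary CUSUM.

    p0 is the acceptable and p1 the unacceptable failure rate; alpha and beta
    are assumed error rates. A failure adds 1 - s, a success subtracts s.
    """
    p = math.log(p1 / p0)
    q = math.log((1 - p0) / (1 - p1))
    s = q / (p + q)
    h0 = math.log((1 - alpha) / beta) / (p + q)  # interval for the acceptable-rate decision
    h1 = math.log((1 - beta) / alpha) / (p + q)  # interval for the unacceptable-rate decision
    return s, h0, h1


def is_failure(ape_percent: float, threshold: float = 15.0) -> bool:
    """An examination counts as a failure when APE >= 15% (criterion from the text)."""
    return ape_percent >= threshold


def learning_curve(apes: list[float], p0: float = 0.05, p1: float = 0.15) -> list[float]:
    """Cumulative CUSUM score after each consecutive examination."""
    s, _, _ = cusum_parameters(p0, p1)
    total, path = 0.0, []
    for ape in apes:
        total += (1 - s) if is_failure(ape) else -s
        path.append(total)
    return path
```

With the default rates, s is about 0.092 and both decision intervals are about 1.82 units high, so a single failure raises the curve by about 0.91, and roughly ten consecutive successes are needed to offset it.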
To evaluate systematic errors of fetal weight estimation, double CUSUM charts based on the PE were generated for each examiner, with positive and negative errors summed separately but presented on the same chart. The upper CUSUM detects an increase in the rate of positive errors (overestimation) of birth weight, whereas the lower CUSUM detects an increase in the rate of negative errors (underestimation).
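One common way to implement such a two-sided chart is a pair of tabular (Page) CUSUMs, sketched below under the assumption that the values have already been standardized to a target of 0 and an SD of 1. This section does not state how the PE was standardized, and the study's own formulae are in Appendix S1. The defaults k = 0.5 and h = 3 mirror the tolerated level and the control limit given for the z-score charts. The lower sum is stored as a non-positive number so that both curves can be drawn around the center line on one chart. The function name is hypothetical.

```python
def double_cusum(values: list[float], k: float = 0.5, h: float = 3.0):
    """Two one-sided tabular CUSUMs on standardized values with target 0.

    Returns (upper, lower, signals). `upper` accumulates deviations above +k,
    `lower` accumulates deviations below -k (kept <= 0), and `signals` lists
    (index, "over" or "under") wherever a curve is beyond the control limit h.
    """
    upper, lower, signals = [], [], []
    c_plus = c_minus = 0.0
    for i, x in enumerate(values):
        c_plus = max(0.0, c_plus + x - k)    # evidence of overestimation
        c_minus = min(0.0, c_minus + x + k)  # evidence of underestimation
        upper.append(c_plus)
        lower.append(c_minus)
        if c_plus > h:
            signals.append((i, "over"))
        if c_minus < -h:
            signals.append((i, "under"))
    return upper, lower, signals
```

Subtracting k before accumulating means that small random deviations (below the tolerated level) keep pulling the sums back towards zero, so only a sustained bias builds up to the control limit.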
To compare the distribution of individual fetal measurements with the normal distribution of the reference values, HC, AC and FL were transformed into z-scores [21, 22]. For this analysis, data from the 227 ultrasound scans performed by Examiner 2 were used. The z-scores were calculated according to the following formula:
$$z = \frac{X - M_{GA}}{SD_{GA}}$$
where $X$ is the measured value, $M_{GA}$ is the mean value for the appropriate gestational age and $SD_{GA}$ is the corresponding standard deviation at that gestational age [21].
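The formula translates directly into code. In this sketch the reference table is a caller-supplied mapping from completed gestational week to ($M_{GA}$, $SD_{GA}$); the function names and the placeholder reference values are hypothetical, and real reference charts may be indexed by day rather than by week.

```python
def z_score(x: float, mean_ga: float, sd_ga: float) -> float:
    """z = (X - M_GA) / SD_GA for a single measurement."""
    if sd_ga <= 0:
        raise ValueError("SD_GA must be positive")
    return (x - mean_ga) / sd_ga


def z_scores(measurements: list[tuple[int, float]],
             reference: dict[int, tuple[float, float]]) -> list[float]:
    """Convert (gestational_week, value) pairs using reference[week] = (M_GA, SD_GA)."""
    return [z_score(value, *reference[week]) for week, value in measurements]


# Placeholder reference values for illustration only; not from a published chart.
reference = {38: (320.0, 10.0), 39: (325.0, 10.0)}
print(z_scores([(38, 330.0), (39, 315.0)], reference))  # [1.0, -1.0]
```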
Subsequently, double CUSUM charts were constructed from the z-scores to assess the examiner's accuracy in measuring each fetal biometric parameter. Fetal parameters vary around the reference value owing to natural biological variability and to the individual measuring technique. The standardized differences between the measured and expected values should therefore approximately follow a normal distribution with a mean of 0 and an SD of 1 [18]. If a tendency towards over- or underestimation occurs, so that the standardized differences exceed the tolerated level (here defined as 0.5), the CUSUM curve deviates markedly from zero.
A control-limit violation occurs when either the positive or the negative CUSUM curve exceeds a specified control limit. In our study the upper and lower control limits of the double CUSUM charts were set at three SDs from the expected value. This choice implies that the difference between expected and measured values lies within the control limits in 99.73% of all individual examinations. A more detailed description of the CUSUM techniques used is given in the online supplement (Appendix S1). A one-sample t-test was applied to compare the mean of each z-score distribution with the theoretically expected value of 0.
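The 99.73% figure is the probability that a single standard normal value lies within ±3 SD of the mean. It can be verified with Python's standard library (the helper name is illustrative):

```python
from statistics import NormalDist


def coverage(n_sd: float) -> float:
    """Probability that a standard normal value lies within +/- n_sd."""
    std_normal = NormalDist()
    return std_normal.cdf(n_sd) - std_normal.cdf(-n_sd)


for n_sd in (1, 2, 3):
    print(f"±{n_sd} SD: {coverage(n_sd) * 100:.2f}%")
# ±1 SD: 68.27%, ±2 SD: 95.45%, ±3 SD: 99.73%
```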
Examiners 1–4 each performed between 182 and 622 sonographic fetal weight estimations at term. The mean APE, PE, interval between ultrasound examination and delivery, and infant birth weight for each examiner are given in Table 1.
Table 1. Characteristics of the study population and errors in birth weight estimation for each of the examiners
| Variable | Examiner 1 (n = 226) | Examiner 2 (n = 227) | Examiner 3 (n = 182) | Examiner 4 (n = 622) |
|---|---|---|---|---|
| APE (%) | 7.1 ± 5.2 | 7.1 ± 5.6 | 8.3 ± 5.6 | 5.9 ± 4.6 |
| PE (%) | −2.2 ± 8.6 | −3.0 ± 8.6 | −4.5 ± 9.0 | −1.6 ± 7.3 |
| Interval from ultrasound scan to delivery (days) | 0.9 ± 1.25 | 0.9 ± 1.15 | 0.8 ± 0.86 | 2.0 ± 1.54 |
| Birth weight (g) | 3365 ± 483 | 3375 ± 454 | 3397 ± 418 | 3466 ± 536 |
The LC-CUSUM curves, with the unacceptable failure rate defined as 15%, depicted the ongoing sequence of failures and successes (Figure 1a). Examiner 1 estimated fetal weight efficiently and consistently from the very beginning, with only isolated inaccurate estimates. The acceptable level of accuracy was reached after 20 scans, when the curve crossed the gap between two boundary lines. From the 36th to the 70th scan the plot showed an upward trend, although it did not reach an unacceptable boundary line. Thereafter, performance improved: the plot crossed the acceptable boundary line from above at the 107th scan and remained at almost the same level until the end of the observation period, apart from short periods of reduced accuracy.
The graph for Examiner 2 shows a typical learning curve, with an upward trend reflecting consistent errors until the 132nd scan. Thereafter, the plot crossed the gap between two boundary lines downwards, indicating that competence had been achieved at the 166th scan. The accuracy of Examiner 3 was very limited during the first 151 scans, as the curve rose continuously. Beginning with the 177th scan, the LC-CUSUM plot crossed the gap between two boundary lines downwards, indicating that the examiner had become proficient. The LC-CUSUM graph for sonographic fetal weight estimation by Examiner 4 is presented in Figure 1b. The graph moved steadily downwards, indicating a high degree of competence in fetal weight estimation according to our predefined standards.
Figure 2a shows the PE-based double CUSUM chart for Examiner 1, indicating consistently accurate fetal weight estimation. The process was fairly precise, with several deviations indicating a negative systematic error.
Figure 2. Double cumulative summation (CUSUM) charts demonstrating the accuracy of sonographic fetal weight estimation, based on percentage error for Examiner 1 (a), Examiner 2 (b), Examiner 3 (c) and Examiner 4 (d). LCL, lower control limit; UCL, upper control limit.
The performance of Examiner 2 deviated less from the center line than that of Examiner 1. However, after the 180th scan a negative systematic error occurred, leading to marked underestimation of fetal weight (Figure 2b).
Figure 2c demonstrates distinct underestimation of fetal weight by Examiner 3. From the 7th scan onwards an alarm signal was triggered, indicating a lack of competence. During most of the study period this examiner underestimated fetal weight, as shown by the strongly negative cumulative sum. An improvement in accuracy was suggested only at the very end of the study period.
The accuracy of sonographic fetal weight estimation by Examiner 4 is presented in Figure 2d. Accuracy was greatest after the 190th scan, when both CUSUM curves converged on the center line because the cumulative sums remained close to zero.
To analyze the impact of individual determinants of fetal weight estimation, the fetal biometry data from the 227 examinations performed by Examiner 2 were converted into z-scores, and their agreement with the standard 5th, 50th and 95th percentiles was analyzed (Table 2). HC measurements appeared accurate: their z-scores were consistent with the standard normal distribution and did not differ significantly from the expected mean of zero. In contrast, the z-scores for AC and FL differed significantly from the expected mean of zero, indicating unsatisfactory performance.
Correspondingly, the double CUSUM chart showed that consecutive HC measurements remained mostly within the control limits during the study period, with only two transient deviations, in the negative and positive directions (Figure 3a). In contrast, the double CUSUM chart of AC measurements (Figure 3b) revealed poor accuracy, with a strong tendency towards overestimation between scans 61 and 118 and between scans 135 and 183. Only between scans 183 and 217 did the chart oscillate within the control limits; during the final scans AC was again systematically overestimated. FL measurements met the predefined standards until scan 154. Subsequently, a systematic negative error occurred and persisted until the end of the study period (Figure 3c). This underestimation corresponded closely to the underestimation of fetal weight shown in Figure 2b.
Figure 3. Double cumulative summation (CUSUM) charts demonstrating the accuracy of measurements of fetal head circumference (a), fetal abdominal circumference (b) and fetal femur length (c) for Examiner 2. UCL, upper control limit; LCL, lower control limit.
Table 2. Characteristics of the z-score distributions of ultrasound biometry measurements (n = 227) performed by Examiner 2
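| Measurement | 5th percentile | 50th percentile | 95th percentile | Mean ± SD | P* |
|---|---|---|---|---|---|
| Head circumference | −1.7 | −0.1 | 1.5 | −0.027 ± 1.069 | 0.7066 |
| Abdominal circumference | −1.4 | 0.3 | 2.2 | 0.344 ± 1.142 | < 0.0001 |
| Femur length | −1.8 | −0.4 | 1.3 | −0.350 ± 0.944 | < 0.0001 |
*One-sample t-test of the mean z-score against the expected value of 0.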
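As a consistency check, the P values in Table 2 can be approximately recovered from the reported means and SDs. The sketch assumes a one-sample t-test against 0, as stated in the methods, and replaces the t distribution with its normal approximation, which is reasonable with 226 degrees of freedom. With the raw z-scores, `scipy.stats.ttest_1samp` would give the exact test. The function name is illustrative.

```python
import math
from statistics import NormalDist


def one_sample_t(mean: float, sd: float, n: int, mu0: float = 0.0) -> tuple[float, float]:
    """t statistic and approximate two-sided P value for H0: population mean = mu0.

    The P value uses the normal approximation to the t distribution.
    """
    t = (mean - mu0) / (sd / math.sqrt(n))
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


# Mean and SD of the z-scores as reported in Table 2 (n = 227).
table_2 = {
    "Head circumference": (-0.027, 1.069),
    "Abdominal circumference": (0.344, 1.142),
    "Femur length": (-0.350, 0.944),
}
for name, (m, sd) in table_2.items():
    t, p = one_sample_t(m, sd, 227)
    print(f"{name}: t = {t:.2f}, P ~ {p:.4f}")
```

The HC value comes out at about 0.70, close to the reported 0.7066 (the difference reflects rounding of the published mean and SD and the approximation), while AC and FL fall far below 0.0001.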
The evaluation of competence and of measurement quality in sonographic fetal biometry has recently attracted increasing interest in prenatal medicine. The rationale for monitoring is to improve performance by detecting errors and implementing appropriate corrective measures [18].
The CUSUM technique appears to be a promising method for routine clinical practice. Accumulating recent ultrasound results allows the detection of even small persistent errors, which are otherwise easily missed [23], so that appropriate corrective measures can be implemented immediately. Sonographic fetal weight estimation is well suited to evaluating examiners because no subjective bias is expected: the actual weight of the infant becomes known only after birth. To rule out selection bias in the analyzed population, CUSUM curves of individual fetal biometric parameters were compared with CUSUM curves of weight estimation for the same fetuses.
When the CUSUM method is applied, the target performance and the chart properties should be defined prospectively [18]. Usually, unacceptable failure rates are set less stringently at the beginning of the learning process, and once these initial rates have been achieved the CUSUM chart can be recalculated according to stricter standards [20]. The same principle could be used for setting control limits.
At present, in various medical fields, certification of competence is based on a minimum number of procedures that the physician has to perform [10]. Our findings, however, show that, at least for sonographic fetal weight estimation, the number of procedures required to achieve proficiency varies greatly between individuals (Figure 1). It has been recommended that trainees complete 200 scans over a period of 3 years to achieve competence [24]. According to the protocols of the German Medical Society of Ultrasound [25], an examiner has to perform more than 300 ultrasound scans to prove sufficient competence.
Predanic et al. [26] showed that the accuracy of fetal weight estimation increases with the number of examinations performed. However, they could not identify the exact point in time at which the improvement occurred or when corrective action should have been started [26]. CUSUM, as a longitudinal, time-weighted control chart, provides prompt information when the process is getting out of control and also allows continuing quality control for trained physicians, especially when new equipment or a different anatomical approach is introduced [27].
The LC-CUSUM chart presents the dynamics of the errors of each individual examiner (Figure 1), and on its own it is sufficient to identify when sonographic quality improves and proficiency is achieved. The double CUSUM chart, in addition, allows the detection of systematic errors of fetal weight estimation, whereas in aggregate methods poor runs can be offset and hidden by excellent results at other times [23].
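A toy example of how an aggregate summary can hide poor runs, using hypothetical standardized errors: a run of overestimation followed by an equally long run of underestimation averages to zero, whereas both one-sided CUSUMs climb well past a control limit of 3. The CUSUM is again sketched as a pair of tabular CUSUMs with k = 0.5, an assumption carried over from the z-score charts; the function name is hypothetical.

```python
from statistics import mean


def cusum_extremes(values: list[float], k: float = 0.5) -> tuple[float, float]:
    """Highest upper and lowest lower tabular CUSUM value (target 0)."""
    c_plus = c_minus = 0.0
    peak_plus = peak_minus = 0.0
    for x in values:
        c_plus = max(0.0, c_plus + x - k)
        c_minus = min(0.0, c_minus + x + k)
        peak_plus = max(peak_plus, c_plus)
        peak_minus = min(peak_minus, c_minus)
    return peak_plus, peak_minus


# Hypothetical standardized errors: 20 overestimates followed by 20 underestimates.
errors = [1.0] * 20 + [-1.0] * 20
print(mean(errors))            # 0.0: the aggregate looks unbiased
print(cusum_extremes(errors))  # (10.0, -10.0): both runs exceed a limit of 3
```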
Measurements of individual fetal biometric parameters can be assessed to identify the source of the errors leading to inaccurate fetal weight estimation. We therefore suggest evaluating clinicians' competence longitudinally using the z-scores of fetal HC, AC and FL analyzed by the CUSUM technique. This approach identifies both the direct cause and the exact time point of persistent systematic errors of sonographic fetal weight estimation.
Salomon et al. [8] demonstrated the power of the z-score system for evaluating the quality of sonographic fetal biometry. This system is a valuable tool for identifying insufficient performance, which shows up as an abnormal distribution of the z-scores of fetal biometric parameters. Indeed, fetuses with measurements at the extremes of the normal distribution are considered to be at higher risk, and a wrongly shifted distribution may lead to ineffective screening and even to inappropriate action being taken in these pregnancies. Although this method allows statistical evaluation of the overall process, it cannot detect the specific time points at which systematic errors occur. Identifying these time points would, however, be of great practical value, as it would allow corrective action to be implemented directly, or at least a decision to be made on whether to continue or modify the monitored process [18].
The results of Salomon et al. [8] are consistent with our findings and show that, owing to overestimation of fetal AC and underestimation of FL, many fetuses are misclassified as false negatives with respect to their birth weight. We agree that it is essential to identify fetuses with borderline biometric measurements correctly, as such findings are often associated with intrauterine growth restriction or macrosomia [8]. However, correct measurements are difficult to achieve without a precise failure analysis of the individual biometric parameters.
Our retrospective analysis of one examiner showed that several periods of fetal AC overestimation did not have a major influence on fetal weight estimation. In contrast, systematic underestimation of FL from the 154th scan onwards (Figure 3c) had a large negative impact on fetal weight estimation. In fact, even the short periods of AC and HC overestimation during the final study period (Figure 3a,b) did not offset this effect by the end of the study. This finding can be explained by the original formulae of Hadlock et al. [28], in which the regression coefficients for FL are greater than those for the other fetal parameters included.
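The dominant influence of FL can be illustrated by perturbing the Hadlock HC–AC–FL formula by 1 mm in each parameter in turn. The coefficients are those commonly quoted for this formula and should be checked against the original publication; the measurements are hypothetical term-size values, not study data, and the function names are illustrative.

```python
def hadlock_hc_ac_fl(hc_cm: float, ac_cm: float, fl_cm: float) -> float:
    """EFW (g) from the Hadlock three-parameter formula (HC, AC, FL in cm)."""
    log10_efw = (1.326 - 0.00326 * ac_cm * fl_cm + 0.0107 * hc_cm
                 + 0.0438 * ac_cm + 0.158 * fl_cm)
    return 10 ** log10_efw


def effect_of_1mm_error(hc_cm: float, ac_cm: float, fl_cm: float) -> dict[str, float]:
    """Relative change in EFW when each parameter alone is overestimated by 0.1 cm."""
    base = hadlock_hc_ac_fl(hc_cm, ac_cm, fl_cm)
    return {
        "HC": hadlock_hc_ac_fl(hc_cm + 0.1, ac_cm, fl_cm) / base - 1,
        "AC": hadlock_hc_ac_fl(hc_cm, ac_cm + 0.1, fl_cm) / base - 1,
        "FL": hadlock_hc_ac_fl(hc_cm, ac_cm, fl_cm + 0.1) / base - 1,
    }


# Hypothetical term-size measurements (cm), not taken from the study.
for name, change in effect_of_1mm_error(33.0, 34.0, 7.2).items():
    print(f"+1 mm {name}: {change * 100:+.2f}% EFW")
```

For these inputs a 1-mm FL error changes EFW by roughly 1.1%, against roughly 0.5% for AC and 0.25% for HC. Because of the AC × FL interaction term, the FL effect shrinks as AC grows, so the exact ratios depend on the measurements.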
The number of measurements analyzed in this study may be insufficient for an exact evaluation of the impact of single determinants on the accuracy of weight estimation; nevertheless, we aimed to demonstrate a trend based on systematic errors in the measurement of single fetal biometric parameters.
A prospective study could evaluate the CUSUM technique as a continuous audit system that provides immediate, real-time feedback to improve the quality of fetal ultrasound biometry among beginners and experts in ultrasound diagnostics alike. Furthermore, the CUSUM technique may be valuable for other sonographic assessments in perinatal medicine.
SUPPORTING INFORMATION ON THE INTERNET
The following supporting information may be found in the online version of this article:
Appendix S1: Derivation of the main statistical formulae used for construction of the CUSUM curves, including their predefined boundary lines.