Accurate diagnosis of adnexal masses is crucial for their adequate management. Ultrasound is currently considered the first-line imaging technique for discriminating between benign and malignant adnexal masses. However, this technique is highly dependent on the expertise of the examiner, with previous studies showing that reproducibility, diagnostic performance and the examiner's confidence in providing a diagnosis are poorer in non-expert examiners[4-9].
Several scoring systems and logistic regression models have been developed in an attempt to overcome this limitation of ultrasound[10-12]. However, evidence shows that these approaches are not superior to an expert examiner and that they are not useful when the examiner is uncertain about the nature of the mass.
These facts highlight the relevance of adequate training for assessing adnexal masses by ultrasound. Several programs, including theoretical and practical aspects for ultrasound training in obstetrics and gynecology, have been proposed[15-18]. However, to the best of our knowledge, no specific training program for assessing adnexal masses by ultrasound has been previously proposed.
The use of simulators for basic gynecological ultrasound training has been proposed[19, 20]. However, training in ultrasound assessment of adnexal masses is not feasible using these simulators and real-time ultrasound training is needed. It is important to note, however, that real-time ultrasound training may be time-consuming, depending on the workflow of masses evaluated at a given institution or ultrasound laboratory.
Three-dimensional (3D) ultrasound has become increasingly available in clinical practice. This technique allows the acquisition and storage of 3D volumes of structures of interest that can subsequently be assessed offline, and this assessment is reproducible between observers. Indeed, we have shown that diagnostic accuracy for discriminating benign from malignant adnexal masses using offline assessment of 3D volumes is similar to that of real-time ultrasound.
We hypothesized that the combined use of real-time ultrasound and evaluation of offline 3D volumes could constitute adequate training for ultrasound assessment of adnexal masses over a relatively short time period. For this reason we have developed a specific training program incorporating these two methods. The aim of this study was to assess the feasibility of the program and its preliminary results.
The training program was developed at the Department of Obstetrics and Gynecology, Clinica Universidad de Navarra, University of Navarra, Pamplona, Spain. The program design was presented to the Department's Educational Committee, which evaluated it and approved the start of the program. Institutional Review Board approval was also obtained.
The objective of this program was to train examiners with no or very little experience in ultrasound assessment of adnexal masses. Given that expert examiners have shown a consistently high diagnostic performance in terms of sensitivity (90–98%) and specificity (80–94%)[23-28], the objective was that trainees should achieve a high diagnostic performance with sensitivity > 95% and specificity > 90%.
The program consisted of three phases (Figure 1). Phase 1 consisted of a 1-day (8 h) theoretical course addressing clinical and ultrasound issues related to adnexal masses. It included short lectures on the epidemiological aspects of ovarian cancer and adnexal masses, the surgical management of adnexal masses and ovarian cancer, the principles of gray-scale ultrasound and Doppler and the use of ultrasound for assessing adnexal masses, mainly focused on the use of pattern recognition (Appendices S1 and S2; and Volumes S1–S5).
Phase 2 consisted of 4 weeks of training in real-time ultrasound in our tertiary care center, in which 25–30 adnexal masses were evaluated per month. During this phase, training was performed under the direct mentorship and supervision of an expert examiner with more than 20 years' experience in gynecological ultrasound (J.L.A.). Trainees were taught to perform transabdominal and transvaginal pelvic ultrasound, including assessment of normal pelvic anatomy and uterine and ovarian biometry. In particular, they were trained to assess adnexal masses by gray-scale and Doppler ultrasound and to look for features that depend on the dynamic aspects of real-time ultrasound, such as mass mobility or tenderness. They were also taught to apply machine settings in both gray-scale and Doppler imaging.
Phase 3 consisted of a half-day course with specific training on the use of dedicated software (4D View; GE Medical Systems, Zipf, Austria) for assessing 3D volumes, and a further 4 weeks for assessing stored 3D volumes. Trainees were provided with five sets of 100 3D volumes from adnexal masses, along with an Excel file (Microsoft Corporation, Redmond, WA, USA) containing clinical data from each patient (patient age, menopausal status and reported symptoms). The 3D volumes contained both gray-scale and power Doppler information.
All 500 adnexal masses were from patients evaluated and treated at our institution between January 2003 and December 2007. In 434 cases, a definitive histological diagnosis had been obtained after surgical removal of the tumor; in the remaining 66 cases, considered benign, the ovarian cyst was followed up until spontaneous resolution.
For each set of 3D volumes the trainee had to analyze the 3D volume and interpret the images along with the clinical data. Volume analysis was performed by virtual navigation in all three orthogonal planes through the mass and involved measuring structures such as tumor diameter, size of solid components, height of papillary projections, thickness of cyst wall and septations (if present); looking at vessel location, distribution and amount (absent, scanty, moderate or abundant); and analyzing echogenicity of cyst content, looking for acoustic shadowing and analyzing tumor contour (regular or irregular).
The trainee had to provide a diagnosis of benign or malignant based on pattern recognition and the level of confidence in their diagnosis (certainly benign, probably benign, uncertain, probably malignant or certainly malignant). In those cases for which the trainee was uncertain about their diagnosis, they had to provide a diagnosis as benign or malignant in spite of uncertainty.
After finishing each set, the trainee was provided with the final diagnosis and they had to compare their diagnosis with the final diagnosis to determine success or failure. Those cases with an incorrect diagnosis were then reviewed with the trainer in an attempt to identify the reasons for error in the diagnosis and to reinterpret the images.
As stated above, each set of 3D volumes contained 100 cases selected by the trainer. This selection was made according to a single criterion: the prevalence of malignant cases had to be similar in all five sets (set 1, 33%; set 2, 28%; set 3, 35%; set 4, 35%; and set 5, 32%) (Table 1). Videoclips were not used in this training program.
Table 1. Definitive diagnosis of adnexal masses included in each set of ultrasound volumes
| Final diagnosis | Set 1 | Set 2 | Set 3 | Set 4 | Set 5 |
|---|---|---|---|---|---|
| Granulosa cell tumor | 0 | 0 | 0 | 0 | 2 |
The training program was offered to two trainees: an obstetrics and gynecology specialist with experience in transabdominal obstetrical ultrasound but very little experience in gynecological ultrasound (L.D., Trainee A); and a 3rd-year resident in radiology with no experience of gynecological ultrasound (P.F., Trainee B). Training took place between November 2011 and December 2011.
To analyze changes in each trainee's diagnostic performance, the sensitivity, specificity and likelihood ratios (LR) with 95% CI were calculated for each consecutive set of 3D volumes, before cases of incorrect diagnosis were reviewed with the trainer.
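The calculation described above can be sketched as follows. The text does not state which interval methods were used, so this sketch assumes a Wilson score interval with continuity correction for sensitivity and specificity and the log method for the likelihood ratios; the function names are illustrative only. The example counts (8 true positives, 0 false negatives, 74 true negatives, 7 false positives) are inferred from the percentages reported for Trainee A's post-training series and are not given explicitly in the text.

```python
import math

Z95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def wilson_cc_interval(successes, total, z=Z95):
    """Wilson score interval with continuity correction (assumed method)."""
    p = successes / total
    q = 1 - p
    denom = 2 * (total + z ** 2)
    lower_root = z * math.sqrt(z ** 2 - 2 - 1 / total + 4 * p * (total * q + 1))
    upper_root = z * math.sqrt(z ** 2 + 2 - 1 / total + 4 * p * (total * q - 1))
    lower = (2 * total * p + z ** 2 - 1 - lower_root) / denom
    upper = (2 * total * p + z ** 2 + 1 + upper_root) / denom
    # Clamp to [0, 1] and pin the bound that coincides with an extreme proportion.
    lower = 0.0 if successes == 0 else max(0.0, lower)
    upper = 1.0 if successes == total else min(1.0, upper)
    return lower, upper


def likelihood_ratio_interval(num_events, num_total, den_events, den_total, z=Z95):
    """Ratio of two proportions with a log-method interval (assumed method).

    Returns None when a zero count leaves the log-scale interval undefined.
    """
    if num_events == 0 or den_events == 0:
        return None
    ratio = (num_events / num_total) / (den_events / den_total)
    se = math.sqrt(1 / num_events - 1 / num_total + 1 / den_events - 1 / den_total)
    return ratio, ratio * math.exp(-z * se), ratio * math.exp(z * se)


def diagnostic_performance(tp, fn, tn, fp):
    """Sensitivity, specificity, LR+ and LR- as (estimate, lower, upper)."""
    malignant, benign = tp + fn, tn + fp
    return {
        "sensitivity": (tp / malignant, *wilson_cc_interval(tp, malignant)),
        "specificity": (tn / benign, *wilson_cc_interval(tn, benign)),
        # LR+ = sensitivity / (1 - specificity)
        "lr_plus": likelihood_ratio_interval(tp, malignant, fp, benign),
        # LR- = (1 - sensitivity) / specificity
        "lr_minus": likelihood_ratio_interval(fn, malignant, tn, benign),
    }


# Counts inferred (not reported) from Trainee A's post-training series.
for name, value in diagnostic_performance(tp=8, fn=0, tn=74, fp=7).items():
    print(name, value)
```

Under these assumptions the inferred counts give estimates and intervals close to those reported for that series, and the LR– entry comes back as `None` because there are no false-negative results.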
The learning curve cumulative summation (LC-CUSUM) test was used to assess each trainee's learning curve. Acceptable and unacceptable failure rates were set at 15% and 25%, respectively. These limits were chosen assuming that the pooled failure rate for an expert examiner could be around 15–25%, taking into account both false-positive and false-negative results[23-28]. Type I (α) and type II (β) error rates were set at 0.1. CUSUM values were plotted on the y-axis against the number of examinations on the x-axis. Horizontal boundary lines were drawn at regular intervals on the y-axis, with h0 and h1 defining the spacing between acceptable and unacceptable boundary lines, respectively. Competence was declared when the plot fell below two consecutive boundary lines.
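A minimal sketch of such a chart is given below. It assumes the standard sequential-probability-ratio CUSUM formulas (a decrement s per correct diagnosis, an increment 1 − s per incorrect diagnosis, and boundary spacing derived from α and β); the text does not spell out the formulas, so this may differ from the exact computation used. The function names and the rule for detecting two consecutive downward crossings are illustrative assumptions.

```python
import math


def cusum_parameters(p0=0.15, p1=0.25, alpha=0.1, beta=0.1):
    """Return (s, h0, h1) for acceptable (p0) and unacceptable (p1) failure rates."""
    P = math.log(p1 / p0)
    Q = math.log((1 - p0) / (1 - p1))
    s = Q / (P + Q)                                  # decrement per success
    h0 = -math.log((1 - alpha) / beta) / (P + Q)     # spacing, acceptable side
    h1 = math.log((1 - beta) / alpha) / (P + Q)      # spacing, unacceptable side
    return s, h0, h1


def cusum_path(failures, s):
    """Cumulative score: +(1 - s) for a failure (True), -s for a success (False)."""
    score, path = 0.0, []
    for failed in failures:
        score += (1 - s) if failed else -s
        path.append(score)
    return path


def case_at_competence(path, spacing):
    """1-based case at which two consecutive boundary lines have been crossed
    from above (assumed reading of the decision rule), or None."""
    previous = 0.0
    last_crossing_was_downward = False
    for case_number, current in enumerate(path, start=1):
        crossed_down = math.ceil(current / spacing) < math.ceil(previous / spacing)
        crossed_up = math.floor(current / spacing) > math.floor(previous / spacing)
        if crossed_down:
            if last_crossing_was_downward:
                return case_number
            last_crossing_was_downward = True
        elif crossed_up:
            last_crossing_was_downward = False
        previous = current
    return None


s, h0, h1 = cusum_parameters()
print(round(s, 3), round(h0, 2), round(h1, 2))

# Hypothetical sequence: three early errors followed by correct diagnoses.
hypothetical = [True] * 3 + [False] * 60
print(case_at_competence(cusum_path(hypothetical, s), h1))
```

With the stated failure rates and error rates, these formulas give a boundary spacing of about 3.45, which is close to the interval of 3.4 quoted in the caption of Figure 2. The outcome sequence in the example is invented for illustration and is not trial data.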
To check competence in a real-life setting, both trainees were asked to report the number of cases evaluated by each of them in their respective workplaces during the 10 months after training, as well as their diagnosis and the final outcome (histology or spontaneous resolution). Sensitivity, specificity and LRs for this series were calculated.
Both trainees completed the program. Sensitivity, specificity and LRs after each set of 3D volumes are shown, for both trainees, in Table 2. LC-CUSUM plots showed that competence was declared after 170 cases for Trainee A and after 185 cases for Trainee B (Figure 2). Changes in trainees' confidence are shown in Figure 3, which demonstrates that confidence increased with training.
Table 2. Diagnostic performance for Trainees A and B following analysis of each set of three-dimensional ultrasound volumes of adnexal masses (100 volumes per set)
| Trainee / set | Sensitivity, % (95% CI) | Specificity, % (95% CI) | LR+ (95% CI) | LR– (95% CI) |
|---|---|---|---|---|
| Trainee A | | | | |
| Set 1 | 74 (57–86) | 83 (72–90) | 4.38 (2.46–7.81) | 0.31 (0.17–0.57) |
| Set 2 | 96 (79–99) | 92 (84–96) | 12.59 (5.81–27.31) | 0.05 (0.01–0.32) |
| Set 3 | 97 (86–99) | 95 (88–98) | 21.73 (7.18–65.76) | 0.03 (0.004–0.20) |
| Set 4 | 100 (90–100) | 89 (79–95) | 9.50 (4.45–20.25) | not calculable |
| Set 5 | 100 (89–100) | 92 (83–97) | 12.60 (5.43–29.22) | not calculable |
| Trainee B | | | | |
| Set 1 | 76 (58–88) | 88 (78–94) | 6.45 (3.26–12.76) | 0.27 (0.14–0.52) |
| Set 2 | 96 (80–99) | 94 (86–98) | 17.01 (6.54–44.23) | 0.04 (0.01–0.30) |
| Set 3 | 91 (76–97) | 89 (80–95) | 8.57 (4.22–17.41) | 0.10 (0.03–0.30) |
| Set 4 | 97 (86–99) | 84 (72–91) | 6.05 (3.32–11.03) | 0.03 (0.005–0.23) |
| Set 5 | 100 (89–100) | 90 (79–95) | 8.86 (4.41–17.79) | not calculable |
Figure 2. Learning curve cumulative summation (LC-CUSUM) graphs for discriminating benign from malignant masses for the two trainees. Trainee A reached competence after Case 170. Trainee B reached competence after Case 185. Dotted horizontal lines show acceptable/unacceptable boundary lines of the CUSUM score (at intervals of 3.4).
After training, Trainee A evaluated 102 cases from January 2012 to October 2012, with a final outcome available for 89 masses (eight malignant and 81 benign). Sensitivity was 100% (95% CI, 59.8–100%), specificity was 91.3% (95% CI, 82.4–96.2%) and the positive LR (LR+) was 11.6 (95% CI, 5.7–23.5) (the negative LR (LR–) could not be calculated). Trainee B evaluated 74 adnexal masses during the same period, with a final outcome available for 53 cases (seven malignant and 46 benign). Sensitivity was 100% (95% CI, 56–100%), specificity was 89.1% (95% CI, 75.6–95.9%) and the LR+ was 9.2 (95% CI, 4.0–21.0) (the LR– could not be calculated).
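As a worked check of these likelihood ratios, assuming the usual definitions and taking Trainee B's series as an example (7 of 7 malignant masses and 41 of 46 benign masses classified correctly; these counts are inferred from the reported percentages rather than stated in the text):

```latex
\[
\mathrm{LR}^{+} = \frac{\text{sensitivity}}{1 - \text{specificity}}
               = \frac{7/7}{5/46} \approx 9.2,
\qquad
\mathrm{LR}^{-} = \frac{1 - \text{sensitivity}}{\text{specificity}}
               = \frac{0/7}{41/46} = 0 .
\]
```

With no false-negative results the LR– point estimate is zero and a confidence interval built on the logarithmic scale is undefined, which is presumably why no LR– is reported for these series.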
Expertise is recognized as essential for achieving a good diagnostic performance in assessing adnexal masses by ultrasound. Van Holsbeke et al. reported on the effect of a theoretical course on trainees' competence for discriminating between benign and malignant adnexal masses. They concluded that theoretical ultrasound teaching did not improve the performance of pattern recognition in the hands of trainees and that practical training is likely to be of paramount importance if diagnostic performance is to be optimized. Furthermore, real-time ultrasound is better than two-dimensional (2D) static images for assessing adnexal masses.
In this study we have presented the development and testing of a training program specifically designed for training in ultrasound assessment of adnexal masses that includes a theoretical course as well as practical and offline training. The use of offline assessment of 3D volumes for teaching trainees to identify normal anatomy and structural anomalies has been previously proposed in obstetrics.
In the case of adnexal masses, according to the LC-CUSUM plots, trainee competence can be achieved after 170 to 185 examinations. This is in agreement with the figures for diagnostic performance after the second set (i.e. 200 cases) of examinations (sensitivity and specificity of 96% and 92%, respectively, for Trainee A, and 96% and 94%, respectively, for Trainee B). We observed that good diagnostic performance was maintained in the three subsequent sets of 3D volumes. More importantly, when trainees returned to their practices to diagnose cases in a clinical setting, their diagnostic performance was high, indicating that this training program is effective.
Another advantage that we believe this program may offer is the short time required to achieve competence. This program was designed for an 8-week period, whereas training based solely on real-time ultrasound would need much longer to achieve the same level of competence. At our institution, in which 25–30 adnexal masses per month are evaluated, a trainee would need to spend at least 6–7 months assessing masses in order to assess the ~170 required for competence. With the program we propose, this time would be reduced by about two-thirds.
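The arithmetic behind this estimate can be laid out as follows, assuming that a trainee relying on real-time ultrasound alone would assess every mass seen at the institution and taking 8 weeks as roughly 1.8 months:

```latex
\[
\frac{170\ \text{masses}}{30\ \text{masses/month}} \approx 5.7\ \text{months},
\qquad
\frac{170\ \text{masses}}{25\ \text{masses/month}} = 6.8\ \text{months},
\]
\[
1 - \frac{1.8}{5.7} \approx 0.68,
\qquad
1 - \frac{1.8}{6.8} \approx 0.74 .
\]
```

On these assumptions the 8-week program shortens the training period by roughly two-thirds to three-quarters.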
Another interesting finding was that the trainees' level of confidence increased with training. This was to be expected because an increase in diagnostic confidence is inherent to the process of learning. However, we were surprised by the very high level of confidence achieved by both trainees after training. It is likely that this can be explained by factors related to the process of learning, such as the trainees' personal attitudes and motivation, training environment and perceived training quality.
This training program has some limitations. First, training is mainly based on the offline assessment of static 3D volumes and clinical data, with only a short period of real-time ultrasound training. This means that training in the dynamic aspects of real-time ultrasound, such as tenderness, mass mobility and upper abdomen assessment, is limited. Additionally, 3D volumes were selected only to achieve a similar prevalence of malignancy in each set, and difficult cases were not specifically addressed. It would be of interest to know whether diagnostic performance and confidence are maintained at a high level in more difficult cases assessed after training. One additional limitation is that the study involved only one trainer and two trainees and that there was no control group. Therefore, the generalizability of the results needs to be confirmed.
In conclusion, we consider the proposed training program for ultrasound assessment of adnexal masses to be feasible and to yield good results. It is our belief that a similar program could be implemented for training in ultrasound assessment of other gynecological conditions, such as endometrial or uterine pathology, or in reproductive medicine.