To assess the clinical usefulness of a structured reporting system based on ultrasound findings for management of adnexal masses.
This was a prospective multicenter study comprising 432 adnexal masses in 372 women (mean age, 44.0 (range, 13–78) years) over a 36-month period. Ninety-three (25%) women were postmenopausal and 279 (75%) women were premenopausal. Patients were evaluated with transvaginal ultrasound by one of three examiners expert in gynecological ultrasound. Reporting was provided to referring clinicians according to the Gynecologic Imaging Report and Data System (GI-RADS) classification. A predetermined management protocol was offered to referring clinicians. It was suggested that patients classified as GI-RADS 2 be managed with a follow-up scan, patients classified as GI-RADS 3 undergo laparoscopic surgery and patients classified as GI-RADS 4 or 5 be referred to a gynecologic oncologist. Definitive histologic diagnosis was available in 370 cases and 62 additional cases were considered benign because of spontaneous resolution during follow-up. These outcomes were used as the gold standard for calculating the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (LR+) and negative likelihood ratio (LR−) of the GI-RADS classification for identifying adnexal masses at high risk of malignancy, considering GI-RADS 4 and 5 as malignant.
Of the 432 tumors, 112 were malignant and 320 benign. The GI-RADS classification rate was as follows: GI-RADS 2, 92 (21%) cases; GI-RADS 3, 184 (43%) cases; GI-RADS 4, 40 (9%) cases; GI-RADS 5, 116 (27%) cases. Sensitivity for this system was 99.1% (95% CI, 95.1–99.8%), specificity was 85.9% (95% CI, 81.7–89.3%), LR+ was 7.05 (95% CI, 5.37–9.45) and LR− was 0.01 (95% CI, 0.001–0.07). PPV and NPV were 71.1% and 99.6%, respectively.
The GI-RADS reporting system performed well in identifying adnexal masses at high risk of malignancy and seems to be useful for clinical decision-making. Copyright © 2011 ISUOG. Published by John Wiley & Sons, Ltd.
Ultrasonography is currently considered as the primary imaging modality for identifying and characterizing adnexal masses1. Several approaches have been proposed for their characterization using this technique, including examiner's subjective impression2, simple descriptive scoring systems3, mathematically developed scoring systems4, logistic regression models5 and neural networks6.
The subjective impression of an experienced examiner is currently believed to be the best approach, and no other method has been proven superior to it7, 8. However, the examiner's impression is entirely subjective and recent evidence has shown that this affects not only the performance of the method itself9, but also the examiner's confidence in providing a diagnosis10. Furthermore, a recent randomized study demonstrated that examiner experience affects performance and decision-making in clinical practice11.
Due to the subjective nature of the examiner's impression there is a need for a standardized nomenclature and definition for all tumor features evaluated by ultrasound. This was provided by the International Ovarian Tumor Analysis (IOTA) consensus12. Undoubtedly, this consensus has allowed a better, homogeneous description of adnexal masses. However, there is still significant variation in the reporting of ultrasound examination results for adnexal masses13. In fact, a recent consensus conference of the Society of Radiologists in Ultrasound concluded that ‘investigation into structured reporting of adnexal cysts to allow for improved communication of results and recommendations for follow-up’ is needed14.
In 2009 we proposed a reporting system similar to that used for breast ultrasound (BI-RADS): the Gynecologic Imaging Reporting and Data System (GI-RADS), developed to facilitate communication between sonologists/sonographers and referring clinicians15. The GI-RADS classification is based on ultrasound findings; it provides a summarized, standardized report of those findings together with an estimated risk of malignancy for a given adnexal mass.
The aim of this study was to assess prospectively the use of this reporting system for decision-making in clinical practice.
This was a prospective study comprising all women diagnosed as having an adnexal mass and evaluated at two different centers, one in Spain (Clínica Universidad de Navarra, Pamplona) and one in Chile (Centro Ecográfico Ultrasónic Panorámico, Santiago), from January 2008 to December 2010. Institutional review board approval was obtained and all women gave informed consent to participate.
All patients were evaluated by transvaginal or, in virgo intacta women, transrectal ultrasound using a Voluson 730 Expert or Pro machine (GE Medical Systems, Zipf, Austria) according to a predetermined scanning protocol15. Three expert examiners (F.A., H.V. and J.L.A.), each with more than 15 years' experience in gynecological ultrasound, performed all examinations. Between one and five representative images were stored in the machine's database for use in the report (Figure S1 online).
Reporting was performed according to the GI-RADS classification15. This system is based on pattern recognition analysis and provides an a priori estimate of the probability of malignancy, based on data from previous studies15–17. The reporting system includes five categories (Table 1), and each report includes a description of the mass as well as a final GI-RADS classification (Figure S1).
| GI-RADS grade | Diagnosis | Estimated probability of malignancy | Detail |
|---|---|---|---|
| 1 | Definitive benign | 0% | Normal ovaries identified and no adnexal mass seen |
| 2 | Very probably benign | < 1% | Adnexal lesions thought to be of functional origin, e.g. follicles, corpora lutea, hemorrhagic cysts |
| 3 | Probably benign | 1–4% | Neoplastic adnexal lesions thought to be benign, such as endometrioma, teratoma, simple cyst, hydrosalpinx, paraovarian cyst, peritoneal pseudocyst, pedunculated myoma, or findings suggestive of pelvic inflammatory disease |
| 4 | Probably malignant | 5–20% | Any adnexal lesion not included in GI-RADS 1–3 and with one or two findings suggestive of malignancy* |
| 5 | Very probably malignant | > 20% | Adnexal masses with three or more findings suggestive of malignancy* |
During the examination, tumor volume was also estimated according to the prolate ellipsoid formula (length × width × height × 0.5233, expressed in mL), but this feature was not taken into consideration for assigning a GI-RADS classification.
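The volume formula above can be written as a one-line function. This is a minimal sketch: the function name is hypothetical, and the assumption that the three diameters are measured in centimeters (so that the product is in mL) is ours, since the text states only the output unit.

```python
# Constant quoted in the text for the prolate ellipsoid formula (close to pi/6).
PROLATE_ELLIPSOID_FACTOR = 0.5233


def tumor_volume_ml(length_cm: float, width_cm: float, height_cm: float) -> float:
    """Estimate tumor volume in mL from three orthogonal diameters.

    Assumes the diameters are given in cm, so that cm^3 equals mL.
    """
    return length_cm * width_cm * height_cm * PROLATE_ELLIPSOID_FACTOR


# Illustrative (hypothetical) measurements, not taken from the study.
print(f"{tumor_volume_ml(6.0, 5.0, 4.0):.1f} mL")
```

As stated in the text, this volume was recorded but played no part in assigning the GI-RADS grade.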
The meaning and goal of the GI-RADS classification were explained to referring clinicians in several clinical sessions before the study started. A management protocol was offered to referring clinicians with the aim of determining whether this reporting system could be useful for deciding patient management and for avoiding confusion among clinicians. However, while we followed up patients to determine how they were ultimately managed, we were not involved in clinical decision-making.
The suggested management protocol was based on the risk of malignancy as estimated by the GI-RADS classification. Patients classified as GI-RADS 1 (i.e. normal ovaries at ultrasound) were excluded from the study and from further analysis. GI-RADS 2 patients were considered for expectant management by follow-up sonography, on the basis that these lesions were assumed to be functional. GI-RADS 3 patients were to undergo surgery by general gynecologists, on the basis that these lesions were considered to be probably benign and expected to persist over time. Laparoscopy was preferred, although the surgeon managing the patient made the final decision regarding surgical approach (laparoscopy or laparotomy). Patients classified as GI-RADS 4 and 5 were referred to gynecological oncologists for appropriate additional imaging (computed tomography or magnetic resonance imaging) and surgical management, on the basis that these lesions were considered to be probably or very probably malignant.
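The suggested protocol amounts to a small lookup from GI-RADS grade to management. The sketch below is only an illustration of that mapping; the names `SUGGESTED_MANAGEMENT`, `suggested_management` and `is_high_risk` are hypothetical, and the wording of each entry paraphrases the paragraph above.

```python
# Suggested management per GI-RADS grade, paraphrased from the protocol text.
SUGGESTED_MANAGEMENT = {
    2: "expectant management with follow-up sonography",
    3: "surgery by a general gynecologist (laparoscopy preferred)",
    4: "referral to a gynecological oncologist",
    5: "referral to a gynecological oncologist",
}


def suggested_management(gi_rads_grade: int) -> str:
    """Return the suggested management for a GI-RADS grade of 2 to 5."""
    if gi_rads_grade == 1:
        # Normal ovaries: excluded from the study, so no management is defined.
        raise ValueError("GI-RADS 1 was excluded from the study")
    if gi_rads_grade not in SUGGESTED_MANAGEMENT:
        raise ValueError(f"unknown GI-RADS grade: {gi_rads_grade}")
    return SUGGESTED_MANAGEMENT[gi_rads_grade]


def is_high_risk(gi_rads_grade: int) -> bool:
    """GI-RADS 4 and 5 were analyzed as high risk, 2 and 3 as low risk."""
    return gi_rads_grade in (4, 5)


for grade in (2, 3, 4, 5):
    print(grade, "->", suggested_management(grade))
```

The same high-risk/low-risk split is the one used for the diagnostic performance analysis.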
When surgical removal of the tumor was performed, a definitive histologic diagnosis was obtained. Tumors were classified according to World Health Organization criteria18 and malignant tumors were staged according to FIGO criteria19. Borderline tumors were considered as malignant for analytic purposes. STARD guidelines were followed for designing and conducting the study20.
Categorical variables were compared using the chi-square test and tumor volumes were compared using the Mann–Whitney U-test. We calculated the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (LR+) and negative likelihood ratio (LR−) of the GI-RADS system for identifying adnexal masses at high risk of malignancy, considering GI-RADS 2 and 3 as low risk and GI-RADS 4 and 5 as high risk. The gold standard was histologic diagnosis (benign or malignant) or spontaneous resolution of the cyst during follow-up (benign).
To determine how useful they found the GI-RADS reporting system for understanding ultrasound findings and for making decisions regarding patient management, the referring clinicians involved in clinical decision-making were asked to complete a simple survey. This survey consisted of a single question: ‘How useful do you think GI-RADS reporting system is for understanding ultrasound findings and giving confidence in clinical decisions regarding your patient?’, with five possible answers: (A) totally useful; (B) quite useful; (C) neither useful nor useless; (D) useless; (E) completely useless.
To assess interobserver reproducibility of GI-RADS classification, two examiners (J.L.A. and A.I.) performed a separate analysis in 60 consecutive women who were already included in the study. Both examiners performed a transvaginal scan, blinded to each other's results, and each one provided a GI-RADS report. To determine the concordance between examiners we used a weighted Kappa index.
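For readers unfamiliar with the weighted Kappa index, the sketch below shows one common way to compute it from a square agreement table. The text does not say which weighting scheme was used, so the default of linear weights here is an assumption, and the example counts are hypothetical and are not the study's data.

```python
def weighted_kappa(table, scheme="linear"):
    """Cohen's weighted kappa for a square agreement table.

    table[i][j] is the number of masses graded category i by one examiner
    and category j by the other, with categories in ordinal order.
    scheme is "linear" or "quadratic" (disagreement weights).
    Raises ZeroDivisionError if both examiners used a single category only.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        d = abs(i - j) / (k - 1)
        return d if scheme == "linear" else d * d

    observed = sum(disagreement(i, j) * table[i][j]
                   for i in range(k) for j in range(k)) / n
    expected = sum(disagreement(i, j) * row_totals[i] * col_totals[j]
                   for i in range(k) for j in range(k)) / (n * n)
    return 1.0 - observed / expected


# Hypothetical 4 x 4 table (rows and columns: GI-RADS 2, 3, 4, 5).
HYPOTHETICAL_TABLE = [
    [8, 1, 0, 0],
    [1, 9, 1, 0],
    [0, 1, 4, 1],
    [0, 0, 1, 7],
]
print(f"linear weighted kappa: {weighted_kappa(HYPOTHETICAL_TABLE):.3f}")
```

Weighting penalizes a disagreement between GI-RADS 2 and 5 more heavily than one between adjacent grades, which suits an ordinal scale such as this one.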
A total of 372 women with adnexal masses were included in this study (279 from the Clínica Universidad de Navarra and 93 from Centro Ecográfico Ultrasónic Panorámico). Their mean age was 44 (range, 13–78) years. Ninety-three (25%) women were postmenopausal and 279 (75%) were premenopausal. Sixty (16%) patients had bilateral tumors, giving a total number of 432 adnexal masses assessed. The prevalence of malignant tumors was 26% (112 malignant tumors in 87 patients). Malignant tumors were more frequent in postmenopausal women (43.2%) than in premenopausal women (13.2%) (P < 0.001).
Of the 432 masses assessed, 92 (21%) were classified as GI-RADS 2, 184 (43%) as GI-RADS 3, 40 (9%) as GI-RADS 4 and 116 (27%) as GI-RADS 5. Tumor volume was significantly smaller in GI-RADS 2 and 3 cases compared with GI-RADS 4 and 5 cases, while there was no difference in tumor volume between GI-RADS 2 and 3 cases or between GI-RADS 4 and 5 cases (Table 2). Most referring clinicians managed their patients according to GI-RADS classification. Figure 1 summarizes the classifications, management and final outcomes of the study population, and final histological diagnoses are given in Table 3.
|Tumor volume (mL)|
Number of tumors classified as:

| Histologic diagnosis | GI-RADS 2 | GI-RADS 3 | GI-RADS 4 | GI-RADS 5 | Total |
|---|---|---|---|---|---|
| Low malignant potential tumor | 0 | 0 | 2 | 12 | 14 |
| Primary ovarian cancer | 0 | 1 | 5 | 75 | 81 |
No malignant tumor was classified as GI-RADS 2, and one was classified as GI-RADS 3. This false-negative case was a 73-year-old woman with a 580-mL cyst diagnosed as a benign serous cyst; histology showed it to be a Stage Ia serous ovarian carcinoma.
The sensitivity for the GI-RADS reporting system in predicting malignancy was 99.1% (95% CI, 95.1–99.8%), specificity was 85.9% (95% CI, 81.7–89.3%), LR+ was 7.05 (95% CI, 5.37–9.45) and LR− was 0.01 (95% CI, 0.001–0.07) (Table 4). The PPV and NPV were 71.1% and 99.6%, respectively.
|Number of tumors classified as:|
|Final diagnosis||GI-RADS 2–3||GI-RADS 4–5||Total|
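The reported performance figures can be recomputed from a 2 × 2 table. The counts below are not copied from Table 4; they are inferred from the totals and rates given in the text (112 malignant, 320 benign, one false negative, 156 masses graded GI-RADS 4–5) and should be read as an assumption. The Wilson score interval is likewise an assumption: the text does not name the confidence interval method, although this one appears to reproduce the reported intervals for sensitivity and specificity.

```python
from math import sqrt

# Counts inferred from the text, with GI-RADS 4-5 taken as test-positive.
TP, FN = 111, 1    # malignant masses graded GI-RADS 4-5 / GI-RADS 2-3
FP, TN = 45, 275   # benign masses graded GI-RADS 4-5 / GI-RADS 2-3


def diagnostic_metrics(tp, fn, fp, tn):
    """Return standard diagnostic accuracy measures as fractions."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": sensitivity / (1 - specificity),
        "lr_neg": (1 - sensitivity) / specificity,
    }


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 for about 95%)."""
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


metrics = diagnostic_metrics(TP, FN, FP, TN)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
print("sensitivity CI:", wilson_interval(TP, TP + FN))
print("specificity CI:", wilson_interval(TN, TN + FP))
```

With these counts the PPV is 111/156, or about 71.15%, which the text reports as 71.1%; the other values agree with the reported figures after rounding.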
All 15 referring clinicians (six in Spain and nine in Chile) considered this reporting system to be ‘totally useful’ or ‘quite useful’ for clinical decision-making in adnexal masses.
The interobserver agreement for GI-RADS classification of adnexal masses was very good (weighted kappa index = 0.846) (Table 5).
|Examiner B||GI-RADS 2||GI-RADS 3||GI-RADS 4||GI-RADS 5||Total|
Reporting in ultrasound evaluation of adnexal masses is an important issue. A recent study from Canada has shown that current reporting practices for ultrasound assessments in women with ovarian masses vary considerably and concluded that the use of a synoptic reporting system would be useful13. Inappropriate reporting may lead to unwarranted concern by the patient and referring clinician and could lead to unnecessary additional tests and surgery21. In fact, investigation into structured reporting of adnexal masses to allow for improved communication of results and recommendations for management has been advised recently14.
For this reason we recently developed a simple reporting system based on the concept developed for breast imaging (the BI-RADS classification), which was originally developed for mammographic findings but has been applied successfully to breast ultrasound. As for BI-RADS, the lexicon of our new system is intended to provide a unified language for ultrasound reporting and to avoid confusion in the communication between the sonographer/sonologist and the clinician. We called this reporting system GI-RADS15. In the present study we assessed prospectively the use of our GI-RADS reporting system for ultrasound evaluation of adnexal masses and clinical decision-making. A strength of the study is that the ultrasound examiners were not involved in the decision-making process.
The GI-RADS reporting system is based on pattern recognition analysis of the tumor2 and the a priori risk of malignancy of different tumor features15–17. Although one could argue that pattern recognition is a subjective assessment, there is evidence that it is the best method for characterizing adnexal masses7, 8 and that it is reproducible among expert examiners22–24.
In terms of diagnostic performance, this reporting system performed well, with a very high sensitivity and acceptable specificity. This is not surprising bearing in mind that it is based on IOTA criteria, which have been tested extensively in several multicenter studies and shown to be good criteria for discriminating between benign and malignant adnexal masses25–27. However, one possible selection bias in our study is the relatively high prevalence of malignant tumors (26%), which could affect the estimates of diagnostic performance, in particular PPV and NPV, which depend on disease prevalence. Notwithstanding, LR+ and LR− were favorable, and likelihood ratios are derived from sensitivity and specificity alone and so do not depend directly on prevalence.
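As a check on how disease prevalence enters these figures, predictive values can be recomputed with Bayes' rule while holding sensitivity and specificity fixed. The sketch below is illustrative only: the sensitivity and specificity are taken as 111/112 and 275/320 (counts inferred from the reported 99.1% and 85.9%), and the prevalences other than the study's own 112/432 are arbitrary assumptions.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a given prevalence using Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


SENSITIVITY = 111 / 112   # inferred from the reported 99.1%
SPECIFICITY = 275 / 320   # inferred from the reported 85.9%

# The likelihood ratios use sensitivity and specificity only.
lr_pos = SENSITIVITY / (1 - SPECIFICITY)
lr_neg = (1 - SENSITIVITY) / SPECIFICITY
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")

# Arbitrary prevalences plus the study's own (112 malignant of 432 masses).
for prevalence in (0.05, 0.10, 112 / 432, 0.50):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.2f}: PPV {ppv:.3f}, NPV {npv:.3f}")
```

In a lower-prevalence population the same sensitivity and specificity yield a lower PPV and a higher NPV, which matters when applying these figures outside a referral setting.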
Our data have shown that the GI-RADS classification system is useful for clinical decision-making and referral. Furthermore, all referring clinicians involved in patient management considered it to be useful. We therefore propose a standardized nomenclature for reporting ultrasound findings of adnexal masses, applying the same rationale as that of the BI-RADS classification for breast ultrasound. While it is true that adequate referral may be achieved using logistic models such as the risk of malignancy index28, 29, scoring systems30 or just pattern recognition analysis, as IOTA does31, a standardized reporting nomenclature is lacking. To the best of our knowledge, this is the first such standardized reporting/classification system applicable to adnexal masses.
It is likely that this reporting system would not be needed in those institutions where ultrasound examiners and clinicians participating in clinical decision-making have good and direct communication and decisions about patient management are collegiate, or even in those practices where expert sonologists themselves decide about their own patients' management. However, this system could be useful in those settings in which clinicians managing patients do not perform ultrasound examinations, instead reading the report of the morphological description of the tumor. It could also be useful for small hospitals and for private practitioner-gynecologists who must refer patients with suspicious masses to tertiary care hospitals with gynecologic oncology facilities.
There were some limitations to the study. A possible bias is that expert examiners performed all ultrasound examinations; this is known to potentially affect diagnostic performance when using pattern recognition analysis9, 32. Therefore, further research into how this reporting system performs when used by non-expert examiners is needed. Another bias of this study is that a management protocol according to GI-RADS classification was offered to referring clinicians before the study started, which could have biased their decisions on how to manage the patients. An interesting issue regarding the suggested management protocol is the use of surgery in cases of GI-RADS 3. In fact, expectant management could also be offered safely to these patients33. A further weakness of this study is that most GI-RADS 4 lesions were benign, although they were classified as probably malignant. However, there was still a 20% risk of malignancy (8/40). One option for improving the predictive value of this group would be further classification into subgroups depending on the degree of likelihood of malignancy according to the examiner's impression.
In conclusion, this prospective study has shown that GI-RADS classification performs well as a reporting system in adnexal masses and it seems to be useful for clinical decision-making.
SUPPORTING INFORMATION ON THE INTERNET
The following supporting information may be found in the online version of this article:
Figure S1 Sample report for the Gynecologic Imaging Report and Data System (GI-RADS) classification.