To determine which extrauterine pelvic masses are difficult to correctly classify as benign or malignant on the basis of ultrasound findings, and to determine if the use of logistic regression models for calculation of individual risk of malignancy would improve the diagnostic accuracy in difficult tumors.
In a prospective, international, European multicenter study involving nine centers, 1066 women with a pelvic mass judged to be of extrauterine origin underwent transvaginal ultrasound examination by an experienced ultrasound examiner before surgery. A standardized examination technique and predefined definitions of ultrasound characteristics were used. On the basis of subjective evaluation of ultrasound findings, the examiner classified each mass as being certainly benign, probably benign, unclassifiable, probably malignant or certainly malignant. Even when the examiner found the mass unclassifiable (i.e. difficult mass) he or she was obliged to state whether the mass was more likely to be benign or malignant. Borderline tumors were classified as malignant.
There were 90 (8%) unclassifiable masses. Multiple logistic regression analysis showed papillary projections, >10 locules in a cyst without solid components, low-level echogenicity of cyst fluid, and moderate vascularization as assessed subjectively at color Doppler examination to be ultrasound variables independently associated with unclassifiable mass. Borderline malignant tumors (n = 55) proved to be most difficult to assess with only 47% being correctly classified (i.e. classified as malignant), 29% being incorrectly classified (i.e. classified as benign) and 24% being unclassifiable vs. 90% of non-borderline tumors being correctly classified, 3% being incorrectly classified and 8% being unclassifiable (P < 0.0001). Papillary cystadeno(fibro)mas, myomas and cases of struma ovarii were also more common among the unclassifiable masses than among the classifiable ones (5.6% vs. 1.1%, P = 0.008; 4.4% vs. 0.9%, P = 0.02; 4.4% vs. 0.2%, P = 0.0006). No ultrasound variable or clinical variable (including CA 125) entered a logistic regression model to predict malignancy in difficult masses. A model could be constructed for difficult masses containing papillary projections but this model performed no better than subjective evaluation of the ultrasound image. Sensitivity and specificity of subjective evaluation with regard to malignancy in the group of unclassifiable masses were 56% (14/25) and 77% (50/65) vs. 91% (220/241) and 97% (712/735) in the classifiable masses.
Using subjective evaluation of gray-scale and Doppler ultrasound findings, an experienced ultrasound examiner using a good ultrasound system can correctly classify extrauterine pelvic masses as benign or malignant in most cases [1, 2], and suggest a correct specific diagnosis (e.g. endometrioma, dermoid cyst or hydrosalpinx) in many cases [3]. The reported sensitivity and specificity with regard to malignancy in the studies cited were 88% and 96% [1], and 96% and 90% [2], respectively, and the reported sensitivity and specificity with regard to endometrioma were 92% and 97%, with regard to dermoid cyst 90% and 98%, and with regard to hydrosalpinx 100% and 100% [3]. However, in a small proportion of cases even a very experienced ultrasound examiner will find it difficult to discriminate between benign and malignant masses [1].
The aim of this study was to determine which extrauterine pelvic masses are difficult to correctly classify as benign or malignant when using subjective evaluation of ultrasound findings as the diagnostic method, and to determine if the use of a logistic regression model for calculation of individual risk of malignancy would improve diagnostic accuracy in ‘difficult’ masses.
This was a prospective, international, multicenter study (International Ovarian Tumor Analysis study, IOTA) including the following nine centers: Malmö University Hospital, Lund University, Malmö, Sweden; University Hospitals, Leuven, Belgium; Universita del Sacro Cuore, Rome, Italy; ISBM L. Sacco University of Milan, Milan, Italy; Hôpital Boucicaut, Paris, France; Centre Medical des Pyramides, Maurepas, France; King's College Hospital, London, UK; ISBM Ospedale, San Gerardo Universita di Milano, Monza, Italy and Universita degli Studi di Napoli, Naples, Italy. Recruitment was from June 1999 to June 2002.
Patients presenting with at least one overt pelvic mass judged to be of extrauterine origin and who were examined by a principal investigator at one of the participating centers were eligible for inclusion. In cases of bilateral masses, data from the mass with the most complex ultrasound morphology were used. The exclusion criteria were pregnancy, inability to tolerate transvaginal sonography, surgery >120 days after sonographic assessment, and incomplete submission of data. An expert external pathologist reviewed 10% of all pathological specimens. Cases were excluded if there was disagreement between the original histopathological diagnosis and that of the external reviewer.
A history, including the number of first-degree relatives with ovarian cancer or breast cancer, and use of hormone replacement therapy, was taken from each patient following a standardized protocol. A woman was considered to be postmenopausal if she reported a period of at least 12 months of amenorrhea after the age of 40 years, provided that medication or disease did not explain the amenorrhea. Women 50 years or older who had undergone hysterectomy were also defined as postmenopausal.
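The definition of postmenopausal status above amounts to a two-branch rule. A minimal sketch follows; the function name and all parameter names are hypothetical and not taken from the study protocol, and the caller is assumed to have already judged whether medication or disease explains the amenorrhea.

```python
def is_postmenopausal(age_years, amenorrhea_months, amenorrhea_after_40,
                      amenorrhea_explained, hysterectomy):
    """Apply the menopausal-status definition described in the text.

    All parameter names are illustrative assumptions:
    - amenorrhea_after_40: True if the amenorrhea occurred after age 40
    - amenorrhea_explained: True if medication or disease explains it
    """
    # Women 50 years or older who had undergone hysterectomy
    # were defined as postmenopausal.
    if hysterectomy and age_years >= 50:
        return True
    # Otherwise: at least 12 months of amenorrhea after the age of 40,
    # not explained by medication or disease.
    return (amenorrhea_months >= 12
            and amenorrhea_after_40
            and not amenorrhea_explained)


print(is_postmenopausal(45, 24, True, False, False))  # True
print(is_postmenopausal(48, 0, False, False, True))   # False
```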
A transvaginal gray-scale and color Doppler ultrasound examination using a high-end ultrasound system equipped with a transvaginal transducer with a frequency of 4–8 MHz was performed in all cases. Transabdominal sonography was added if a large mass could not be seen in its entirety using a transvaginal probe. A standardized examination technique and standardized definitions of ultrasound terms were used in all centers. These have been published elsewhere [4]. A papillary projection was defined as any solid protrusion into a cyst cavity from the cyst wall with a height greater than or equal to 3 mm [4]. When intratumoral blood flow velocity waveforms were not detected, the peak systolic velocity, time-averaged maximum velocity, pulsatility index and resistance index were coded as 2.0 cm/s, 1 cm/s, 3.0 and 1.0, respectively, for use in mathematical modeling. The presence or absence of pain during the examination was noted. Finally, on the basis of subjective evaluation of the ultrasound findings, the ultrasound examiner classified each mass as being certainly benign, probably benign, unclassifiable, probably malignant or certainly malignant. Even when the examiner found the mass unclassifiable (i.e. difficult mass) he or she was obliged to state whether the mass was more likely to be benign or malignant.
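The coding of Doppler variables when no intratumoral flow was detected can be illustrated with a short sketch. The class, field and function names are hypothetical; only the four substituted values (2.0 cm/s, 1 cm/s, 3.0 and 1.0) come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DopplerIndices:
    """Hypothetical container for the four Doppler variables."""
    psv_cm_s: float    # peak systolic velocity
    tamxv_cm_s: float  # time-averaged maximum velocity
    pi: float          # pulsatility index
    ri: float          # resistance index


# Values coded when no flow velocity waveforms were detected (from the text).
NO_FLOW_CODING = DopplerIndices(psv_cm_s=2.0, tamxv_cm_s=1.0, pi=3.0, ri=1.0)


def code_doppler(measured: Optional[DopplerIndices]) -> DopplerIndices:
    """Return measured indices, or the fixed coding if flow was undetected."""
    return NO_FLOW_CODING if measured is None else measured


print(code_doppler(None))
```

Fixed substitute values keep masses without detectable flow in the modeling data set instead of dropping them for missing Doppler measurements.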
Blood samples were drawn for analysis of CA 125, but the availability of this biochemical endpoint was not an essential requirement for recruitment into the study. The immunoradiometric assay CA 125 II (Centocor, Malvern, PA, USA or Cis-Bio, Gif-sur-Yvette cedex, France) or Abbott Axsym system, REF 3B41-22 (Abbott Laboratories Diagnostic Division, Abbott Park, IL, USA) was used. CA 125 results were unavailable to the ultrasound examiner at the time of the ultrasound examination.
Data were submitted via the Internet to a central database using a dedicated, secure data collection system developed for the study [5].
Statistical analysis was carried out using the SAS System release 8.02 (SAS Institute Inc., Cary, NC, USA). Student's t-test and Mann–Whitney's test were used to test the statistical significance of differences in continuous data; the Chi-square test and Fisher's exact test were used to test the statistical significance of differences in discrete data. Logistic regression with stepwise selection of variables was used to determine which ultrasound variables were independently associated with unclassifiable mass and for building a model to predict malignancy in difficult masses. Two-tailed P-values <0.05 were considered statistically significant.
A total of 1149 patients were recruited. Data from 83 (7%) patients were excluded (eight because of pregnancy, 31 because surgery was undertaken more than 120 days from the sonographic assessment, 42 because of incomplete submission of data and two because of disagreement between pathologists over the histological diagnosis). Data from 1066 (93%) patients were available for statistical analysis and model development.
Clinical information including CA 125 values for the women included in the study is shown in Table 1. Women with unclassifiable masses (n = 90, i.e. 8% of all masses; 95% CI 7–10) were older and of higher parity than those with classifiable masses (n = 976). Histological diagnoses are presented in Table 2. Endometriomas and primary invasive malignancies Stage II–IV were less common among the unclassifiable masses than among the classifiable ones. Borderline tumors, papillary cystadenomas and (cyst)adenofibromas, myomas and struma ovarii were over-represented among the unclassifiable masses, these diagnoses being three to 20 times more common among the unclassifiable masses than among the others. Of the borderline tumors, 47% (26/55) were correctly classified (i.e. classified as malignant) by the ultrasound examiner, 29% (16/55) were incorrectly classified (i.e. classified as benign) and 24% (13/55) were unclassifiable. Of the non-borderline tumors, 90% (906/1011) were correctly classified, 3% (28/1011) were incorrectly classified and 8% (77/1011) were unclassifiable. This difference between borderline and non-borderline tumors was statistically significant (P < 0.0001).
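The 95% confidence interval quoted for the proportion of unclassifiable masses (90 of 1066) can be reproduced approximately as follows. The text does not state which interval method was used; the Wilson score interval below is an assumption chosen for illustration, and the function name is hypothetical.

```python
import math


def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (assumed method)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


low, high = wilson_ci(90, 1066)
print(f"{low:.1%} to {high:.1%}")  # 6.9% to 10.3%
```

Rounded to whole percentages this gives 7–10%, in line with the interval reported above.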
Table 1. Clinical information for the women included in the study
In Table 2, the group ‘other’ contains 36 cases with more than one histological diagnosis in the same adnexa (e.g. mucinous cystadenoma and endometrioma in the same ovary, or chronic salpingitis and endometriosis in the same adnexa) and 13 other cases (e.g. tuberculous granuloma in the tube, mucinous histiocytoma, etc.).
Ultrasound findings are shown in Table 3. Multiple logistic regression analysis showed papillary projections, >10 locules in a cyst without solid components, low-level echogenicity of cyst fluid, and color score 3 (indicating moderate vascularization [4]) to be the only ultrasound variables independently associated with unclassifiable mass, with odds ratio estimates of 5.1 (P < 0.0001), 3.7 (P = 0.0175), 2.5 (P = 0.0002) and 1.8 (P = 0.0109), respectively. In the present study, 64% (35/55) of the borderline tumors contained papillary projections vs. 24% (242/1011) of the non-borderline tumors (P = 0.0001), 11% (6/55) vs. 2% (19/1011) were multilocular cysts with >10 locules (P = 0.012) and 33% (18/55) vs. 19% (191/1011) contained cyst fluid with low-level echogenicity (P = 0.023).
Table 3. Ultrasound findings in the women included in the study
P-values refer to comparison between unclassifiable and classifiable masses.
Percentages and P-values calculated in the subgroup of masses with papillary projections.
Percentages and P-values calculated in the subgroup of masses with solid components.
Pulsatility index, resistance index and peak systolic velocity were measured in 752 patients, in 78 with an unclassifiable mass and in 674 with a classifiable mass; time-averaged maximum velocity was measured in 739 patients, in 78 with an unclassifiable mass and in 661 with a classifiable mass.
Examples of unclassifiable masses are shown in Figure 1. In unclassifiable masses the sensitivity and specificity of subjective evaluation of ultrasound findings with regard to malignancy were 56% (14/25) and 77% (50/65), the positive and negative likelihood ratios being 2.43 and 0.57. For the classifiable masses the corresponding figures were 91% (220/241), 97% (712/735), 28.5 and 0.09 (P < 0.0001 for the difference in sensitivity and P < 0.0001 for the difference in specificity).
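The sensitivity, specificity and likelihood ratios for the unclassifiable masses follow directly from the counts given above (14 of 25 malignant masses and 50 of 65 benign masses correctly identified). A minimal sketch of the arithmetic, with a hypothetical function name:

```python
def diagnostic_accuracy(tp, fn, tn, fp):
    """Compute sensitivity, specificity and likelihood ratios from counts.

    Assumes specificity is neither 0 nor 1, so both ratios are defined.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": sensitivity / (1 - specificity),
        "lr_neg": (1 - sensitivity) / specificity,
    }


# Unclassifiable masses: 14/25 malignant and 50/65 benign correctly called.
result = diagnostic_accuracy(tp=14, fn=11, tn=50, fp=15)
print({k: round(v, 2) for k, v in result.items()})
# {'sensitivity': 0.56, 'specificity': 0.77, 'lr_pos': 2.43, 'lr_neg': 0.57}
```

Likelihood ratios computed from unrounded proportions can differ slightly from ratios computed from rounded percentages.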
When the variables describing papillary projections and solid components (papillary flow, papillary volume, volume of solid component, etc.) were set to zero in tumors without papillary projections or solid components, where they would otherwise be missing, no ultrasound variable or clinical variable entered a logistic regression model to predict malignancy in difficult masses. However, a model could be constructed for masses with papillary projections (39 masses in the training set and 15 in the test set) (Table 4). CA 125 did not add any significant information to the model. The interpretation of the model is that each one-unit increase in the height of the largest papillary projection multiplied the odds of malignancy by 1.23, each one-unit increase in the number of papillary projections multiplied the odds by 2.67, and each one-unit increase in the thickness of the thickest septum multiplied the odds by 0.54 (i.e. a 46% decrease in the odds). Areas under receiver–operating characteristics curves, sensitivity and specificity with regard to malignancy, and positive and negative likelihood ratios of the logistic regression model and of subjective evaluation of the ultrasound image in the training set and test set of difficult tumors with papillary projections are shown in Table 5.
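The odds ratios quoted above combine multiplicatively when more than one variable changes. The sketch below computes only the relative change in odds, because the model intercept is not given in the text and an absolute risk therefore cannot be derived here. The dictionary keys and function name are hypothetical, and the calculation assumes a main-effects logistic model without interaction terms.

```python
# Odds ratios per one-unit increase, as quoted in the text
# (papillary height and septum thickness are measured in mm).
ODDS_RATIOS = {
    "pap_height_mm": 1.23,
    "pap_number": 2.67,
    "septum_mm": 0.54,
}


def relative_odds(**deltas):
    """Factor by which the odds of malignancy change for given increments.

    Each keyword is a variable name from ODDS_RATIOS and its value is the
    change in that variable; unspecified variables are held constant.
    """
    factor = 1.0
    for name, delta in deltas.items():
        factor *= ODDS_RATIOS[name] ** delta
    return factor


# Largest projection 2 mm higher and one additional projection:
print(round(relative_odds(pap_height_mm=2, pap_number=1), 2))  # 4.04
```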
Table 4. Logistic regression model to predict malignancy in unclassifiable masses with papillary projections
Columns: maximum likelihood estimates; odds ratio estimates; 95% Wald confidence limits.
PapHeight, height of largest papillary projection (mm); PapNumber, number of papillary projections; septum, thickness of thickest septum (mm).
Table 5. Areas under receiver–operating characteristics (ROC) curves, sensitivity and specificity with regard to malignancy, and positive and negative likelihood ratios of a logistic regression model and of subjective evaluation of the ultrasound image in difficult masses with papillary projections
Columns: training set (n = 39); test set (n = 15).
Rows: subjective evaluation of ultrasound image (positive likelihood ratio, negative likelihood ratio); logistic regression model (area under ROC curve; positive and negative likelihood ratios at risk cut-offs of 10% and 50%).
This multicenter study has shown that experienced ultrasound examiners using high-end ultrasound systems find slightly less than 10% of pelvic masses judged to be of adnexal origin difficult to classify as benign or malignant on the basis of gray-scale and color Doppler ultrasound findings. Masses with papillary projections, multilocular cysts with >10 locules, cysts with low-level echogenicity of cyst fluid, and masses moderately vascularized at color Doppler ultrasound examination seem to be more difficult to classify than other types of tumor. The histological diagnoses that present the greatest diagnostic difficulties are borderline tumors, struma ovarii, papillary (cyst)adeno(fibro)mas and myomas. No logistic regression model was found that was useful to distinguish between benignity and malignancy in difficult masses. However, a logistic regression model built on unclassifiable masses with papillary projections suggested that the more papillary projections and the larger the papillary projections the greater the risk of malignancy including borderline malignancy.
It is a strength of the present study that it is large and involves many centers. Therefore, the results are likely to be generalizable to other experienced ultrasound examiners using good ultrasound systems provided that they are exposed to a population similar to the present study population. We believe that the present sample of masses is fairly representative of the types of extrauterine pelvic mass currently considered appropriate to remove surgically.
It was an unexpected finding that myomas were common among difficult masses. This is likely to be explained by some of the myomas in this series not being ordinary myomas. They were all suspected to be an adnexal mass both clinically and at ultrasound examination, and their ultrasound morphology was unclear enough to justify surgical removal.
Masses with papillary projections, multilocular cysts with >10 locules and masses with low-level echogenicity of cyst fluid were clearly over-represented among the difficult masses. This is not surprising because we found that these ultrasound features are characteristic of borderline tumors, and borderline tumors were over-represented among the difficult tumors. Others, too, found papillary projections and multilocularity to be characteristic of borderline tumors [6, 7].
It is important to be able to reliably discriminate between benign and malignant adnexal masses in order to be able to correctly evaluate the need for surgery and to choose the appropriate time and mode of operation. The present study has demonstrated that an experienced ultrasound examiner using a good ultrasound system can be expected to be able to correctly discriminate between benign and malignant adnexal masses in >90% of all cases. However, in approximately 10% of cases even an experienced ultrasound examiner using a good ultrasound system is likely to fail to make a confident and correct diagnosis. Before a method capable of distinguishing between benignity and malignancy in such difficult pelvic masses is found (the new method might involve ultrasound contrast or proteomics), we must accept that some women will need to undergo an unnecessary operation—or perhaps an unnecessarily extensive operation—because of our inability to reliably exclude malignancy before surgery.
This study was supported by the Swedish Medical Research Council (Grants K2001-72X-11605-06A and K2002-72X-11605-07B), two governmental grants (Landstingsfinansierad regional forskning, Region Skåne and ALF-medel) and funds administered by Malmö University Hospital and by the Research Council of the Katholieke Universiteit Leuven, Belgium (GOAMEFISTO666, GOA-AMBioRICS), the Flemish Government (FWO: Projects G.0407.02/G.0269.02/G.0360.05, Research Communities ICCoS and ANMMM), the Belgian Federal Government (DWTC: IUAP IV-02 (1996-2001) and IUAP V-22 (2002-2006)) and the EU (PDT-COIL: Contract NNE5/2001/887; BIOPATTERN: Contract FP6-2002-IST 508803; eTUMOUR: Contract FP6-2002-LIFESCIHEALTH 503094).
The IOTA Steering Committee comprised: Dirk Timmerman, Lil Valentin, Thomas H. Bourne, William P. Collins, Sabine Van Huffel and Ignace Vergote. The IOTA principal investigators (listed in alphabetical order) comprised: Jean-Pierre Bernard, Maurepas, France; Thomas H. Bourne, London, UK; Enrico Ferrazzi, Milan, Italy; Davor Jurkovic, London, UK; Fabrice Lécuru, Paris, France; Andrea Lissoni, Monza, Italy; Ulrike Metzger, Paris, France; Dario Paladini, Naples, Italy; Antonia Testa, Rome, Italy; Dirk Timmerman, Leuven, Belgium; Lil Valentin, Malmö, Sweden; Sabine Van Huffel, Leuven, Belgium; Caroline Van Holsbeke, Leuven, Belgium; Ignace Vergote, Leuven, Belgium and Gerardo Zanetta, Monza, Italy.