Clinically oriented three-step strategy for assessment of adnexal pathology

Authors


L. Ameye, Academic Medical Center, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, Leuven 3001, Belgium. E-mail: lieveke.ameye@esat.kuleuven.be

Abstract

Objective

To determine the diagnostic performance of ultrasound-based simple rules, risk of malignancy index (RMI), two logistic regression models (LR1 and LR2) and real-time subjective assessment by experienced ultrasound examiners following the exclusion of masses likely to be judged as easy and ‘instant’ to diagnose by an ultrasound examiner, and to develop a new strategy for the assessment of adnexal pathology based on this.

Methods

3511 patients with at least one persistent adnexal mass preoperatively underwent transvaginal ultrasonography to assess tumor morphology and vascularity. They were included in two consecutive prospective studies by the International Ovarian Tumor Analysis (IOTA) group: Phase 1 (1999–2005), development of the simple rules and logistic regression models LR1 and LR2, and Phase 2, a validation study (2005–2007).

Results

Almost half of the cases (43%) were identified as ‘instant’ to diagnose on the basis of descriptors applied to the database. To assess diagnostic performance in the more difficult ‘non-instant’ masses, we used only Phase 2 data (n = 1036). The sensitivity of LR2 was 88%, of RMI it was 41% and of subjective assessment it was 87%. The specificity of LR2 was 67%, of RMI it was 90% and of subjective assessment it was 86%. The simple rules yielded a conclusive result in almost 2/3 of the masses, where they resulted in sensitivity and specificity similar to those of real-time subjective assessment by experienced ultrasound examiners: sensitivity 89 vs 89% (P = 0.76), specificity 91 vs 91% (P = 0.65). When a three-step strategy was applied with easy ‘instant’ diagnoses as Step 1, simple rules where conclusive as Step 2 and subjective assessment by an experienced ultrasound examiner in the remaining masses as Step 3, we obtained a sensitivity of 92% and specificity of 92% compared with sensitivity 90% (P = 0.03) and specificity 93% (P = 0.44) when using real-time subjective assessment by experts in all tumors.

Conclusion

A diagnostic strategy using simple descriptors and ultrasound rules when applied to the variables contained in the IOTA database obtains results that are at least as good as those obtained by subjective assessment of a mass by an expert. Copyright © 2012 ISUOG. Published by John Wiley & Sons, Ltd.

Introduction

Accurately characterizing ovarian pathology prior to deciding on surgical intervention is important for the optimization of patient management[1-3]. Previous studies have shown that both the sonographic morphological and vascular characteristics of a mass can give very useful information about its nature[4-6]. Simple ultrasound rules, developed by the International Ovarian Tumor Analysis (IOTA) group, can be used as a triage test to predict benignity/malignancy in an adnexal mass. They have been shown to yield a conclusive result (benign or malignant) in about 75% of adnexal masses, and when conclusive or if used as a triage test, they perform as well as subjective assessment by an experienced ultrasound examiner for discriminating between benign and malignant masses before surgery[7]. However, the risk of malignancy index (RMI) is currently recommended in both the UK and USA[8], [9], while the IOTA group also recently validated two logistic regression models with very good diagnostic performance (IOTA Phase 2)[10].

To date, attempts to develop rules or models to characterize pathology have included all masses irrespective of whether, in real-life clinical practice, ultrasound examiners would find them difficult to diagnose or not. As a result, the diagnostic performance of these rules/models might be overestimated, and they may perform worse than expected when applied outside the research setting. Some types of adnexal pathology exhibit characteristic ultrasound features, for example certain endometriomas, dermoid cysts, simple cysts and advanced cancers[11], [12]. In these circumstances an ultrasound examiner will not apply rules/models to make a diagnosis.

The aim of this study was to mimic the clinical reality where easy ‘instant’ diagnoses (i.e. masses likely to be judged as easy to diagnose by an ultrasound examiner) do not require diagnostic models, and to test in the remaining patients the diagnostic performance of the simple rules, RMI, the two logistic regression models and the real-time subjective assessment of ultrasound findings by the examiner.

Patients and Methods

Patients were eligible for inclusion in the study if they had at least one adnexal mass that was examined by ultrasonography by a principal investigator at one of the participating institutions. Exclusion criteria were pregnancy, refusal to undergo transvaginal sonography and surgery not taking place within 120 days of ultrasonography. When two masses were present, only information from the most complex one was retained. All ultrasound variables were collected prospectively. In total, 3511 patients from 21 centers in nine countries were included: 1066 patients in Phase 1A (1999–2002, development study of two logistic regression models LR1 and LR2), 507 patients in Phase 1B (2002–2005, temporal validation study of LR1 and LR2) and 1938 patients in Phase 2 (2005–2007, temporal and external validation study of LR1 and LR2)[10], [13], [14]. The study was approved by the ethics committee of each of the participating institutions.

Transvaginal sonography was carried out according to a predefined protocol by ultrasound examiners specialized in gynecological ultrasound, i.e. the principal investigator at each center. If a large mass could not be seen in its entirety using a transvaginal probe, transabdominal sonography was used. All centers used high-quality ultrasound equipment with sensitive color Doppler functions. Over 40 morphological and blood-flow variables were collected to characterize each adnexal mass. Details of the ultrasound examination technique and the ultrasound terms and definitions used have been described elsewhere[15]. Finally, the ultrasound examiner who carried out the scan stated whether the mass was likely to be malignant or benign on the basis of subjective assessment and how certain he or she was about the diagnosis (certainly benign, probably benign, uncertain but most probably benign, uncertain but most probably malignant, probably malignant or certainly malignant); this is reported as the ‘real-time subjective assessment’. The ultrasound information was recorded prospectively in the electronic IOTA database and was locked at the time of the examination so that information could not be changed afterwards.

The participating centers were encouraged to measure the level of serum CA 125 (U/mL) in peripheral blood from all patients, but a missing CA 125 result was not an exclusion criterion.

The outcome was the histological diagnosis and, in cases of malignancy, the surgical stage. Surgery was performed by laparoscopy or laparotomy according to the surgeon's judgment. The excised tissues underwent histological examination at the local center. Tumors were classified according to the criteria recommended by the International Federation of Gynecology and Obstetrics[16].

Easy ‘instant’ diagnosis

A member of the steering committee defined criteria to make an easy ‘instant’ diagnosis in terms of six descriptors using variables collected routinely and prospectively in each phase of the IOTA study. The descriptors were based on ultrasound information and measurements of serum CA 125 levels: four described the features of commonly found benign tumors, while two described probable malignancies (Table 1)[12]. The presence or absence of the descriptors for each mass was ascertained by analysis of the database. If none of the six descriptors was applicable or both a descriptor of a benign and a malignant mass were present, we considered the diagnosis as more difficult, and so defined it as ‘non-instant’.

Table 1. Easy ‘instant’ diagnoses of adnexal masses identified by six descriptors (Phases 1 and 2, total n = 3511)

| Descriptor | Predicted histology | Correct outcome (benign or malignant) | Correct histology | Cases in which descriptor was applicable |
| --- | --- | --- | --- | --- |
| Predicted outcome benign | | | | 1066 (a) |
| Unilocular tumor with ground glass echogenicity in a premenopausal woman | Endometrioma | 396/398 (99.5 (98.2–99.9)) | 360/398 (90.5 (87.2–93.0)) | |
| Unilocular tumor with mixed echogenicity and acoustic shadows in a premenopausal woman | Teratoma | 136/136 (100 (97.3–100)) | 126/136 (92.6 (87.0–96.0)) | |
| Unilocular anechoic tumor with regular walls and maximum diameter of lesion < 10 cm | Simple cyst/cystadenoma | 240/243 (98.8 (96.4–99.6)) | 210/243 (86.4 (81.5–90.2)) | |
| Remaining unilocular tumor with regular walls | | 285/289 (98.6 (96.5–99.5)) | | |
| Predicted outcome malignant | | | | 458 (b) |
| Tumor with ascites and at least moderate color Doppler blood flow in a postmenopausal woman | | 194/203 (95.6 (91.8–97.7)) | | |
| Age > 50 years and CA 125 > 100 U/mL | | 386/414 (93.2 (90.4–95.3)) | | |
| Total | | | | 1518 (c) |

Data given as n (% (95% CI)) or n.
(a) No overlap between descriptors (1066 = 398 + 136 + 243 + 289).
(b) Overlap between descriptors for a malignant mass: 159 masses were predicted as malignant by both descriptors.
(c) In three cases, both a descriptor of a benign and a malignant mass were applicable: unilocular tumors with regular walls in women older than 50 years and with CA 125 > 100 U/mL; for these masses no easy diagnosis is possible (1518 = (1066 − 3) + (458 − 3)).
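The six descriptors in Table 1 can be read as a small decision procedure. The following is a minimal sketch of that reading, not the code used by the study: the `Mass` record, its field names, the string labels for echogenicity and the 1 to 4 encoding of color content (none, minimal, moderate, strong) are all assumptions made for illustration, and "premenopausal" is taken to mean "not postmenopausal".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mass:
    # Hypothetical field names; the IOTA database schema is not given in the text.
    unilocular: bool
    echogenicity: str        # assumed labels: "anechoic", "ground_glass", "mixed", ...
    acoustic_shadows: bool
    regular_walls: bool
    max_diameter_mm: float
    ascites: bool
    color_score: int         # assumed: 1 none, 2 minimal, 3 moderate, 4 strong
    postmenopausal: bool
    age_years: float
    ca125: Optional[float]   # U/mL; None when not measured


def benign_descriptor(m: Mass) -> Optional[str]:
    """Return the predicted benign histology, or None if no benign descriptor applies."""
    if not m.unilocular:
        return None
    if m.echogenicity == "ground_glass" and not m.postmenopausal:
        return "endometrioma"
    if m.echogenicity == "mixed" and m.acoustic_shadows and not m.postmenopausal:
        return "teratoma"
    if m.regular_walls:
        if m.echogenicity == "anechoic" and m.max_diameter_mm < 100:
            return "simple cyst/cystadenoma"
        return "benign, histology unspecified"  # remaining unilocular, regular walls
    return None


def malignant_descriptor(m: Mass) -> bool:
    """True if either of the two descriptors of a probable malignancy applies."""
    if m.ascites and m.color_score >= 3 and m.postmenopausal:
        return True
    return m.ca125 is not None and m.age_years > 50 and m.ca125 > 100


def instant_diagnosis(m: Mass) -> str:
    """'benign', 'malignant', or 'non-instant' (no descriptor, or conflicting ones)."""
    benign = benign_descriptor(m) is not None
    malignant = malignant_descriptor(m)
    if benign == malignant:          # neither applies, or both apply
        return "non-instant"
    return "benign" if benign else "malignant"


endometrioma_like = Mass(True, "ground_glass", False, True, 45.0, False, 1, False, 32, 20.0)
conflicting = Mass(True, "anechoic", False, True, 60.0, False, 1, True, 63, 150.0)
print(instant_diagnosis(endometrioma_like))  # benign
print(instant_diagnosis(conflicting))        # non-instant
```

The second example corresponds to the situation described in footnote (c): a unilocular tumor with regular walls in a woman older than 50 years with CA 125 > 100 U/mL triggers both a benign and a malignant descriptor, so no easy diagnosis is made.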

Simple rules

We applied the simple rules as reported previously[7]. Briefly, we used five ultrasound-based variables collected in the study to predict a malignant tumor (M-features): irregular solid tumor (M1), ascites (M2), at least four papillary structures (M3), irregular multilocular solid tumor with a largest diameter of at least 100 mm (M4) and very high color content on color Doppler examination (M5). We also used five ultrasound features to predict a benign tumor (B-features): unilocular cyst (B1), presence of solid components for which the largest solid component was < 7 mm in largest diameter (B2), acoustic shadows (B3), smooth multilocular tumor (B4) and no detectable blood flow on Doppler examination (B5). If one or more M-features were present in the absence of a B-feature, we classified the mass as malignant (Rule 1). If one or more B-features were present in the absence of an M-feature, we classified the mass as benign (Rule 2). If both M-features and B-features were present, or if none of the features was present, the simple rules were inconclusive (Rule 3). Once again the presence or absence of M or B features for each mass was determined by the statistician by analyzing the prospectively collected ultrasound data contained within the IOTA database.
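The three rules above reduce to a check on which feature groups are present. A minimal sketch follows, assuming the features have already been scored by the examiner and are passed in as labels `"M1"` to `"M5"` and `"B1"` to `"B5"`; the function name and the label-based interface are illustrative choices, not part of the published rules.

```python
M_FEATURES = {"M1", "M2", "M3", "M4", "M5"}  # features predicting a malignant tumor
B_FEATURES = {"B1", "B2", "B3", "B4", "B5"}  # features predicting a benign tumor


def simple_rules(features):
    """Classify a mass from the set of M- and B-features found at ultrasound.

    Returns 'malignant' (Rule 1), 'benign' (Rule 2) or 'inconclusive' (Rule 3).
    """
    features = set(features)
    unknown = features - M_FEATURES - B_FEATURES
    if unknown:
        raise ValueError(f"unknown feature labels: {sorted(unknown)}")

    has_m = bool(features & M_FEATURES)
    has_b = bool(features & B_FEATURES)
    if has_m and not has_b:
        return "malignant"      # Rule 1
    if has_b and not has_m:
        return "benign"         # Rule 2
    return "inconclusive"       # Rule 3: both groups present, or neither


print(simple_rules({"M2", "M5"}))  # malignant
print(simple_rules({"B1", "B5"}))  # benign
print(simple_rules({"M1", "B3"}))  # inconclusive
print(simple_rules(set()))         # inconclusive
```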

Risk of malignancy index

The RMI was determined using the ultrasound findings, menopausal status and serum CA 125 level. Five ultrasound features suggestive of cancer were incorporated in an ultrasound score (U): multilocularity, solid areas, bilateral masses, ascites and evidence of metastases. U was assigned a value of 0 when none of these features was present, 1 if one feature was present and 3 if two or more features were present. A score (M) of 1 was assigned to premenopausal women and 3 to postmenopausal women. RMI was defined as U × M × (serum CA 125 (U/mL)). An RMI of ≥ 200 was an indication for cancer[17].
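As a worked illustration of the definition above, the sketch below computes the RMI from the five ultrasound features, menopausal status and CA 125, and applies the cut-off of 200. The function and parameter names are assumptions made for this example.

```python
def rmi(multilocular, solid_areas, bilateral, ascites, metastases,
        postmenopausal, ca125):
    """Risk of malignancy index: U x M x CA 125 (U/mL)."""
    n_features = sum([multilocular, solid_areas, bilateral, ascites, metastases])
    # Ultrasound score U: 0 for no feature, 1 for one feature, 3 for two or more.
    if n_features == 0:
        u = 0
    elif n_features == 1:
        u = 1
    else:
        u = 3
    m = 3 if postmenopausal else 1   # menopausal score M
    return u * m * ca125


def rmi_indicates_cancer(score, cutoff=200):
    """An RMI at or above the cut-off is taken as an indication for cancer."""
    return score >= cutoff


# Postmenopausal woman, multilocular mass with solid areas, CA 125 = 50 U/mL:
score = rmi(True, True, False, False, False, postmenopausal=True, ca125=50)
print(score, rmi_indicates_cancer(score))   # 450 True

# Premenopausal woman, one feature (ascites), CA 125 = 150 U/mL:
score = rmi(False, False, False, True, False, postmenopausal=False, ca125=150)
print(score, rmi_indicates_cancer(score))   # 150 False
```

Because U is 0 when none of the five features is present, the index is 0 for such a mass whatever the CA 125 level.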

Logistic regression models LR1 and LR2

The LR1 model was based on the age of the patient, the presence of ascites, the presence of blood flow within a solid papillary projection as determined by Doppler ultrasound, maximal diameter of the solid component (mm, up to a maximum of 50 mm), irregular internal cyst walls, the presence of acoustic shadows, personal history of ovarian cancer, current hormonal therapy, maximum diameter of the lesion (mm), the presence of pain during the examination, the presence of a purely solid tumor and the color content of the tumor at color Doppler examination (none, minimal, moderate, strong). The simpler LR2 model used only the first six variables. An estimated probability of malignancy > 0.10 by LR1 or LR2 was an indication for cancer[13].

In order to evaluate the performance of RMI, LR1 and LR2 the variables included in the models were all included in the prospectively collected and locked IOTA database, and then analyzed by the biostatistician for the study (L.A.).

Statistical analysis

The sensitivity and specificity (with 95% CIs) of the real-time subjective assessment of the mass, as well as the RMI, LR1 and LR2, were calculated using Wilson's method[18]. CIs for positive and negative likelihood ratios were calculated with the method described by Simel et al.[19] and CIs for the diagnostic odds ratio were calculated as described by Armitage and Berry[20]. McNemar's test was used to test the statistical significance of differences in sensitivity and specificity between subjective assessment, CA 125, RMI, LR1 and LR2. Receiver–operating characteristics (ROC) curves were constructed and the 95% CIs for the area under the curve (AUC) were obtained using the method proposed by DeLong et al.[21]. The ROC curve for subjective assessment was constructed using the six levels of diagnostic confidence: certainly benign, probably benign, uncertain but most probably benign, uncertain but most probably malignant, probably malignant and certainly malignant. All P-values were two-sided and P < 0.05 was considered statistically significant. SAS version 9.2 (SAS Institute, Cary, NC, USA) was used for all analyses.
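For readers who want to reproduce the confidence intervals, the following is a minimal sketch of the Wilson score interval for a proportion, written in plain Python rather than SAS. The function name is an assumption; the example counts (500 cancers detected out of 542) are the three-step sensitivity reported in the Results.

```python
from math import sqrt


def wilson_ci(successes, n, z=1.959963984540054):
    """Wilson score interval for a binomial proportion (default: two-sided 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


low, high = wilson_ci(500, 542)
print(f"{500 / 542:.1%} (95% CI, {low:.0%} to {high:.0%})")
```

With these counts the interval rounds to 90% to 94%, in line with the interval given for the three-step strategy in Table 4.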

Results

In total, 3511 patients were included in the study. Almost half of the cases were identified as easy ‘instant’ to diagnose. For 43.2% (95% CI, 42–45%; n = 1518) of the patients, at least one of the six descriptors could be used to predict histology (Table 1). The sensitivity of the six descriptors in the 1518 easy ‘instant’ masses was 98% (95% CI, 96–99%) and the specificity was 97% (95% CI, 96–98%).

Table 2 shows the histology of the masses. The malignancy rate was 27.1% (951/3511) with 186 (19.6%) of the 951 malignancies being borderline tumors, 645 (67.8%) primary invasive tumors, and 120 (12.6%) metastatic tumors. Of the 334 primary invasive Stage III cancers, 221 (66.2%) were considered as easy ‘instant’ diagnoses; 204 (61.1%) of them were predicted to be malignant by the descriptor ‘patient's age > 50 years and CA 125 > 100 U/mL’. In patients with an easy ‘instant’ diagnosis, the mean age was 47 ± 17 years and 42 (2.8%) had a family history of ovarian cancer. In the remaining 1993 patients with more difficult ‘non-instant’ diagnosis, the mean age was 47 ± 15 years and 48 (2.4%) had a family history of ovarian cancer.

Table 2. Histology of the adnexal masses (n = 3511)

| Histology | All (n = 3511) | Easy ‘instant’ diagnosis (n = 1518) | More difficult ‘non-instant’ diagnosis (n = 1993) |
| --- | --- | --- | --- |
| Benign (total) | 2560 (72.9) | 1084 (71.4) | 1476 (74.1) |
| Endometrioma | 713 (20.3) | 449 (29.6) | 264 (13.2) |
| Teratoma | 402 (11.4) | 192 (12.6) | 210 (10.5) |
| Simple cyst + parasalpingeal cyst | 281 (8.0) | 137 (9.0) | 144 (7.2) |
| Functional cyst | 116 (3.3) | 52 (3.4) | 64 (3.2) |
| Hydrosalpinx + salpingitis | 100 (2.8) | 31 (2.0) | 69 (3.5) |
| Peritoneal pseudocyst | 21 (0.6) | 4 (0.3) | 17 (0.9) |
| Abscess | 42 (1.2) | 6 (0.4) | 36 (1.8) |
| Fibroma | 152 (4.3) | 20 (1.3) | 132 (6.6) |
| Serous cystadenoma | 420 (12.0) | 122 (8.0) | 298 (15.0) |
| Mucinous cystadenoma | 270 (7.7) | 66 (4.3) | 204 (10.2) |
| Rare benign (a) | 43 (1.2) | 5 (0.3) | 38 (1.9) |
| Borderline (total) | 186 (5.3) | 35 (2.3) | 151 (7.6) |
| Borderline stage I | 164 (4.7) | 28 (1.8) | 136 (6.8) |
| Borderline stage II | 8 (0.2) | 4 (0.3) | 4 (0.2) |
| Borderline stage III | 13 (0.4) | 3 (0.2) | 10 (0.5) |
| Borderline stage IV | 1 (0.03) | 0 (0.0) | 1 (0.1) |
| Primary invasive (total) | 645 (18.4) | 350 (23.1) | 295 (14.8) |
| Primary invasive stage I | 136 (3.9) | 46 (3.0) | 90 (4.5) |
| Primary invasive stage II | 47 (1.3) | 22 (1.4) | 25 (1.3) |
| Primary invasive stage III | 334 (9.5) | 221 (14.6) | 113 (5.7) |
| Primary invasive stage IV | 58 (1.7) | 41 (2.7) | 17 (0.9) |
| Rare primary invasive (b) | 70 (2.0) | 20 (1.3) | 50 (2.5) |
| Metastatic (total) | 120 (3.4) | 49 (3.2) | 71 (3.6) |

Data given as n (%).
(a) Including struma ovarii, Brenner tumor, Sertoli cell tumor, stromal cell tumor, Schwannoma, lymphangioma, benign transitional cell tumor, cystic mesothelioma, mesothelial cyst of peritoneum and steroid cell tumor of stromal luteoma variant.
(b) Including granulosa cell tumor, Leydig cell tumor, dysgerminoma, gynandroblastoma, leiomyosarcoma, immature teratoma, mixed Müllerian tumor, small cell tumor, Brenner cancer, carcinosarcoma, choriocarcinoma, yolk sac tumor, endodermal sinus tumor, gastrointestinal stromal tumor, lymphoma, neuroendocrine carcinoma, rhabdomyosarcoma, Sertoli cell tumor, struma ovarii and stromal tumor.

In the 1993 more difficult ‘non-instant’ masses (Figure 1), real-time subjective assessment by an experienced ultrasound examiner had a sensitivity of 85% (95% CI, 82–88%) and specificity of 90% (95% CI, 88–91%). To compare subjective assessment, RMI, LR1 and LR2, we then restricted our analysis to the Phase 2 data (as LR1 and LR2 were constructed on the Phase 1 data) where CA 125 measurements were available, giving 803 tumors for analysis, including 254 cancers (Table 3). Subjective assessment missed 34 cancers and gave 78 false-positive results. RMI missed 149 cancers and gave 57 false-positive results. LR1 and LR2 had sensitivity rates similar to those of subjective assessment, but produced more false-positive results. LR1 missed 25 cancers and yielded 177 false-positive results, while LR2 missed 31 cancers and yielded 181 false-positive results. The sensitivity of the six-variable LR2 model was twice as high as that of RMI: 88% (95% CI, 83–91%) vs 41% (95% CI, 35–47%); P < 0.001. However, the specificity of LR2 was lower than that of RMI: 67% (95% CI, 63–71%) vs 90% (95% CI, 87–92%); P < 0.001.

Figure 1. Flow chart of patients included in the study.

Table 3. Diagnostic performance in the more difficult ‘non-instant’ diagnoses of simple rules, risk of malignancy index (RMI), logistic regression model 1 (LR1), logistic regression model 2 (LR2) and real-time subjective assessment by experienced ultrasound examiners (Phase 2, n = 1036)

| Assessment method | Sensitivity (% (95% CI)) | Specificity (% (95% CI)) |
| --- | --- | --- |
| Simple rules conclusive: CA 125 available (n = 506; 30% malignant) | | |
| Simple rules | 89 (83–93) | 90 (86–92) |
| RMI | 51 (43–58) | 92 (88–94) |
| LR1 | 93 (88–96) | 81 (77–85) |
| LR2 | 95 (91–98) | 78 (73–82) |
| Subjective assessment | 88 (81–92) | 90 (87–93) |
| Simple rules conclusive: all (n = 661; 26% malignant) | | |
| Simple rules | 89 (84–93) | 91 (88–93) |
| LR1 | 93 (88–96) | 84 (81–87) |
| LR2 | 95 (91–97) | 80 (76–83) |
| Subjective assessment | 89 (83–92) | 91 (89–94) |
| Simple rules inconclusive: CA 125 available (n = 297; 34% malignant) | | |
| RMI | 27 (20–37) | 86 (81–90) |
| LR1 | 86 (78–92) | 43 (36–50) |
| LR2 | 76 (67–84) | 48 (41–55) |
| Subjective assessment | 85 (77–91) | 77 (71–83) |
| Simple rules inconclusive: all (n = 375; 33% malignant) | | |
| LR1 | 86 (78–91) | 45 (39–51) |
| LR2 | 75 (67–82) | 49 (43–55) |
| Subjective assessment | 86 (79–91) | 80 (74–84) |
| All: CA 125 available (n = 803; 32% malignant) | | |
| RMI | 41 (35–47) | 90 (87–92) |
| LR1 | 90 (86–93) | 68 (64–72) |
| LR2 | 88 (83–91) | 67 (63–71) |
| Subjective assessment | 87 (82–90) | 86 (83–88) |
| All: all (n = 1036; 29% malignant) | | |
| LR1 | 90 (86–93) | 71 (68–74) |
| LR2 | 87 (82–90) | 69 (66–72) |
| Subjective assessment | 88 (83–91) | 87 (85–90) |

The diagnostic accuracy in terms of AUC of subjective assessment was 0.91 (95% CI, 0.89–0.93), for LR1 it was 0.88 (95% CI, 0.86–0.91), for LR2 it was 0.85 (95% CI, 0.82–0.88), for RMI it was 0.75 (95% CI, 0.72–0.79) and for CA 125 alone it was 0.71 (95% CI, 0.67–0.74).

In the more difficult ‘non-instant’ masses where the six descriptors did not apply (Figure 1 and Table 3), the simple rules yielded a conclusive result in almost 2/3 (661/1036) of the masses, where they resulted in sensitivity and specificity similar to those of real-time subjective assessment by experienced ultrasound examiners: sensitivity 89 vs 89% (P = 0.76), specificity 91 vs 91% (P = 0.65).

In the 297 difficult ‘non-instant’ masses where the simple rules yielded an inconclusive result and where results of CA 125 measurements were available, subjective assessment had a sensitivity that was three times higher than that of RMI (85 vs 27%; P < 0.001), at the cost of a lower specificity (77 vs 86%; P = 0.02). Logistic regression model LR1 obtained a sensitivity similar to that of subjective assessment (86 vs 85%; P = 0.81), but its specificity was much lower (43 vs 77%; P < 0.001). The AUCs of RMI, LR1, LR2 and subjective assessment in the 297 masses in which the simple rules were inconclusive were 0.64 (95% CI, 0.57–0.70), 0.75 (95% CI, 0.69–0.81), 0.68 (95% CI, 0.61–0.74) and 0.85 (95% CI, 0.80–0.90), respectively.

We set up a three-step strategy as follows: Step 1, easy ‘instant’ diagnoses; Step 2, in the remaining more difficult ‘non-instant’ masses, simple rules; and Step 3, if the simple rules were inconclusive, subjective assessment by experienced ultrasound examiners (Figure 2). Of the 1938 masses in Phase 2, 47% were judged as easy and therefore ‘instant’ to diagnose by an ultrasound examiner; in 34%, the simple ultrasound-based rules could be applied; and the remaining 19% needed referral to an experienced ultrasound examiner. This strategy resulted in the identification of low-, moderate- and high-risk groups. Overall, 29% of the masses were considered ‘high risk’, and the prevalence of malignant tumors in this high-risk group was 84.6% (483/571).

Figure 2. Flow chart showing the diagnosis of ovarian cancer in three steps (Phase 2 data, n = 1938): Step 1, easy ‘instant’ diagnoses (ovarian masses likely to be judged as ‘easy’ to diagnose by an ultrasound examiner); Step 2, simple ultrasound-based rules applied in the remaining ‘difficult’ masses; Step 3, if simple rules are inconclusive, subjective assessment by an experienced ultrasound examiner.
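The three-step strategy is a sequential triage in which each later step is consulted only when the earlier one gives no answer. A minimal sketch of that control flow is given below. The three step functions are passed in as parameters and are placeholders here: `instant`, `rules` and `expert`, and the dictionary-based mass records in the usage example, are hypothetical names introduced for illustration.

```python
from typing import Any, Callable, Tuple

CONCLUSIVE = ("benign", "malignant")


def three_step(mass: Any,
               instant: Callable[[Any], str],
               rules: Callable[[Any], str],
               expert: Callable[[Any], str]) -> Tuple[str, int]:
    """Return (diagnosis, step at which it was made).

    instant(mass) -> 'benign' | 'malignant' | 'non-instant'    (Step 1)
    rules(mass)   -> 'benign' | 'malignant' | 'inconclusive'   (Step 2)
    expert(mass)  -> 'benign' | 'malignant'                    (Step 3)
    """
    first = instant(mass)
    if first in CONCLUSIVE:
        return first, 1
    second = rules(mass)
    if second in CONCLUSIVE:
        return second, 2
    # Only masses that reach this point need referral to an expert examiner.
    return expert(mass), 3


# Usage with stand-in step functions that read pre-recorded answers.
masses = [
    {"instant": "benign", "rules": "benign", "expert": "benign"},
    {"instant": "non-instant", "rules": "malignant", "expert": "malignant"},
    {"instant": "non-instant", "rules": "inconclusive", "expert": "benign"},
]
for m in masses:
    print(three_step(m, lambda x: x["instant"], lambda x: x["rules"],
                     lambda x: x["expert"]))
# ('benign', 1)
# ('malignant', 2)
# ('benign', 3)
```

Passing the steps as functions keeps the expert assessment lazy, which mirrors the clinical point of the strategy: in Phase 2 only about one in five masses reached Step 3.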

When the three-step strategy was applied, we obtained a sensitivity of 92.2% (500/542) (43.5% (236/542) by easy ‘instant’ diagnosis, 28.8% (156/542) by simple rules and 19.9% (108/542) by subjective assessment) and specificity of 92.3% (1288/1396) (46.4% (648/1396) by easy ‘instant’ diagnoses, 31.6% (441/1396) by simple rules and 14.3% (199/1396) by subjective assessment). The sensitivity obtained was higher than that of real-time subjective assessment by an expert in all tumors (sensitivity 90.4% (490/542); P = 0.03) and specificity was similar to that obtained using subjective assessment in all tumors (specificity 92.7% (1294/1396); P = 0.44) (Table 4).
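The per-step counts above can be combined into the overall accuracy measures with a few lines of arithmetic. The sketch below recomputes sensitivity, specificity, the likelihood ratios and the diagnostic odds ratio from the counts quoted in this paragraph; the variable and function names are illustrative, and the confidence intervals reported in Table 4 are not reproduced here.

```python
# Correctly classified cases per step of the three-step strategy (Phase 2).
true_pos_by_step = {"instant": 236, "simple rules": 156, "subjective": 108}
true_neg_by_step = {"instant": 648, "simple rules": 441, "subjective": 199}
n_malignant, n_benign = 542, 1396


def accuracy_measures(tp, fn, tn, fp):
    """Sensitivity, specificity, LR+, LR- and diagnostic odds ratio from a 2x2 table."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    lr_pos = sens / (1 - spec)
    lr_neg = (1 - sens) / spec
    dor = (tp * tn) / (fn * fp)      # equal to lr_pos / lr_neg
    return sens, spec, lr_pos, lr_neg, dor


tp = sum(true_pos_by_step.values())  # 500
tn = sum(true_neg_by_step.values())  # 1288
fn = n_malignant - tp                # 42 missed cancers
fp = n_benign - tn                   # 108 false-positive results

sens, spec, lr_pos, lr_neg, dor = accuracy_measures(tp, fn, tn, fp)
print(f"sensitivity {sens:.1%}, specificity {spec:.1%}")
print(f"LR+ {lr_pos:.1f}, LR- {lr_neg:.2f}, DOR {dor:.1f}")
# sensitivity 92.3%, specificity 92.3%
# LR+ 11.9, LR- 0.08, DOR 142.0
```

The likelihood ratios and the diagnostic odds ratio agree with the point estimates in the ‘Total’ section of Table 4. The sensitivity prints as 92.3% because 500/542 = 0.92251 is rounded here to one decimal; the text reports it as 92.2%.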

Table 4. Sensitivity and specificity of the simple rules as triage test for adnexal masses (excluding and including the easy ‘instant’ diagnoses), and subjective assessment, logistic regression model 1 (LR1), logistic regression model 2 (LR2) and risk of malignancy index (RMI), using IOTA Phase 2 data (n = 1938)

| Assessment method | Sensitivity (% (95% CI)) | Specificity (% (95% CI)) | LR+ (95% CI) | LR− (95% CI) | DOR (95% CI) |
| --- | --- | --- | --- | --- | --- |
| Cases with CA 125 available (n = 1504) | | | | | |
| Easy diagnoses + simple rules + subjective assessment | 92 (90–94) | 91 (89–93) | 10.2 (8.3–12.4) | 0.09 (0.06–0.12) | 119.5 (80.5–177.2) |
| Easy diagnoses + simple rules + LR1 | 92 (90–95) | 84 (82–86) | 5.9 (5.1–6.8) | 0.09 (0.07–0.12) | 66.0 (45.3–96.0) |
| Easy diagnoses + simple rules + LR2 | 90 (88–93) | 85 (83–87) | 6.1 (5.3–7.1) | 0.11 (0.09–0.15) | 54.4 (38.5–77.0) |
| Easy diagnoses + simple rules + RMI | 80 (77–84) | 93 (91–94) | 10.8 (8.7–13.5) | 0.21 (0.18–0.26) | 50.9 (36.8–70.3) |
| *Easy diagnoses + simple rules + inconclusive = ‘malignant’ | 95 (93–97) | 76 (73–79) | 4.0 (3.6–4.4) | 0.06 (0.04–0.09) | 64.5 (41.4–100.5) |
| Simple rules + subjective assessment | 91 (88–93) | 92 (90–93) | 10.7 (8.7–13.1) | 0.10 (0.07–0.13) | 109.6 (74.9–160.4) |
| Simple rules + LR1 | 91 (88–93) | 85 (82–87) | 6.0 (5.1–6.9) | 0.10 (0.08–0.14) | 57.7 (40.4–82.5) |
| Simple rules + LR2 | 89 (86–91) | 85 (83–88) | 6.1 (5.3–7.1) | 0.13 (0.10–0.17) | 47.7 (34.2–66.5) |
| Simple rules + RMI | 79 (75–83) | 93 (91–94) | 11.0 (8.8–13.8) | 0.22 (0.19–0.27) | 49.2 (35.6–67.9) |
| †Simple rules + inconclusive = ‘malignant’ | 95 (92–96) | 76 (73–78) | 3.9 (3.5–4.3) | 0.07 (0.05–0.11) | 53.3 (35.3–80.7) |
| Subjective assessment | 90 (87–92) | 92 (90–93) | 10.6 (8.6–13.0) | 0.11 (0.08–0.14) | 97.3 (67.3–140.7) |
| LR1 | 93 (90–95) | 81 (79–84) | 5.0 (4.4–5.7) | 0.09 (0.07–0.12) | 55.2 (37.9–80.2) |
| LR2 | 91 (88–93) | 81 (79–83) | 4.8 (4.2–5.5) | 0.11 (0.08–0.14) | 44.6 (31.4–63.3) |
| RMI | 68 (63–72) | 93 (91–94) | 9.1 (7.3–11.5) | 0.35 (0.31–0.40) | 26.2 (19.4–35.4) |
| Total (n = 1938) | | | | | |
| Easy diagnoses + simple rules + subjective assessment | 92 (90–94) | 92 (91–94) | 11.9 (9.9–14.3) | 0.08 (0.06–0.11) | 142.0 (98.0–205.8) |
| Easy diagnoses + simple rules + LR1 | 92 (89–94) | 86 (84–88) | 6.6 (5.8–7.6) | 0.09 (0.07–0.12) | 71.9 (50.8–101.7) |
| Easy diagnoses + simple rules + LR2 | 90 (87–92) | 87 (85–88) | 6.8 (5.9–7.8) | 0.12 (0.09–0.15) | 56.8 (41.4–78.0) |
| *Easy diagnoses + simple rules + inconclusive = ‘malignant’ | 95 (93–97) | 78 (76–80) | 4.3 (3.9–4.8) | 0.06 (0.04–0.09) | 73.4 (48.2–111.7) |
| Simple rules + subjective assessment | 91 (88–93) | 93 (91–94) | 12.5 (10.3–15.1) | 0.10 (0.07–0.13) | 130.6 (91.2–186.8) |
| Simple rules + LR1 | 91 (88–93) | 86 (84–88) | 6.7 (5.8–7.6) | 0.11 (0.08–0.14) | 63.9 (45.9–88.9) |
| Simple rules + LR2 | 88 (85–91) | 87 (85–89) | 6.8 (5.9–7.9) | 0.13 (0.11–0.17) | 50.7 (37.4–68.8) |
| †Simple rules + inconclusive = ‘malignant’ | 95 (92–96) | 78 (75–80) | 4.2 (3.8–4.7) | 0.07 (0.05–0.10) | 61.2 (41.2–90.8) |
| Subjective assessment | 90 (88–93) | 93 (91–94) | 12.4 (10.2–14.9) | 0.10 (0.08–0.13) | 119.5 (84.3–169.6) |
| LR1 | 92 (90–94) | 84 (82–86) | 5.7 (5.0–6.4) | 0.09 (0.07–0.12) | 62.9 (44.4–89.2) |
| LR2 | 90 (88–93) | 83 (81–85) | 5.3 (4.7–5.9) | 0.12 (0.09–0.15) | 45.6 (33.2–62.7) |

*First-stage test, easy diagnoses (as defined by the six descriptors); second-stage test, simple rules; third-stage test, cases inconclusive by the simple rules are predicted as malignant.
†First-stage test, simple rules; second-stage test, cases inconclusive by the simple rules are predicted as malignant.
DOR, diagnostic odds ratio; LR+, positive likelihood ratio; LR−, negative likelihood ratio.

Discussion

We have shown in a large study that about 50% of adnexal masses are ‘easy’ to diagnose (instant) using information that should be obtainable by most clinicians (Step 1). Furthermore, a large number of residual tumors can be characterized using simple alternative ultrasound-based rules to predict the benign or malignant nature of a mass. The remaining masses can be evaluated by the subjective impression of an expert. We have demonstrated that the previously validated logistic regression models LR1 and LR2 can also classify these residual, more difficult ‘non-instant’ masses with good sensitivity for cancer but relatively low specificity, while measurements of serum CA 125 levels and the RMI have very low detection rates.

When the simple ultrasound-based rules are applied as a second step, they yield the same sensitivity and specificity as real-time subjective assessment by an experienced ultrasound examiner (Step 2). In adnexal masses in which the rules yield an inconclusive result, subjective evaluation of ultrasound findings by an experienced ultrasound examiner (Step 3) is the most accurate diagnostic test, while RMI, LR1 and LR2 are not useful. If we apply the three-step strategy to all masses, we obtain a higher sensitivity than when subjective assessment by an expert is used.

A strength of our study is that it is based on over 3500 adnexal masses, including 765 invasive malignancies and 186 borderline tumors. Furthermore, the data were collected prospectively in 21 centers in nine countries and so any finding is expected to have general applicability. Both a strength and a limitation of our study is that the IOTA database only includes surgically removed tumors with a known histological outcome. As a result, we do not know how subjective assessment or models would perform if applied to tumors not deemed to require surgery. However, it is probable that most of these tumors would fall into the easy-to-diagnose ‘instant’ category, because one would expect them to consist mainly of small simple cysts, dermoid cysts and endometriomas. More complex tumors would probably be selected for intervention. Therefore, our results are likely to be applicable to a total population of adnexal masses. A weakness of the analysis is that, while the data used to develop the descriptors were collected prospectively, the performance of the descriptors for defining masses that are easy to diagnose has not been evaluated in a different population, nor has this approach been explicitly tested in a separate prospective observational study. This limitation is common to reports testing new diagnostic approaches.

Real-time subjective assessment by an experienced ultrasound examiner has been shown to be the most accurate method for discriminating between benign and malignant adnexal pathology[22]. It is also an accurate method of identifying the specific pathology of a tumor[23]. In practice, all ultrasound examiners go through a mental process when carrying out a scan. For some masses they will immediately identify a specific appearance and recognize the nature of a lesion, without the need to measure specific structures in the tumor. Moreover, in women older than 50 years, an adnexal mass associated with a serum CA 125 level of over 100 U/mL is very often a carcinoma (Table 1). The residual group where no descriptors can be applied requires more detailed evaluation: if we apply the simple rules as a triage test in these difficult masses, they perform as well as subjective assessment by an experienced ultrasound examiner. This combined approach may change clinical practice by providing an accurate instant classification of most adnexal masses while reducing the number of patients that need to be referred for expert scanning[7].

To date, attempts to develop rules or models to characterize adnexal tumors have almost always been based on the entire spectrum of adnexal pathology. We believe that this has led to a false impression of the performance characteristics of some tests such as the RMI when utilized in clinical practice[9], [24], [25]. Our results suggest that the current approach used to characterize adnexal masses may not be optimal. We have shown how readily obtained descriptors that define an ‘easy’ diagnosis and simple rules can be used to select patients who are extremely likely to have either a benign adnexal lesion or a malignancy. The ultrasound-based descriptors and simple rules should be recognizable by any trained ultrasound examiner, and appropriate management can be planned on the basis of these according to the protocols used in individual units. We believe that this approach reflects clinical reality whereby sonographers will easily classify many masses, but need any secondary test to be optimized to perform well in the residual, more difficult masses.

For women who present with masses for which the simple rules are inconclusive, the best option is to refer them to an expert in gynecological ultrasonography for assessment.

It is probable that the descriptors will be very useful for teaching purposes, particularly for clinicians with limited experience of scanning ovarian tumors. More experienced sonologists apply these descriptors subconsciously and instantly recognize certain types of adnexal tumor. In this paper we have shown that a diagnostic strategy using simple descriptors and rules can characterize the majority of ovarian tumors with a high degree of accuracy. The use of mathematical models may be reserved for tumors where these simpler more intuitive rules cannot be applied. Ultrasound examiners have the option to use a strategy based on simple rules and descriptors, followed by either the application of mathematical models or referral for the subjective opinion of an expert if one is available.

Acknowledgments

We thank all the participating centers for recruiting patients. We also thank Astraia (Munich, Germany) for the electronic data collection. Research supported by Research Council KUL: GOAAMBioRICS, CoE EF/05/006 Optimization in Engineering (OPTEC); FWO: G.0407.02 (support vector machines), G.0302.07 (SVM), G.0341.07 (Data fusion), research communities (ICCoS, ANMMM); IWT-TBM 070706 (IOTA); Belgian Federal Science Policy Office: IUAP P6/04 (DYSCO); EU: BIOPATTERN (FP6-2002-IST 508803); the Swedish Medical Research Council (grants nos. K2001-72X 11605-06A, K2002-72X-11605-07B, K2004-73X-11605-09A and K2006-73X-11605-11-3); funds administered by Malmö University Hospital; and two Swedish governmental grants (ALF-medel and Landstingsfinansierad Regional Forskning). T.B. is supported by the Imperial Healthcare NHS Trust NIHR Biomedical Research Centre.

Appendix

Recruitment centers

University Hospitals Leuven (Belgium), Malmö University Hospital, Lund University (Sweden), Ospedale S. Gerardo, Università Cattolica del Sacro Cuore Rome (Italy), Università di Milano Bicocca, Monza (Italy), Ziekenhuis Oost-Limburg (ZOL), Genk, (Belgium), Medical University in Lublin (Poland), University of Cagliari, Ospedale San Giovanni di Dio, Cagliari (Italy), IEO, Milano (Italy), University of Bologna (Italy), King's College Hospital London (UK), Universita degli Studi di Napoli, Napoli (Italy), DCS Sacco University of Milan (Italy), General Faculty Hospital of Charles University, Prague (Czech Republic), Hôpital Boucicaut Paris (France), Chinese PLA General Hospital, Beijing (P.R. of China), Centre Medical des Pyramides (France), Lund University Hospital, Lund (Sweden), Macedonio Melloni Hospital, University of Milan (Italy), Università degli Studi di Udine (Italy), McMaster University, St. Joseph's Hospital, Hamilton, Ontario (Canada), Instituto Nationale dei Tumori, Fondazione Pascale, Napoli (Italy)

IOTA Steering Committee

Dirk Timmerman, Leuven, Belgium; Tom Bourne, London, UK; Lil Valentin, Malmö, Sweden; Antonia C. Testa, Rome, Italy; Sabine Van Huffel, Leuven, Belgium; Ignace Vergote, Leuven, Belgium

IOTA principal investigators (alphabetical order)

Jean-Pierre Bernard, Maurepas, France; Artur Czekierdowski, Lublin, Poland; Elisabeth Epstein, Lund, Sweden; Enrico Ferrazzi, Milan, Italy; Daniela Fischerová, Prague, Czech Republic; Dorella Franchi, Milano, Italy; Robert Fruscio, Monza, Italy; Stefano Greggi, Napoli, Italy; Stefano Guerriero, Cagliari, Italy; Davor Jurkovic, London, UK; Fabrice Lécuru, Paris, France; Francesco P.G. Leone, Milano, Italy; Andrea A. Lissoni, Monza, Italy; Ulrike Metzger, Paris, France; Henry Muggah, Hamilton, Ontario, Canada; Dario Paladini, Napoli, Italy; Alberto Rossi, Udine, Italy; Luca Savelli, Bologna, Italy; Antonia Carla Testa, Roma, Italy; Dirk Timmerman, Leuven, Belgium; Diego Trio, Milano, Italy; Lil Valentin, Malmö, Sweden; Caroline Van Holsbeke, Genk, Belgium; Gerardo Zanetta [deceased], Monza, Italy; Jing Zhang, Beijing, P.R. of China
