Objective To test the accuracy of the risk of malignancy index, the revised risk of malignancy index and Tailor's regression model to diagnose malignancy in women with known adnexal masses.
Design Prospective collaborative study.
Setting Gynaecology Assessment Unit, Department of Obstetrics and Gynaecology, King's College Hospital, London.
Sample Sixty-one women with known adnexal masses were examined pre-operatively. Women were recruited from three South London hospitals.
Methods The demographic, biochemical and sonographic data recorded for each patient included: age; menopausal status; CA125 levels; tumour volume; ultrasound characteristics; and Doppler blood flow analysis (peak and mean blood velocities, the pulsatility and resistance indices). The diagnosis of malignancy was made for each woman using all three models and the results compared with the final histopathological diagnosis.
Results Thirty-eight women had benign tumours and 23 had ovarian cancer. Women with malignant tumours were significantly older than those with benign masses. There were also significant differences in CA125 levels, locularity, presence of papillary proliferations and ascites between the two groups. Tailor's regression model achieved a 43% sensitivity and 92% specificity in the diagnosis of malignancy. This compared with a 74% sensitivity and 92% specificity with the risk of malignancy index, and a 74% sensitivity and 89% specificity with the revised risk of malignancy index.
Conclusion When applied prospectively all three diagnostic models performed less accurately than originally reported, despite clinical signs of malignancy being present in many cases. It is likely that their accuracy would be even less in a population of women in whom there was a substantial clinical uncertainty. Intra-tumoral blood velocity and CA125 levels were the best individual parameters for discrimination between benign and malignant tumours.
Accurate pre-operative discrimination between benign and malignant adnexal masses would help to optimise surgical management of women with pelvic tumours. Thus, women with malignancies could be referred to tertiary oncology centres, while those with benign tumours could be offered more conservative surgical management1. In order to achieve this aim various methods have been investigated to assess adnexal masses. Over the past years these have included tumour markers, grey-scale and Doppler ultrasound with varying degrees of success.
Grey-scale morphological ultrasound features suggestive of malignancy include the presence of papillary proliferations, septae and solid areas within the cyst2,3. None of these features used in isolation enables an accurate diagnosis of malignancy, and therefore they have been combined into various morphological scoring systems4,5. However, these scoring systems failed to achieve the level of sensitivity and specificity to enable their implementation into routine clinical practice5. With the advent of transvaginal colour Doppler imaging vascular changes within the ovary can be studied, in addition to the morphology. The results with this technique are wide ranging6,7, with most authors showing little improvement compared with grey-scale morphology8–11.
Serum tumour markers have also been investigated for their potential role in distinguishing benign from malignant masses. By far the most widely used has been serum CA125, which is raised in up to 80% of epithelial ovarian cancers12–14. However, serum CA125 levels alone are relatively non-specific and have therefore always required interpretation in conjunction with clinical and ultrasound findings.
Until recently there were no diagnostic models available which combined demographic, sonographic and biochemical data in the assessment of women with adnexal masses. In the last few years, three such diagnostic models have been designed based on retrospective data analysis. These models are the risk of malignancy index (RMI 1)15, the revised risk of malignancy index (RMI 2)16 and Tailor's regression model17. In this study we prospectively tested the accuracy of all three models for the pre-operative assessment of women with known adnexal masses.
This prospective collaborative study was conducted between July 1997 and September 1998. Women were recruited consecutively from King's College Hospital, Greenwich District Hospital and University Hospital Lewisham. All scans were performed at King's College Hospital. Women with known adnexal masses, due to be admitted for surgery, were examined pre-operatively using transvaginal grey-scale and Doppler ultrasound. Age and menopausal status were recorded in all women. Postmenopausal status was defined as more than one year of amenorrhoea or age > 50 years in women who had had a hysterectomy. Transvaginal ultrasound examination was performed using a 5 MHz transducer with B-mode and Doppler facilities (Aloka SSD-2000, Aloka Co, Tokyo, Japan). All the scans were performed by one of two operators (N.A. and D.J.). The following grey-scale morphological information was recorded for each woman: site and volume of tumour; locularity; echogenicity; presence of papillary projections; intra-abdominal metastases; and ascites. The volume of tumour was calculated from three diameters taken in perpendicular planes using the formula for a prolate ellipsoid (D1 × D2 × D3 × π/6).
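As a minimal sketch of the volume formula quoted above, the calculation can be written as follows. The function name and the example diameters are illustrative assumptions, not values from the study.

```python
import math


def ellipsoid_volume(d1_cm: float, d2_cm: float, d3_cm: float) -> float:
    """Return the volume in cm^3 from three perpendicular diameters in cm.

    Implements D1 x D2 x D3 x pi/6, the ellipsoid formula quoted in the text.
    """
    return d1_cm * d2_cm * d3_cm * math.pi / 6.0


# Hypothetical diameters, chosen only to illustrate the call.
example_volume = ellipsoid_volume(8.0, 7.0, 6.0)
print(f"Estimated tumour volume: {example_volume:.0f} cm^3")
```

For three equal diameters the expression reduces to the volume of a sphere, which is a quick sanity check on the π/6 factor.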
Following this, the entire tumour was surveyed by colour Doppler imaging. The ultrasound equipment was initially set at maximum sensitivity to detect blood flow. By gradually increasing the pulse repetition frequency low velocity signals were filtered out. A pulsed Doppler gate was placed over the areas within the tumour with the highest blood velocity. Adjustments were made to the angle of the probe until the audible signal gave the highest pitch. Flow velocity waveforms were obtained and the peak systolic velocity, time-averaged mean velocity, pulsatility index and resistance index were calculated electronically. A blood sample was then taken for the measurement of CA125 levels with the Immuno-1 analyser (Bayer, USA).
The models tested were the risk of malignancy index (RMI 1), first described by Jacobs et al.15 in 1990 and revised in 1996 by Tingulstad et al.16 (RMI 2), and Tailor's regression model, described by Tailor et al.17 in 1997. RMI 1 was derived by step-wise logistic regression analysis, which identified menopausal status, ultrasound score and CA125 levels as the most significant variables in predicting the likelihood of malignancy. The RMI 1 score was calculated using the formula: ultrasound score × menopausal status × serum CA125 (kU/L). An ultrasound score (U) was assigned to each adnexal mass according to its grey-scale morphological appearance. One point was given for the presence of each of the following: bilateral lesions, multilocular lesions, ascites, solid areas and intra-abdominal metastases. A total of 0 points was given a U value of 0; a total of 1 point a U value of 1; and a total of ≥ 2 points a U value of 3. Premenopausal women were given a score of one and postmenopausal women a score of three. An index value > 200 was taken as predictive of malignancy; this cut off was proposed in the original paper to achieve relatively high levels of sensitivity and specificity. The revised risk of malignancy index (RMI 2)16 was designed using the same formula. However, the ultrasound score was given a value of 1 for a total of 0–1 points and a value of 4 for a total of ≥ 2 points. Similarly, a score of one was given to premenopausal women and a score of four to postmenopausal women.
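The two scoring rules described above can be sketched in a few lines. This is an illustration of the published formula as summarised in this paper, not the software used in the study; the function and parameter names are assumptions. The worked case uses the benign thecoma reported in the discussion (postmenopausal woman, CA125 of 14 kU/L, multilocular cyst with solid areas, so at least two morphology points).

```python
CUT_OFF = 200  # index values above this were taken as predictive of malignancy


def morphology_points(bilateral: bool, multilocular: bool, ascites: bool,
                      solid_areas: bool, metastases: bool) -> int:
    """One point for each grey-scale feature that is present."""
    return sum([bilateral, multilocular, ascites, solid_areas, metastases])


def rmi1(points: int, postmenopausal: bool, ca125_ku_l: float) -> float:
    """Original index: U is 0, 1 or 3; menopausal score is 1 or 3."""
    u = 0 if points == 0 else (1 if points == 1 else 3)
    m = 3 if postmenopausal else 1
    return u * m * ca125_ku_l


def rmi2(points: int, postmenopausal: bool, ca125_ku_l: float) -> float:
    """Revised index: U is 1 or 4; menopausal score is 1 or 4."""
    u = 1 if points <= 1 else 4
    m = 4 if postmenopausal else 1
    return u * m * ca125_ku_l


# Benign thecoma from the discussion; other features assumed absent
# (U is already at its maximum once two features are present).
points = morphology_points(False, True, False, True, False)
print(rmi1(points, True, 14), rmi1(points, True, 14) > CUT_OFF)
print(rmi2(points, True, 14), rmi2(points, True, 14) > CUT_OFF)
```

With these inputs the original index stays below the cut off while the revised index exceeds it, which is consistent with the additional false positive reported for the revised index.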
Finally, Tailor's regression model17 calculates the probability of malignancy of an adnexal mass using the formula $P = 1/(1 + e^{-z})$, where $e$ is the base of natural logarithms and $z = (0.1273 \times \text{age}) + (0.2794 \times \text{time-averaged mean velocity}) + (4.4136 \times \text{papillary projection score}) - 14.2046$. The papillary projection score was given the value of one or zero depending on the presence or absence of papillary projections. A probability of > 50% was taken to be diagnostic of malignancy. This cut off was proposed to maintain a high specificity as well as sensitivity.
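A minimal sketch of the regression formula above, using the coefficients quoted in the text, may help show how strongly the papillary projection term drives the result. The function name, parameter names and example patient are hypothetical; the default velocity of 1.0 cm/s for undetected flow follows the convention described in the statistical methods.

```python
import math
from typing import Optional


def tailor_probability(age_years: float,
                       tamxv_cm_s: Optional[float],
                       papillary_projections: bool) -> float:
    """Probability of malignancy, P = 1 / (1 + exp(-z))."""
    if tamxv_cm_s is None:
        # No intra-tumoral flow visualised: velocity set to 1.0 cm/s.
        tamxv_cm_s = 1.0
    z = (0.1273 * age_years
         + 0.2794 * tamxv_cm_s
         + 4.4136 * (1 if papillary_projections else 0)
         - 14.2046)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical 60-year-old woman with a time-averaged mean velocity of 20 cm/s.
p_with = tailor_probability(60, 20.0, True)
p_without = tailor_probability(60, 20.0, False)
print(f"with papillary projections:    {p_with:.2f}")
print(f"without papillary projections: {p_without:.2f}")
```

For this illustrative patient the probability falls on opposite sides of the 50% cut off depending only on whether papillary projections are present.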
All women underwent staging laparotomy by the attending gynaecological surgeon and according to the system defined by the International Federation of Gynecology and Obstetrics18. All specimens were examined histologically by taking block sections and classified according to the World Health Organisation guidelines19. The definitive diagnosis was made at the time of histopathological examination. Both the histopathologist and the surgeon were blinded to the results of the models.
The ability of the three diagnostic models to detect malignancy was tested prospectively and compared with the final histopathological diagnosis. Borderline tumours were counted as malignant for this study. Statistical analysis was performed using SPSS for Windows (Version 6.0) (SPSS Inc, Chicago, Illinois, USA). The means of unpaired groups of data were compared using the Mann-Whitney U test or Student's t test. The proportions of benign and malignant cases with various morphological characteristics were compared using the Yates corrected χ² test. Sensitivity, specificity and diagnostic accuracy were calculated for each of the models20. Receiver operating characteristic (ROC) curves were generated using GraphROC for Windows. When intra-tumoral blood flow could not be visualised, the time-averaged mean velocity was assigned the value 1.0 cm/s as in Tailor's original study, to enable calculation of the probability of malignancy using the regression model17.
Sixty-one women with adnexal masses were examined pre-operatively. Thirty-eight (62%) women had benign tumours, four (7%) had borderline tumours, and 19 (31%) had malignant tumours. The high proportion of women with advanced malignancy in this study is a reflection of the workload of the gynaecology oncology unit. Women with malignant tumours were significantly older and also more likely to be postmenopausal than those women with benign tumours (Table 1). Only one woman had bilateral tumours with different histopathological diagnosis, both of which were benign. None of the women had malignant tumours on one side and benign on the other.
Table 1. Demographic, grey-scale ultrasound, Doppler and biochemical findings in women with benign and malignant adnexal masses (n= 61). Values are given as mean (range), unless otherwise indicated. PI = pulsatility index; RI = resistance index; PSV = peak systolic velocity; TAMXV = time-averaged mean velocity.
|Variables||Benign mass (n= 38)||Malignant mass (n= 23)||P|
|Age (years)||43 (20–70)||53 (25–77)||0.005|
|Tumour volume (cm³)||211 (4–2137)||595 (13–4559)||0.0870|
|Papillary projections (%)||3||39||<0.0001|
|PI||1.04 (0.50–2.50)||0.82 (0.25–1.42)||0.2710|
|RI||0.61 (0.41–1.00)||0.54 (0.22–0.76)||0.3939|
|PSV (cm/sec)||23.78 (5–55)||30.65 (14–57)||0.1291|
|TAMXV (cm/sec)||15.80 (2–44)||20.50 (10–39)||0.0771|
|CA125 levels (kU/L)||68.7 (4–1067)||750 (6–6200)||<0.0001|
The majority of women with benign masses had dermoid cysts, functional cysts, cystadenomas or endometriomas (Table 2). Of the 23 women with malignant adnexal masses, 14 had invasive epithelial ovarian cancer, five had non-epithelial cancers, and four had borderline tumours. There were no cases of metastatic tumours. The majority of women with invasive epithelial ovarian cancer were classified as FIGO stage III at the time of operation (Table 3).
Table 2. Histological classification of benign tumours (n= 38). Values are given as n (%).
Table 3. Staging and classification of malignant tumours (n= 23). Values are given as n (%).
| ||FIGO stage I||FIGO stage II||FIGO stage III||FIGO stage IV||Total|
|Epithelial||1 (7)||2 (14)||9 (65)||2 (14)||14|
|Nonepithelial||—||1 (20)||4 (80)||—||5|
When comparing grey-scale morphological characteristics of benign and malignant adnexal tumours, the latter tended to be larger, with a significantly greater proportion of multilocular cysts, papillary projections and ascites (Table 1). Doppler blood flow analysis revealed higher velocity (peak systolic velocity and time-averaged mean velocity) and lower impedance (pulsatility index and resistance index) blood flow in malignant as compared with benign tumours.
All 61 cases were analysed using the three diagnostic models. When considering all tumour types, Tailor's regression model using a cut off value of 50% for the diagnosis of malignancy achieved a 43% sensitivity and 92% specificity. In comparison, both the risk of malignancy index (RMI 1) and the revised risk of malignancy index (RMI 2) achieved 74% sensitivity. The specificities were 92% and 89%, respectively (Table 4 and Fig. 1).
Table 4. Comparisons of the performance of diagnostic tests in the original reports and in this prospective study. Values are given as n (%) [95% CI], unless otherwise indicated. RMI 1 = risk of malignancy index; RMI 2 = revised risk of malignancy index.
|Test||Cut off value||Sensitivity||Specificity||Diagnostic accuracy|
|Original results|| || || || |
| Tailor's model||50%||13/15 (87) [69–100]||51/52 (98) [94–100]||64/67 (96) [91–100]|
| RMI 1||200||36/42 (85) [75–96]||98/101 (97) [94–100]||134/143 (94) [90–98]|
| RMI 2||200||45/56 (80) [70–91]||108/117 (92) [87–97]||153/173 (88) [84–93]|
|Prospective study|| || || || |
|Tailor's model||50%||10/23 (43) [23–64]||35/38 (92) [84–100]||45/61 (74) [64–83]|
|RMI 1||200||17/23 (74) [56–92]||35/38 (92) [84–100]||52/61 (85) [76–94]|
|RMI 2||200||17/23 (74) [56–92]||34/38 (89) [80–99]||51/61 (84) [74–92]|
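The prospective percentages in Table 4 follow directly from the counts shown. The sketch below recomputes them from the numerators and denominators in the table. The interval function uses a normal-approximation (Wald) interval, which appears to reproduce most of the tabulated intervals; that choice is an inference, since the interval method is not stated here. All names are illustrative.

```python
import math


def wald_interval(successes: int, total: int, z: float = 1.96):
    """Normal-approximation 95% interval for a proportion, clipped to [0, 1]."""
    p = successes / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def diagnostic_summary(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Sensitivity, specificity and overall accuracy from a 2 x 2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Counts read from the prospective rows of Table 4 (23 malignant, 38 benign).
PROSPECTIVE_COUNTS = {
    "Tailor's model": dict(tp=10, fn=13, tn=35, fp=3),
    "RMI 1": dict(tp=17, fn=6, tn=35, fp=3),
    "RMI 2": dict(tp=17, fn=6, tn=34, fp=4),
}

for name, counts in PROSPECTIVE_COUNTS.items():
    summary = diagnostic_summary(**counts)
    print(name, {key: f"{100 * value:.0f}%" for key, value in summary.items()})
```

A Wald interval is easy to compute but behaves poorly near 0% or 100%, which is why the sketch clips it to the valid range.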
The ability of the models to detect the different histological types of ovarian malignancies was also tested. All three models performed best in diagnosing invasive epithelial cancer. The sensitivity of Tailor's regression model was 71% and the sensitivity of both RMI 1 and RMI 2 was 93%. The specificities remained the same as previously quoted. Tailor's regression model failed to detect any borderline or non-epithelial tumours. RMI 1 and RMI 2 achieved 25% sensitivity in the detection of borderline tumours and 60% sensitivity in non-epithelial cancers.
All three models designed to diagnose ovarian cancer achieved lower sensitivities and specificities in this prospective study as compared with the original reports. This was despite the high risk nature of the population, some of whom had clinically apparent malignancies. These results would suggest that the diagnostic performance of the models may be even less accurate if applied to a population of women in whom substantial clinical uncertainty exists. The performance of the models prospectively is not an unexpected result as all the models were designed using retrospective data analysis, thereby providing the best fit models for the examined datasets. The original logistic regression model was primarily designed to allow a multi-parameter approach to discriminating between benign and malignant ovarian tumours. It also provided a useful output which was directly related to the probability of malignancy. However, Tailor's regression model was found to be the least reliable model when tested prospectively. This was despite it being designed in our own unit where one would have expected consistency with the examination technique. The difference in performance may therefore have been due to lack of sample homogeneity in the two studies.
The original dataset on which Tailor's regression model was designed included 15 malignant cases, 12 (80%) of which were invasive epithelial ovarian cancers and three (20%) borderline tumours. In contrast our new dataset included only 14 (61%) invasive epithelial cancers, five (22%) non-epithelial and four (17%) borderline tumours. This may partly account for the poor performance of Tailor's regression model in cases of non-epithelial cancers, as this tumour type was absent in the original dataset. However, the failure to detect borderline tumours is more difficult to explain. Two cancers in this group were missed because they occurred in very young women with no increase in blood flow to the mass, although papillary projections were present. The remaining two cases occurred in postmenopausal women where the masses had neither morphological nor blood flow evidence of malignancy. The failure to detect four out of 14 (29%) invasive epithelial cancers is of particular concern. In these cases the false negative findings were due to the absence of papillary projections or no increase in blood flow. Scrutiny of Tailor's regression model reveals that the presence of papillary projections is given a high weight. This may be a major reason why the model has performed poorly. Perhaps future modelling should be based on a larger sample size which incorporates all the different histological types of ovarian tumours.
The risk of malignancy indices depend largely on the CA125 levels to diagnose malignancy, and it is not surprising that both RMI 1 and RMI 2 achieved identical sensitivities when tested prospectively. Similar to Tailor's regression model, the majority of women in the original study populations had invasive epithelial ovarian cancers. Both models failed to diagnose six (26%) of the 23 malignant tumours in our dataset. The six tumours comprised one epithelial cancer, two non-epithelial cancers and three cases of borderline tumours. In all of these cases the CA125 levels were < 70 kU/L and in five out of six (83%) cases the levels were < 20 kU/L. Both RMI 1 and RMI 2 correctly identified nine out of the ten malignant tumours diagnosed by Tailor's regression model. The case missed was that of an invasive epithelial ovarian cancer that occurred in a 45 year old woman whose CA125 level was 17 kU/L. Histology revealed a Stage Ic epithelial ovarian cancer. On grey-scale morphology a unilateral unilocular cyst was seen containing solid areas.
The false positive rates were 23% with Tailor's regression model, 15% with RMI 1 and 20% with RMI 2. False positive findings with Tailor's regression model occurred in cases of an endometrioma, a thecoma and a tubo-ovarian abscess. In all these cases the time-averaged mean velocity was high (i.e. 55, 25 and 38 cm/sec, respectively). There were three false positive diagnoses of ovarian cancer with RMI 1, whereas there were four false positive diagnoses with RMI 2. Two of these were cases of endometriotic cysts in young women with high CA125 levels. The third case was a postmenopausal woman with a benign cystadenoma and raised CA125 levels. RMI 2 in addition misdiagnosed a benign thecoma in a postmenopausal woman with CA125 levels of 14 kU/L. The cyst was multilocular containing solid areas. Hence, high CA125 levels were responsible for all the false positive diagnoses occurring with RMI 1 and three out of four cases with RMI 2. Alternative cut off values for the diagnostic models were not explored as the purpose of this study was not to improve the performance of the models but to test the models prospectively.
All three diagnostic models performed significantly better compared with individual demographic, ultrasound or biochemical parameters (Table 5). When individual parameters were analysed, the CA125 levels were the most accurate in predicting malignancy. In fact, using a CA125 cut off value of > 35 kU/L a 78% sensitivity and 79% specificity are achieved. However, when the cut off value is increased to 70 kU/L the sensitivity decreases to 61% and the specificity increases to 92%. Incorporating CA125 levels into Tailor's regression model in the future may improve its performance. The CA125 levels were not considered in the original report. When considering the Doppler variables, if a resistance index value of < 0.4 was taken to be diagnostic of malignancy, as suggested in previous reports, a 9% sensitivity and 100% specificity would have been achieved21.
Table 5. Receiver operating characteristics of the diagnostic models as compared with single parameters. Values are given as n [95% CI] or % (95% CI), unless otherwise indicated. RMI 1 = risk of malignancy index; RMI 2 = revised risk of malignancy index; PSV = peak systolic velocity; TAMXV = time-averaged mean velocity; PI = pulsatility index; RI = resistance index.
|Model||Sensitivity||Specificity||Area under curve|
|Tailor's regression model||43 (23–64)||92 (84–100)||0.8604 [0.77–0.95]|
|RMI 1||74 (56–92)||92 (84–100)||0.8976 [0.82–0.97]|
|RMI 2||74 (56–92)||89 (80–99)||0.8770 [0.79–0.96]|
|CA125||65 (43–84)||87 (72–96)||0.8398 [0.75–0.92]|
|PSV||74 (51–90)||79 (63–91)||0.7895 [0.69–0.89]|
|TAMXV||78 (56–93)||79 (63–91)||0.7986 [0.70–0.90]|
|PI||61 (38–80)||76 (60–89)||0.7752 [0.67–0.88]|
|RI||65 (43–84)||74 (57–87)||0.7683 [0.66–0.87]|
|Age||61 (38–80)||82 (66–92)||0.7225 [0.61–0.83]|
|Volume||57 (34–77)||68 (51–83)||0.6322 [0.51–0.75]|
This study has shown that various histological types of ovarian cancer exhibit different demographic, ultrasound and biochemical characteristics. Women with invasive epithelial ovarian tumours tend to be older and more often menopausal than those with non-epithelial or borderline tumours. On ultrasound examination, epithelial and borderline tumours often display characteristic papillary projections, whereas non-epithelial tumours tend to be more solid masses. Epithelial tumours were more often bilateral as compared with non-epithelial and borderline tumours (P < 0.05). The Doppler blood flow characteristics of the tumours also vary. Epithelial tumours tend to have greater blood flow characterised by higher velocities (P < 0.05). Borderline tumours were associated with lower blood flow. CA125 levels are also significantly higher in epithelial tumours (mean CA125 = 1201 kU/L), compared with non-epithelial and borderline cases (mean CA125 = 53 and 44 kU/L, respectively). All three models were designed primarily based on epithelial ovarian tumours, and therefore, not surprisingly, in our study these models were better at detecting cases of epithelial tumours as opposed to non-epithelial and borderline tumours. In addition, since some of the women included in this study had advanced ovarian cancer, it is likely that the models performed better than they would if applied to a low risk general population.
Given the differences in tumour types it may be very difficult to design a single diagnostic model which would reliably detect all tumours. It may be possible to improve further the detection of invasive epithelial cancer by combining ultrasound Doppler findings with tumour morphology and CA125 levels. Different models would be required for the detection of borderline and non-epithelial tumours which would complicate their implementation into routine clinical practice. Alternatively, an artificial neural network model may prove to be of diagnostic value. Preliminary data have shown that the neural network provides better diagnostic accuracy than that of Tailor's regression model22,24. However, larger numbers are required to construct a neural network model and it is unlikely that such a model will be available in the near future.