To determine if tumor vascularity as assessed by three-dimensional (3D) power Doppler ultrasound can be used to discriminate between benign and malignant ovarian tumors, if adding 3D power Doppler ultrasound to gray-scale imaging improves differentiation between benignity and malignancy, and if 3D power Doppler ultrasound adds more to gray-scale ultrasound than does two-dimensional (2D) power Doppler ultrasound.
One hundred and six women scheduled for surgery because of an ovarian mass were examined with transvaginal gray-scale ultrasound and 2D and 3D power Doppler ultrasound. The color content of the tumor scan was rated subjectively by the ultrasound examiner on a visual analog scale. Vascularization index (VI), flow index (FI) and vascularization flow index (VFI) were calculated in the whole tumor and in a 5-cm3 sample taken from the most vascularized area of the tumor. Logistic regression analysis was used to build models to predict malignancy.
There were 79 benign tumors, six borderline tumors and 21 invasive malignancies. A logistic regression model including only gray-scale ultrasound variables (the size of the largest solid component, wall irregularity, and lesion size) was built to predict malignancy. It had an area under the receiver–operating characteristics (ROC) curve of 0.98, sensitivity of 100%, false positive rate of 10%, and positive likelihood ratio (LR) of 10 when using the mathematically best cut-off value for risk of malignancy (0.12). The diagnostic performance of the 3D flow index with the best diagnostic performance, i.e. VI in a 5-cm3 sample, was superior to that of the color content of the tumor scan (area under ROC curve 0.92 vs. 0.80, sensitivity 93% vs. 78%, false positive rate 16% vs. 27% using the mathematically best cut-off value). Adding the color content of the tumor scan or FI in a 5-cm3 sample to the logistic regression model including the three gray-scale variables described above improved diagnostic performance only marginally, an additional two tumors being correctly classified.
Subjective evaluation of the gray-scale ultrasound morphology of adnexal tumors is a reliable tool for discriminating between benign and malignant tumors1, and it is reproducible2. Assessment of tumor vascularization using conventional color, power or spectral Doppler ultrasonography does not seem to add much to gray-scale imaging3, 4, despite neoangiogenesis being known to play an important role in the growth, metastasis and resistance to treatment of tumors5–8. The introduction of three-dimensional (3D) power Doppler ultrasound has opened up the possibility of objectively assessing vascularization in a whole organ or tumor.
The aim of this study was to determine if tumor vascularity, as assessed by 3D power Doppler ultrasound, can be used to discriminate between benign and malignant ovarian tumors; if adding 3D power Doppler ultrasound examination to gray-scale ultrasound imaging improves differentiation between benign and malignant ovarian tumors; and if 3D power Doppler adds more to gray-scale ultrasound imaging than does two-dimensional (2D) power Doppler.
Patients and Methods
A consecutive series of 131 women scheduled for surgery because of a pelvic mass clinically judged to be of adnexal origin were examined with transvaginal ultrasonography by one of the authors (L. V.). Exclusion criteria were: an unequivocal ultrasound diagnosis of tubal disorder (e.g. hydrosalpinx), peritoneal pseudocyst, paraovarian cyst or dermoid cyst, Doppler artifacts, or surgery > 90 days after the ultrasound examination.
Before the ultrasound examination a history, including the number of first-degree relatives with ovarian cancer or breast cancer and use of hormone replacement therapy or contraceptive pills, was taken from each patient following a standardized research protocol. A woman was considered to be postmenopausal if she reported a period of at least 12 months of amenorrhea after the age of 40 years, provided that medication or disease did not explain the amenorrhea. Women 50 years or older who had undergone hysterectomy were also defined as postmenopausal.
The ultrasound examinations were carried out with the women in the lithotomy position with an empty bladder, using a Voluson 730 Expert ultrasound system (GE Healthcare, Zipf, Austria) with a 2.8–10-MHz transvaginal transducer. Identical fixed pre-installed power Doppler ultrasound settings were used: frequency 6–9 MHz (‘normal’), pulse repetition frequency 0.6 kHz, gain −4.0, wall motion filter ‘low 1’ (40 Hz). A standardized examination technique and standardized definitions of gray-scale ultrasound terms were used as previously described9. A papillary projection was defined as any solid protrusion into a cyst cavity from the cyst wall with a height of ≥ 3 mm9. The size of the whole adnexal lesion and the size of the largest solid component of the lesion were calculated as the mean of three orthogonal diameters measured using calipers on the frozen 2D ultrasound image. After completion of the gray-scale ultrasound examination, the ultrasound system was switched into the power Doppler mode. On the basis of the 2D power Doppler ultrasound findings the ultrasound examiner rated the color content of the whole tumor scan on a visual analog scale (VAS) ranging from 0 to 100 arbitrary units and also classified the tumor as having no detectable vascularization (color score 1), minimal vascularization (color score 2), moderate vascularization (color score 3), or abundant vascularization (color score 4). Then the system was switched to the 3D power Doppler mode. Attempts were made to include the whole tumor or, if this was not possible, as much as possible of the tumor in the 3D volume. The woman was asked to remain still during the acquisition of the 3D power Doppler volume. After acquisition the resultant multiplanar display was examined to ensure that a complete volume (or as large a volume as possible) of the ovarian tumor had been captured. Volumes of satisfactory quality were stored on a hard disk for future analysis.
After completion of the ultrasound examination the examiner classified the tumor as benign or malignant (borderline tumors being classified as malignant) on the basis of subjective evaluation of the gray-scale and power Doppler ultrasound findings1, 3. Even in cases of complete uncertainty the examiner had to classify the tumor as being benign or malignant. The examiner also scored the risk of malignancy, a score of 1 meaning that the tumor was judged to be certainly benign, a score of 2 that the tumor was probably benign, a score of 3 that it was impossible to classify the tumor as benign or malignant, a score of 4 that the tumor was probably malignant, and a score of 5 that the tumor was certainly malignant. Tumors given a score of 3 were classified as difficult tumors. In addition, the ultrasound examiner estimated the risk of malignancy on a VAS ranging from 0 to 100 arbitrary units.
Analysis of stored ultrasound volumes was done off-line by one of the authors (L. J.) on a personal computer using the virtual organ computer-aided analysis (VOCAL™) imaging program and the four-dimensional (4D)-view software, version 2.1 (GE Healthcare, Zipf, Austria). The acquired volumes yielded multiplanar views of the tumor in the mid-sagittal, transverse and coronal planes. All calculations were done on these multiplanar images. The longitudinal view was used as the reference image (Figure 1). The rotation steps were 30° resulting in the definition of six contours of the tumor. Contours of the tumor were drawn manually in all six sections using the computer mouse. Once all contours had been drawn, the volume of the tumor was calculated automatically. Using the histogram facility of the VOCAL software, three indices of vascularity were generated: vascularization index (VI), flow index (FI) and vascularization flow index (VFI) (Figure 1). VI is the ratio of color voxels to all voxels in the region of interest expressed as a percentage, and it reflects the density of vessels in the volume analyzed. FI is the sum of weighted color voxels divided by the number of all color voxels in the region of interest, and it reflects the number of blood corpuscles in the vessels of the volume. VFI is the sum of weighted color Doppler voxels divided by all voxels in the region of interest. It reflects both the density of vessels and the number of blood corpuscles flowing in the vessels of the volume10. Having calculated the volume and vascular indices of the whole tumor, a 5-cm3 spherical sample volume was selected using VOCAL from that part of the tumor that appeared to be most vascularized on the basis of subjective evaluation. The size of the sample was chosen arbitrarily. We could have used a smaller sample size than 5 cm3, but had we used a much larger sample size the smallest tumors in our study would not have filled the sample volume. 
Vascular indices were calculated in the 5-cm3 sample by the VOCAL software as described above (Figure 2).
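The definitions of VI, FI and VFI given above can be sketched in a few lines. This is an illustration of the ratios as described in the text, not the VOCAL implementation; it assumes each voxel in the region of interest carries a power Doppler intensity in which 0 means no color signal and a positive value is the weight of a color voxel. The function name and the toy voxel values are hypothetical.

```python
from typing import Dict, Sequence


def vascular_indices(voxels: Sequence[float]) -> Dict[str, float]:
    """Compute VI, FI and VFI for one region of interest.

    Assumption: a voxel value of 0 means "no color signal"; any positive
    value is the power Doppler weight of a color voxel.
    """
    total = len(voxels)
    if total == 0:
        raise ValueError("region of interest contains no voxels")

    color = [v for v in voxels if v > 0]
    weighted_sum = sum(color)

    # VI: color voxels / all voxels, expressed as a percentage.
    vi = 100.0 * len(color) / total
    # FI: sum of weighted color voxels / number of color voxels.
    fi = weighted_sum / len(color) if color else 0.0
    # VFI: sum of weighted color voxels / all voxels.
    vfi = weighted_sum / total
    return {"VI": vi, "FI": fi, "VFI": vfi}


# Toy region: 10 voxels, 4 of which carry color signal.
toy_region = [0, 0, 0, 0, 0, 0, 40, 60, 20, 80]
indices = vascular_indices(toy_region)
```

For the toy region, VI is 40%, FI is 50 and VFI is 20. With these definitions VFI equals VI × FI / 100, which is why VFI is said to reflect both vessel density and flow.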
The results of the ultrasound examinations and those of subjective estimation of the risk of malignancy were compared with those of histological examination of the respective surgical specimens. Staging of malignant tumors was done by the attending physician in accordance with the classification system recommended by the International Federation of Gynecology and Obstetrics11.
The reproducibility of the calculation of 3D flow indices was determined in 25 tumor volumes. The 25 tumors, comprising one borderline tumor, six invasive malignancies and 18 benign tumors, were selected from our datasheet by the last author (L. V.) to reflect the proportion of benign, borderline and malignant tumors in the whole study population. To determine intraobserver reproducibility of the calculation of flow indices one observer (L. J.) analyzed the same tumor volume twice, the second analysis being performed 48 h after the first one. To determine interobserver reproducibility a second observer (P. S.)—unaware of the results of the first observer—analyzed the same tumor volumes as the first observer. The results of the second observer were compared to those of the second analysis of the first observer.
For statistical analysis, primary invasive, borderline and metastatic invasive tumors were all classified as malignant. In the case of bilateral masses, data from the mass with the most complex gray-scale ultrasound morphology were used.
Statistical calculations were undertaken using the Statistical Package for the Social Sciences (SPSS Inc., Chicago, IL, USA, version 12.02). The statistical significance of a possible relationship between the outcome (i.e. malignancy) and clinical variables or ultrasound variables was determined using univariate logistic regression analysis with the likelihood ratio test. Multivariate logistic regression corrected for the size of the largest solid component (mean of three diameters) and wall irregularity was used to build a model including only gray-scale ultrasound variables to predict malignancy. To avoid overfitting, a maximum of three gray-scale ultrasound variables were allowed in the model, i.e. the size of the largest solid component, wall irregularity and one additional gray-scale ultrasound variable. Then we tested the effect of adding 2D power Doppler results (color score and color content of the tumor scan) and 3D power Doppler vascular indices to the best gray-scale model obtained. Only one Doppler variable was allowed to enter the model, i.e. a maximum of four variables were allowed in the model. The likelihood ratio test yielding a P≤0.05 was the criterion for including a variable in a model.
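The inclusion criterion described above (likelihood ratio test with P ≤ 0.05) can be illustrated with a short sketch. It assumes that the two nested models differ by exactly one variable, so that the test statistic is referred to a chi-square distribution with one degree of freedom. The function names and the log-likelihood values in the example are hypothetical and are not taken from the study.

```python
import math
from typing import Tuple


def likelihood_ratio_test(loglik_reduced: float, loglik_full: float) -> Tuple[float, float]:
    """Likelihood ratio test for ONE added variable (1 degree of freedom).

    Returns (test statistic, P-value).
    """
    statistic = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    # Chi-square survival function for 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


def include_variable(loglik_reduced: float, loglik_full: float, alpha: float = 0.05) -> bool:
    """Apply the inclusion rule: keep the added variable if P <= alpha."""
    _, p_value = likelihood_ratio_test(loglik_reduced, loglik_full)
    return p_value <= alpha


# Hypothetical log-likelihoods of a model without and with one extra variable.
stat, p = likelihood_ratio_test(loglik_reduced=-20.0, loglik_full=-18.0)
keep = include_variable(-20.0, -18.0)
```

In this hypothetical case the statistic is 4.0 and the P-value is about 0.046, so the variable would enter the model.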
The application of the regression equations to data from each woman gave the probability for that woman to have a malignant tumor, the probability ranging from 0 to 1. Receiver–operating characteristics (ROC) curves were drawn for single predicting variables as well as for regression equations to evaluate their diagnostic ability. The area under the ROC curve and the 95% confidence interval (CI) of this area were calculated. If the lower limit of the CI for the area under the ROC curve was > 0.5, the diagnostic test was considered to have discriminatory potential. The ROC curves were also used to determine the mathematically best cut-off value to predict malignancy for each diagnostic test (single variables as well as logistic regression models), the mathematically best cut-off value being defined as that corresponding to the point on the ROC curve situated furthest away from the reference line. The sensitivity, false positive rate (1 minus specificity), and positive and negative likelihood ratios (LR) of the mathematically best cut-off value were also calculated. We defined the best diagnostic test as the one with the largest area under the ROC curve, the highest positive LR and the lowest negative LR for the mathematically best cut-off value. Two-tailed P-values ≤ 0.05 were considered statistically significant.
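The choice of the mathematically best cut-off can be sketched as follows. The perpendicular distance of a ROC point from the diagonal reference line is proportional to sensitivity minus false positive rate, so the point furthest from the line is the one that maximizes that difference. The sketch assumes that a case is called positive when its value is greater than or equal to the cut-off; the function names and toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float, float]]:
    """Return (cut-off, sensitivity, false positive rate) for every observed value.

    labels: 1 = malignant, 0 = benign. A case is test-positive when its
    score is >= the cut-off (an assumption of this sketch).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
        points.append((cutoff, tp / n_pos, fp / n_neg))
    return points


def best_cutoff(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float, float]:
    """ROC point furthest from the reference line (max sensitivity - FPR)."""
    return max(roc_points(scores, labels), key=lambda p: p[1] - p[2])


def likelihood_ratios(sensitivity: float, fpr: float) -> Tuple[float, float]:
    """Positive LR = sens / FPR; negative LR = (1 - sens) / (1 - FPR)."""
    lr_pos = sensitivity / fpr if fpr > 0 else float("inf")
    lr_neg = (1 - sensitivity) / (1 - fpr) if fpr < 1 else float("inf")
    return lr_pos, lr_neg


# Toy data: eight tumors with a hypothetical test value and known outcome.
toy_scores = [5, 12, 20, 28, 35, 40, 55, 70]
toy_labels = [0, 0, 0, 1, 1, 0, 1, 1]
cutoff, sens, fpr = best_cutoff(toy_scores, toy_labels)
lr_pos, lr_neg = likelihood_ratios(sens, fpr)
```

If several cut-offs tie for the largest difference, this sketch returns the lowest one; the text does not say how ties were handled.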
Intraobserver reproducibility was expressed as the difference between two measurement results obtained by the same observer. The differences between measured values were plotted against the mean of the two measurements to assess the relationship between the difference and the magnitude of the measurements12. Limits of agreement (mean difference ± 2SD) were calculated as described by Bland and Altman12. Systematic bias between the first and second analysis was determined by calculating the 95% CI of the mean difference (mean difference ± 2 standard errors (SE) of the mean). If zero lay within this interval no bias was assumed to exist between the first and second measurement. Interobserver reproducibility was calculated using the same methods as described above for intraobserver reproducibility. Intra- and interobserver reproducibility were also expressed as intra- and interclass correlation coefficients (intra- and inter-CC)13, variance components being estimated using two-way random analysis of variance (absolute agreement).
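The agreement calculations described above can be sketched as follows, using the multipliers stated in the text (± 2SD for limits of agreement, ± 2SE for the interval around the mean difference). The function name and the paired measurements are hypothetical, and the sketch does not cover the correlation coefficients.

```python
import math
import statistics
from typing import Dict, Sequence


def agreement_summary(first: Sequence[float], second: Sequence[float]) -> Dict[str, object]:
    """Limits of agreement and bias check for paired measurements."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long series with at least 2 pairs")

    diffs = [a - b for a, b in zip(first, second)]
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)          # sample standard deviation
    se = sd / math.sqrt(len(diffs))       # standard error of the mean difference

    limits = (mean_diff - 2 * sd, mean_diff + 2 * sd)
    ci = (mean_diff - 2 * se, mean_diff + 2 * se)
    return {
        "mean_difference": mean_diff,
        "limits_of_agreement": limits,
        "ci_mean_difference": ci,
        # Bias is assumed only if zero lies outside the interval.
        "systematic_bias": not (ci[0] <= 0 <= ci[1]),
    }


# Hypothetical repeated measurements of one index in four tumor volumes.
first_analysis = [10.0, 12.0, 9.0, 11.0]
second_analysis = [9.0, 13.0, 10.0, 10.0]
summary = agreement_summary(first_analysis, second_analysis)
```

In this toy example the mean difference is zero, so the interval around it contains zero and no systematic bias would be assumed.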
The study protocol was approved by the ethics committee of the Medical Faculty of Lund University, Sweden.
Results
Twenty-four of the 131 consecutive women examined were excluded because of an unequivocal ultrasound diagnosis of dermoid cyst (n = 14), tubal disease (n = 7), paraovarian cyst (n = 2), or peritoneal pseudocyst (n = 1). The ultrasound diagnosis was correct in 23 of these 24 cases (one ultrasound diagnosis of dermoid cyst was incorrect, the histopathology of the tumor being endometrioma). One woman was excluded because of power Doppler artifacts.
Among the 106 tumors included there were 79 (75%) benign tumors and 27 (25%) malignancies. Histological diagnoses are presented in Table 1. Family history and demographic background data are shown in Table 2. Women with benign tumors were younger than those with malignant tumors, and a greater proportion used hormone replacement therapy or contraceptive pills.
Table 1. Histopathological diagnoses [table body not recovered; column heading: Type of tumor]
Table footnotes: Histopathological diagnosis impossible because of necrosis. 14 epithelial cancers (6 serous adenocarcinomas, 3 endometrioid adenocarcinomas, 4 adenocarcinomas of low differentiation, 1 clear cell cancer mixed with cancer of low differentiation), one granulosa cell tumor, one sex cord stromal cell tumor, one malignant teratoma, one malignant mixed Müllerian tumor.
[Table 2 row labels; caption and values not recovered]: At least one first degree relative with ovarian cancer, n (%); at least one first degree relative with breast cancer, n (%); personal history of ovarian cancer, n (%); personal history of breast cancer, n (%).
Ultrasound characteristics of benign and malignant tumors are shown in Tables 3 and 4. Almost all ultrasound characteristics differed significantly between benign and malignant tumors, the only exceptions being the presence of papillary projections, the presence of shadowing, thickness of septa, and bilaterality. The gray-scale ultrasound variable with the best diagnostic performance was the size of the largest solid component (i.e. mean of three orthogonal diameters) with an area under the ROC curve of 0.96, a sensitivity of 85%, a false positive rate of 5%, a positive LR of 17.0 and a negative LR of 0.16 when using the mathematically best cut-off value, i.e. 31 mm; the 3D power Doppler flow index with the best diagnostic performance was VI in a 5-cm3 sample taken from the most vascularized area of the tumor, its area under the ROC curve being 0.92, its sensitivity 93%, its false positive rate 16%, its positive LR 5.47 and its negative LR 0.08 when using the mathematically best cut-off value, i.e. 10.6% (Table 5).
Table 3. Ultrasound characteristics of benign and malignant tumors; gray-scale ultrasound variables [table body not recovered]
Table footnotes: Higher values than the cut-off value indicate malignancy. Mean of three orthogonal diameters.
Probability of malignancy = $e^{z}/(1 + e^{z})$, where $z = (0.110 \times \text{mean diameter of largest solid component in mm}) + (0.028 \times \text{mean lesion diameter in mm}) + (2.15 \times \text{wall irregularity coded as 0 or 1}) - 7.671$.
Probability of malignancy = $e^{z}/(1 + e^{z})$, where $z = (0.107 \times \text{mean diameter of largest solid component in mm}) + (0.033 \times \text{mean lesion diameter in mm}) + (2.271 \times \text{wall irregularity coded as 0 or 1}) + (0.043 \times \text{color content VAS in the whole tumor}) - 10.173$.
Probability of malignancy = $e^{z}/(1 + e^{z})$, where $z = (0.101 \times \text{mean diameter of largest solid component in mm}) + (0.032 \times \text{mean lesion diameter in mm}) + (2.531 \times \text{wall irregularity coded as 0 or 1}) + (0.169 \times \text{FI in biopsy from most vascularized area}) - 14.921$.
2D, two-dimensional; 3D, three-dimensional; FI, flow index; LR+, positive likelihood ratio; LR−, negative likelihood ratio; ROC curve, receiver–operating characteristics curve; VAS, visual analog scale; VFI, vascularization flow index; VI, vascularization index.
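The three regression equations above can be applied as in the following sketch. The coefficients are copied from the equations; the dictionary keys, the function name and the example tumor are hypothetical. The cut-off of 0.12 is the value reported in the abstract for the gray-scale model; the cut-offs of the other two models are not given in this text and are therefore not encoded.

```python
import math

# Coefficients copied from the three equations; key names are illustrative.
MODELS = {
    "gray_scale": {
        "intercept": -7.671, "solid_mm": 0.110, "lesion_mm": 0.028,
        "irregular": 2.15, "doppler": 0.0,
    },
    "gray_scale_plus_color_vas": {
        "intercept": -10.173, "solid_mm": 0.107, "lesion_mm": 0.033,
        "irregular": 2.271, "doppler": 0.043,
    },
    "gray_scale_plus_fi": {
        "intercept": -14.921, "solid_mm": 0.101, "lesion_mm": 0.032,
        "irregular": 2.531, "doppler": 0.169,
    },
}

# Best cut-off reported in the abstract for the gray-scale model only.
GRAY_SCALE_CUTOFF = 0.12


def probability_of_malignancy(model: str, solid_mm: float, lesion_mm: float,
                              irregular_wall: bool, doppler_value: float = 0.0) -> float:
    """Evaluate e^z / (1 + e^z) for one of the three models.

    doppler_value is the color content VAS (0-100) or FI, depending on the
    model; it is ignored by the gray-scale model (coefficient 0).
    """
    c = MODELS[model]
    z = (c["intercept"]
         + c["solid_mm"] * solid_mm
         + c["lesion_mm"] * lesion_mm
         + c["irregular"] * (1 if irregular_wall else 0)
         + c["doppler"] * doppler_value)
    # 1 / (1 + e^-z) is algebraically identical to e^z / (1 + e^z).
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical tumor: 31-mm solid component, 60-mm lesion, irregular wall.
p = probability_of_malignancy("gray_scale", solid_mm=31, lesion_mm=60, irregular_wall=True)
predicted_malignant = p >= GRAY_SCALE_CUTOFF
```

For this hypothetical tumor z is −0.431 and the estimated probability is about 0.39, which is above the 0.12 cut-off.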
The multivariate logistic regression model including only gray-scale variables that best predicted malignancy is shown in Table 6. It included the mean diameter of the largest solid component, wall irregularity and the mean diameter of the whole tumor, the size of the whole tumor being the only gray-scale variable that added information to a model including the largest solid component and wall irregularity. The color content of the tumor scan, but not the color score, added information to this gray-scale model (Table 6). The only 3D power Doppler variables that added information to the gray-scale model were FI in the whole tumor and FI in the 5-cm3 sample. The two logistic regression models including FI had virtually identical diagnostic performance. The model including FI in the 5-cm3 sample is described in Table 6. The diagnostic performance of the logistic regression models and of subjective estimation of the risk of malignancy by the ultrasound examiner is shown in Table 5. Using subjective evaluation the ultrasound examiner correctly predicted malignancy in 24 (89%) of 27 tumors and benignity in 73 (92%) of 79 tumors (Table 5), i.e. three malignancies were incorrectly classified as benign (one mucinous borderline tumor, one sex cord stromal cell tumor, and one primary invasive ovarian tumor), and six benign tumors were incorrectly classified as malignant (two serous papillary cystadenomas, one mucinous cystadenoma, one serous cystadenoma, one serous cystadenoma with a mucinous component, and one case of struma ovarii). The multivariate logistic regression model including only gray-scale variables did not misclassify any malignancy but misclassified eight benign tumors as malignant using the mathematically best cut-off value for probability of malignancy. 
The logistic regression models including the gray-scale variables and FI, or the gray-scale variables and the color content of the tumor scan, also did not misclassify any malignancy, but both misclassified six benign tumors as malignant using the mathematically best cut-off value for probability of malignancy. The case of struma ovarii was misclassified as a malignancy both by subjective evaluation by the ultrasound examiner and by the three logistic regression models.
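As a worked check of the counts just reported, sensitivity, false positive rate and positive LR can be recomputed by hand. The arithmetic below assumes that all 27 malignant and all 79 benign tumors contributed to each analysis (no missing values), which the text does not state explicitly.

```latex
\begin{align*}
\text{Gray-scale model:}\quad
  & \text{sensitivity} = \tfrac{27}{27} = 100\%, \quad
    \text{false positive rate} = \tfrac{8}{79} \approx 10.1\%, \quad
    \mathrm{LR}{+} \approx \tfrac{1.00}{0.101} \approx 9.9 \\
\text{Subjective evaluation:}\quad
  & \text{sensitivity} = \tfrac{24}{27} \approx 88.9\%, \quad
    \text{false positive rate} = \tfrac{6}{79} \approx 7.6\%, \quad
    \mathrm{LR}{+} \approx \tfrac{0.889}{0.076} \approx 11.7
\end{align*}
```

The gray-scale figures agree, after rounding, with the false positive rate of 10% and positive LR of 10 given in the abstract.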
Table 6. Logistic regression models to predict malignancy [table body not recovered; column headings: Odds ratio estimates; 95% Confidence interval]
Table footnotes: Mean of three orthogonal diameters. The lower 95% confidence limit is < 1.0, which is explained by the 95% confidence limits having been calculated by approximation. We have tried to calculate the exact limits but no statistical software managed to do it, the procedure being too complicated.
2D, two-dimensional; 3D, three-dimensional; FI, flow index; VAS, visual analog scale.
The ultrasound examiner misclassified three of the seven difficult tumors, the gray-scale model misclassified two, the model including the color content of the tumor scan misclassified one, and the model including FI in the 5-cm3 sample misclassified one. All misclassified tumors were benign.
The intra- and interobserver reproducibility of calculation of 3D flow indices is shown in Tables 7 and 8. Neither intraobserver nor interobserver differences changed with the magnitude of measurement values. There were no systematic differences between the first and second measurements of the first observer or between the measurements of the two observers. Limits of agreement were wider for measurements in the 5-cm3 sample than for those in the whole tumor, but both intra- and inter-CC values were very high.
Table 7. Intraobserver reproducibility of analysis of volume and vascular indices of tumors [table body not recovered; column headings: Median (range) of values (n = 50); Difference between two analyses by the same observer, n = 25]
Discussion
We have shown that 3D power Doppler ultrasound examination can quite reliably discriminate between benign and malignant ovarian tumors, the best 3D power Doppler variables being VI or VFI in a 5-cm3 biopsy taken from the area of the tumor subjectively judged to be most vascularized. The area under the ROC curve for these variables was 0.92 and 0.93, respectively. VI and VFI values below the mathematically best cut-off value decreased the odds of malignancy tenfold whereas values above the best cut-off increased the odds fivefold. The color content of the tumor scan estimated subjectively by the ultrasound examiner was less discriminative than the 3D power Doppler indices, but 3D power Doppler ultrasound examination was not superior to gray-scale imaging. The single best gray-scale ultrasound variable (i.e. the size of the largest solid component of the tumor) had an area under the ROC curve of 0.96, and size of the largest solid component below the mathematically best cut-off value decreased the odds of malignancy five times, while size above the best cut-off increased the odds almost 20-fold.
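The statements above about test results raising or lowering the odds of malignancy follow the usual relation post-test odds = pre-test odds × LR. A minimal sketch, assuming the study prevalence (27 of 106 tumors) as the pre-test probability and using the LR values reported for the size of the largest solid component (17.0 and 0.16); the function name is illustrative.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability into a post-test probability via odds."""
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Assumption: prevalence in this series is used as the pre-test probability.
prevalence = 27 / 106

# Largest solid component above / below its best cut-off (LR+ 17.0, LR- 0.16).
p_if_above = post_test_probability(prevalence, 17.0)
p_if_below = post_test_probability(prevalence, 0.16)
```

Under these assumptions a solid component above the cut-off raises the probability of malignancy from about 25% to about 85%, and one below the cut-off lowers it to about 5%. In a population with a different prevalence the post-test probabilities would differ.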
It is the personal experience of the last author (L. V.) that the most important gray-scale ultrasound variables for distinguishing benign from malignant tumors are the presence of solid components and the presence of any irregularity in the tumor, be it in the inner cyst wall or in the outer contour of the tumor14. Therefore, we constructed a logistic regression model including these two variables and, to avoid overfitting, only one additional gray-scale variable. This gray-scale model with an area under the ROC curve of 0.98 was not clearly superior to the single best gray-scale variable (i.e. largest solid component with an area under the ROC curve of 0.96) or to the single best 3D power Doppler variable (VI in the most vascularized area of the tumor with an area under the ROC curve of 0.92). Adding 3D power Doppler results to our gray-scale model resulted in a very small improvement in diagnostic performance (area under ROC curve 0.99). In fact, only two additional tumors were correctly classified by the added use of 3D power Doppler. Adding the color content of the tumor scan as estimated subjectively by the ultrasound examiner on a VAS also resulted in two additional tumors being correctly classified.
Our results indicate that diagnostic performance similar to that of a logistic regression model including only gray-scale ultrasound variables (area under ROC curve 0.98) or gray-scale ultrasound variables and 2D or 3D power Doppler variables (area under ROC curve 0.98 and 0.99) can be achieved by subjective evaluation of ultrasound findings by an experienced examiner (area under ROC curve 0.91–0.97 depending on which method of risk estimation is used by the examiner). A comparison between subjective evaluation and our logistic regression models is not quite fair, though, because subjective evaluation must be considered to have been tested prospectively, whereas the performance of the logistic regression models was tested on the population where the models were created. Therefore the performance of the models was almost certainly overestimated. Of course, our results are heavily dependent upon the tumors included. However, the tumors included are likely to be representative of those adnexal tumors that are currently considered appropriate for surgical removal. The reason why we excluded lesions with a certain ultrasound diagnosis of paraovarian cyst, peritoneal cyst or tubal disease is that we wanted to include only lesions suspected to be of ovarian origin at ultrasound examination. The reason why we excluded lesions with a certain ultrasound diagnosis of dermoid cyst is that dermoid cysts usually cause no diagnostic problems at gray-scale ultrasound examination, and so the use of power Doppler ultrasound would probably be unnecessary in most cases of dermoid cyst. Moreover, we had the experience before the start of the study that power Doppler artifacts during acquisition of a volume were a common problem in dermoid cysts. We are aware that our sample size is relatively small, and that our models would need to be tested prospectively. In particular we would like to test them in difficult tumors, i.e. in tumors that are considered very difficult to classify as benign or malignant by an experienced ultrasound examiner (see also below).
Subjective evaluation of the color content of the tumor scan has been shown to be superior to all other 2D color/power Doppler ultrasound methods for discriminating between benign and malignant tumors15. Subjective evaluation of the color content of the tumor scan should yield information similar to that obtained by 3D power Doppler imaging, but it has the disadvantage of being purely subjective and therefore difficult to reproduce, even though its reproducibility has been found to be sufficient for clinical use16. Moreover, because subjective evaluation of the color content of the tumor scan cannot be done without gray-scale ultrasound information, the subjective estimation of the color content of a tumor scan is very likely to be biased by the gray-scale information. Using 3D power Doppler ultrasound the color content of the tumor scan can be determined objectively, even though the selection of the most vascularized area (i.e. selection of the spherical 5-cm3 sample site) is necessarily subjective. The person performing the analyses of the 3D volumes also cannot be completely blinded to gray-scale ultrasound findings, because the 3D power Doppler volumes to be analyzed do contain gray-scale information. However, we do not believe that the gray-scale ultrasound information in the volume introduces bias when the sample site is selected, because when selecting the sample site attention is paid to vascularization and not to gray-scale ultrasound morphology. We are encouraged to find that our results using the objective 3D power Doppler method confirm those previously obtained using subjective estimation of the color content of the tumor scan during 2D color/power Doppler scanning, namely that malignant adnexal tumors appear to be more richly vascularized than benign adnexal tumors17. Both subjective evaluation of the color content of tumor scans and acquiring good 3D power Doppler volumes of tumors require experience. 
A disadvantage of the 3D power Doppler method is that it is quite time consuming to calculate the color content of a tumor. Calculating the color content of a 5-cm3 spherical sample is slightly less time consuming (approximately 2–3 min after having gained experience with the method) than drawing the contours of a whole tumor to calculate the color content of the whole tumor (approximately 5 min). This is why we chose to present the logistic regression model including FI in a 5-cm3 sample rather than the model including FI in the whole tumor, the two models having virtually identical diagnostic performance. On the other hand, reproducibility was slightly better (less wide limits of agreement) for analysis of power Doppler signals in the whole tumor volume than in a 5-cm3 sample. This could be an argument for choosing the model including FI in the whole tumor rather than the model including FI in a 5-cm3 sample.
An experienced ultrasound examiner can easily and correctly classify most tumors as benign or malignant by using gray-scale ultrasound examination alone1, 3. For these easily classifiable tumors the added use of more sophisticated diagnostic methods, such as 3D power Doppler ultrasound examination, is not necessary. However, some 10% of tumors are very difficult to classify even for experienced examiners18. Attempts to create logistic regression models including clinical variables, CA 125 values and 2D gray-scale and power Doppler ultrasound variables, for reliable discrimination between benign and malignant difficult adnexal tumors, have failed18. It is for this group of difficult tumors that additional diagnostic methods are needed. It would be worth exploring in a much larger study sample than the current one whether the sophisticated 3D power Doppler ultrasound technique could help to discriminate between benign and malignant difficult tumors.
This study was supported by the Swedish Medical Research Council (grants nos. K2001-72X-11605-06A, K2002-72X-11605-07B and K2004-73X-11605-09A), two governmental grants (Landstingsfinansierad regional forskning (Region Skåne and ALF-medel)), and funds administered by Malmö University Hospital.