Comparison of ‘pattern recognition’ and logistic regression models for discrimination between benign and malignant pelvic masses: a prospective cross validation


Abstract

Objectives

To test prospectively the diagnostic performance of two logistic regression models for calculating the individual risk of malignancy in adnexal tumors (the ‘Tailor model’ and the ‘Timmerman model’), and to compare their performance with that of ‘pattern recognition’ (subjective evaluation of the gray-scale ultrasound image and color Doppler ultrasound examination).

Design

Consecutive women with a pelvic mass judged clinically to be of adnexal origin underwent preoperative ultrasound examination, including color and spectral Doppler examination. The examination techniques and definitions were the same as those used in the studies in which the logistic regression models had been created. The Tailor model was tested in 133 women (35 of whom had a malignancy) and the Timmerman model in 82 women (29 of whom had a malignancy). A subset of 79 women (28 of whom had a malignancy) was used to compare the performance of the two models by calculating and comparing the areas under their receiver operating characteristics curves. Sensitivity and specificity with regard to malignancy were calculated for all three methods.
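The area under a receiver operating characteristics curve can be computed directly from predicted risks and final diagnoses. The following is a minimal sketch, not the authors' analysis: the function name `auc` and the toy data are hypothetical, and the abstract does not state which statistical test was used to compare the two areas, so only the areas themselves are computed here.

```python
def auc(risks, is_malignant):
    """Area under the ROC curve via the Mann-Whitney statistic.

    This equals the probability that a randomly chosen malignant tumor
    receives a higher predicted risk than a randomly chosen benign one
    (ties count as one half).
    """
    pos = [r for r, m in zip(risks, is_malignant) if m]
    neg = [r for r, m in zip(risks, is_malignant) if not m]
    if not pos or not neg:
        raise ValueError("need at least one malignant and one benign case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): risks predicted by two
# models for the same six women, with 1 = malignant and 0 = benign.
truth = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.7, 0.4, 0.5, 0.2, 0.1]
model_b = [0.8, 0.3, 0.6, 0.4, 0.35, 0.1]

auc_a = auc(model_a, truth)
auc_b = auc(model_b, truth)
print(f"AUC model A: {auc_a:.2f}, AUC model B: {auc_b:.2f}")
```

Both models are evaluated on the same women, which mirrors the reason the study restricted the comparison to the shared subset of 79 cases.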

Results

Pattern recognition (sensitivity around 85%, specificity around 90%) performed better than the two logistic regression models. Using a risk of malignancy of > 50% to indicate malignancy (as suggested in the original publications), the sensitivity of the Tailor model was 69% and the specificity 88% (n = 133). The corresponding values for the Timmerman model were 62% and 79% (n = 82). The receiver operating characteristics curves showed the two logistic regression models to have similar diagnostic properties (area under the curve, 0.87 vs. 0.84; P = 0.25; n = 79). The diagnostic performance of the mathematical models was much poorer in this study than in the studies in which the models had been created.
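How a risk cutoff of > 50% turns predicted risks into sensitivity and specificity can be illustrated as follows. This is a sketch under stated assumptions: the function name `sensitivity_specificity` and the toy risks are hypothetical and do not reproduce the study data.

```python
def sensitivity_specificity(risks, is_malignant, cutoff=0.5):
    """Classify a tumor as malignant when its predicted risk exceeds
    the cutoff, then compare with the final diagnosis."""
    tp = fn = tn = fp = 0
    for risk, malignant in zip(risks, is_malignant):
        predicted_malignant = risk > cutoff  # strictly greater than 50%
        if malignant:
            if predicted_malignant:
                tp += 1  # malignancy correctly flagged
            else:
                fn += 1  # malignancy missed
        else:
            if predicted_malignant:
                fp += 1  # benign tumor wrongly flagged
            else:
                tn += 1  # benign tumor correctly cleared
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one malignant and one benign case")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical toy data (not from the study): 1 = malignant, 0 = benign.
risks = [0.92, 0.61, 0.48, 0.50, 0.30, 0.75, 0.10, 0.05]
truth = [1, 1, 1, 0, 0, 0, 0, 0]

sens, spec = sensitivity_specificity(risks, truth)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```

Note that a tumor with a predicted risk of exactly 50% is classified as benign here, because the cutoff is applied as strictly greater than 50%.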

Conclusion

The poor diagnostic performance of the mathematical models can probably be explained by subtle differences in definitions and examination technique and by differences between the original tumor populations and the study population. For mathematical models to be generally useful, they probably need to be created on the basis of a very large number of tumors, and the variables in the model must be unequivocally defined and the examination technique meticulously standardized.

Copyright © 2001 International Society of Ultrasound in Obstetrics and Gynecology
