Provision of the DDSM mammography metadata in an accessible format




The Digital Database for Screening Mammography (DDSM) is the largest publicly available resource for mammographic image analysis research and has been used extensively for computer-assisted diagnosis (CADx) studies. However, it has not been possible to search the database for a specific kind of lesion, so case selection in past studies was often arbitrary. The authors therefore aim to provide the complete DDSM metadata in an accessible format.


The authors semiautomatically transformed the data available at into tabular format. The 1769 cases (914 from cancer volumes, 855 from benign volumes) comprise 1220 mass lesions (578 benign, 642 malignant) and 859 calcifications (433 benign, 426 malignant). In addition, 694 normal cases were processed to allow matching by age and breast density.
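To illustrate what matching by age and breast density can look like, here is a minimal sketch of one simple strategy: greedy, without-replacement pairing of each lesion case with the unused normal case of the same density and the closest age. This is not the authors' procedure. The `Case` record, its field names, the integer density coding, the 5-year tolerance, and all IDs and values are hypothetical illustrations, not DDSM data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    # Hypothetical record; field names are assumptions, not the DDSM layout.
    case_id: str
    age: int
    density: int  # breast density category, assumed to be coded as an integer


def match_controls(cases, controls, max_age_diff=5):
    """Greedily pair each case with an unused control of the same density
    and the closest age (within max_age_diff years, a hypothetical tolerance).

    Returns a dict mapping case_id -> control case_id, or None if no match.
    """
    # Sort by ID so the result does not depend on input order.
    available = sorted(controls, key=lambda c: c.case_id)
    pairs = {}
    for case in sorted(cases, key=lambda c: c.case_id):
        candidates = [
            c for c in available
            if c.density == case.density and abs(c.age - case.age) <= max_age_diff
        ]
        if not candidates:
            pairs[case.case_id] = None
            continue
        # Closest age wins; ties are broken by control ID for determinism.
        best = min(candidates, key=lambda c: (abs(c.age - case.age), c.case_id))
        available.remove(best)  # match without replacement
        pairs[case.case_id] = best.case_id
    return pairs


if __name__ == "__main__":
    # Toy, made-up example values.
    cases = [Case("case01", 55, 2), Case("case02", 62, 3), Case("case03", 48, 4)]
    controls = [Case("norm01", 57, 2), Case("norm02", 60, 3),
                Case("norm03", 54, 2), Case("norm04", 70, 4)]
    print(match_controls(cases, controls))
    # {'case01': 'norm03', 'case02': 'norm02', 'case03': None}
```

Greedy matching is easy to reproduce but not globally optimal; an optimal assignment method could be substituted if closer overall matches are required.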


The authors provide the entire DDSM metadata (for benign, malignant, and normal cases) as tab-delimited text files [see supplementary material at E-MPHYA6-41-006405 for DDSM metadata].
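As an illustration of how tab-delimited metadata supports reproducible case selection, the following minimal sketch parses a tab-delimited table with Python's standard `csv` module and selects cases by lesion type and pathology. The column names (`case_id`, `lesion_type`, `pathology`, `density`) and the sample rows are hypothetical assumptions; the actual supplementary files may use different headers and values.

```python
import csv
import io

# Hypothetical excerpt of a tab-delimited metadata file. Column names and
# values are illustrative assumptions, not the actual DDSM file contents.
SAMPLE_TSV = (
    "case_id\tlesion_type\tpathology\tdensity\n"
    "case01\tmass\tMALIGNANT\t3\n"
    "case02\tcalcification\tBENIGN\t2\n"
    "case03\tmass\tBENIGN\t4\n"
    "case04\tmass\tMALIGNANT\t2\n"
)


def load_metadata(handle):
    """Parse a tab-delimited file into a list of dicts, one per row."""
    return list(csv.DictReader(handle, delimiter="\t"))


def select_cases(rows, lesion_type, pathology):
    """Return the sorted case IDs matching the given lesion type and pathology.

    Sorting makes the result independent of file order, so the same
    criteria always yield the same list of cases.
    """
    return sorted(
        row["case_id"]
        for row in rows
        if row["lesion_type"] == lesion_type and row["pathology"] == pathology
    )


if __name__ == "__main__":
    rows = load_metadata(io.StringIO(SAMPLE_TSV))
    print(select_cases(rows, "mass", "MALIGNANT"))  # ['case01', 'case04']
```

With a real file, `io.StringIO(SAMPLE_TSV)` would be replaced by `open(path, newline="")`; publishing the selection criteria alongside the resulting ID list lets others reproduce the same case set.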


The data provided make case selection for future studies using the DDSM reproducible. Furthermore, they may serve as a validation dataset for CADx approaches that use the BI-RADS lexicon.