Tooling the aggregator's workbench: Metadata visualization through statistical text analysis



Digital library interoperability efforts have succeeded in making large-scale aggregation of digital collections technically feasible, to the extent that aggregations have become critical organizational tools in the data universe. As diverse digital collections are increasingly unified into aggregations, data heterogeneity presents growing challenges to aggregators. Comprehension of massive aggregates and efficient metadata quality analysis are aggregator imperatives that rely on scalable evaluation techniques. This paper describes novel applications of visualization techniques, based on statistical text analysis, to the evaluation and maintenance of a large-scale aggregation of descriptive cultural heritage metadata. These techniques, and their implementation as part of an aggregator's workbench, provide new, administrator-oriented perspectives on metadata aggregations that improve aggregators' capacity to evaluate metadata quality and topical coverage in a scalable way.