Editor's Summary

Taxonomies have expanded from browsing aids to the foundation for automatic classification. Early auto-classification methods grouped documents that shared similar sets of words, but current software can provide far greater accuracy. Text-mining tools can spot recognizable entities as potential taxonomy terms or top-level categories. Developing a taxonomy further to coordinate with auto-classification software requires an understanding of how the software works: whether it relies on lexical analysis, rules for word co-occurrence, or machine learning with predictive analysis. The taxonomy model is typically hierarchical, with term specificity dictated by the end user's need for detail. Synonyms and variants are redirected to the preferred term used for classification. The classification tool must be configured to be consistent with the typical document format and style of the collection. Testing the classification scheme is critical for revealing inaccuracies and omissions; it is an iterative process that expands from a stable test set to validation on a large corpus before final implementation.
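
As a rough illustration of the ideas summarized above, the following Python sketch shows one way a hierarchical taxonomy with synonym redirection might drive a simple lexical classifier, and how a small labeled test set can expose omissions. It is not the method of any particular product: the taxonomy contents, the function names (build_lookup, classify, ancestors, evaluate), and the single-word matching rule are all assumptions made for the example. Real tools would also handle multiword terms, stemming, and document structure.

```python
import re
from collections import Counter

# Hypothetical taxonomy: preferred term -> (parent term, synonyms and variants).
TAXONOMY = {
    "Vehicles": (None, []),
    "Automobiles": ("Vehicles", ["car", "cars", "automobile", "autos"]),
    "Bicycles": ("Vehicles", ["bike", "bikes", "bicycle"]),
}


def build_lookup(taxonomy):
    """Map every lowercase label (preferred term, synonym, variant) to its preferred term."""
    lookup = {}
    for preferred, (_parent, variants) in taxonomy.items():
        lookup[preferred.lower()] = preferred
        for variant in variants:
            lookup[variant.lower()] = preferred
    return lookup


def classify(text, taxonomy, min_hits=1):
    """Count single-word matches per preferred term; keep terms with at least min_hits."""
    lookup = build_lookup(taxonomy)
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = Counter(lookup[token] for token in tokens if token in lookup)
    return {term: count for term, count in hits.items() if count >= min_hits}


def ancestors(term, taxonomy):
    """Return the broader terms above `term`, nearest parent first."""
    path = []
    parent = taxonomy[term][0]
    while parent is not None:
        path.append(parent)
        parent = taxonomy[parent][0]
    return path


def evaluate(test_set, taxonomy):
    """Return (precision, recall) over a test set of (text, expected_terms) pairs."""
    tp = fp = fn = 0
    for text, expected in test_set:
        predicted = set(classify(text, taxonomy))
        tp += len(predicted & expected)
        fp += len(predicted - expected)
        fn += len(expected - predicted)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    print(classify("She sold her car and bought two bikes and a bicycle.", TAXONOMY))
    test_set = [
        ("She sold her car.", {"Automobiles"}),
        ("He cycles to work.", {"Bicycles"}),  # "cycles" is not yet a listed variant
    ]
    print(evaluate(test_set, TAXONOMY))
```

In this sketch the second test document is missed because "cycles" has not been entered as a variant of "Bicycles", so recall drops while precision stays high. Adding the missing variant and rerunning the same stable test set is the kind of iteration the summary describes, before moving on to validation against a larger corpus.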