Special Section: Taxonomies in Practice
Building a taxonomy for auto-classification
Article first published online: 14 DEC 2012
DOI: 10.1002/bult.2013.1720390210
Copyright © 2013 American Society for Information Science and Technology
Issue
Bulletin of the American Society for Information Science and Technology
Volume 39, Issue 2, pages 34–38, December/January 2013
Additional Information
How to Cite
Pohs, W. (2013), Building a taxonomy for auto-classification. Bul. Am. Soc. Info. Sci. Tech., 39: 34–38. doi: 10.1002/bult.2013.1720390210
Publication History
- Issue published online: 14 DEC 2012
Abstract
Editor's Summary
Taxonomies have expanded from browsing aids to the foundation for automatic classification. Early auto-classification methods grouped documents with similar collections of words, but current software can provide far greater accuracy. Text-mining tools can identify recognizable entities as candidate taxonomy terms or top-level categories. Developing a taxonomy further to work with auto-classification software requires an understanding of how the software works: whether it relies on lexical analysis, rules for word co-occurrence, or machine learning with predictive analysis. The taxonomy model is typically hierarchical, with term specificity dictated by the end user's need for detail. Synonyms and variants are redirected to the preferred term used for classification. The classification tool must be configured to match the typical document format and style of the collection. Testing the classification scheme is critical for revealing inaccuracies and omissions; it is an iterative process that expands from a stable test set to validation on a large corpus before final implementation.
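As a rough illustration of the lexical approach summarized above, the following Python sketch redirects synonyms and variants to a single taxonomy term and assigns each matched term its parent category. It is a minimal sketch under stated assumptions, not the method of any particular product or of the article itself: the taxonomy terms, the synonym list, and the function name `classify` are all hypothetical and chosen only for the example.

```python
import re
from collections import Counter

# Hypothetical two-level taxonomy: term used for classification -> parent category.
TAXONOMY = {
    "Cardiology": "Medicine",
    "Oncology": "Medicine",
    "Contract Law": "Law",
}

# Hypothetical synonym list: each variant redirects to one taxonomy term.
SYNONYMS = {
    "cardiology": "Cardiology",
    "heart disease": "Cardiology",
    "cardiac": "Cardiology",
    "oncology": "Oncology",
    "cancer": "Oncology",
    "tumor": "Oncology",
    "tumour": "Oncology",
    "contract law": "Contract Law",
    "breach of contract": "Contract Law",
}


def classify(text, min_hits=1):
    """Return (term, parent category, hit count) tuples, most frequent first."""
    lowered = text.lower()
    hits = Counter()
    for variant, term in SYNONYMS.items():
        # Whole-word match so "tumor" does not fire inside a longer word.
        pattern = r"\b" + re.escape(variant) + r"\b"
        hits[term] += len(re.findall(pattern, lowered))
    # Keep only terms that reach the (assumed) minimum evidence threshold.
    return [
        (term, TAXONOMY[term], count)
        for term, count in hits.most_common()
        if count >= min_hits
    ]


if __name__ == "__main__":
    sample = "The patient's cardiac history included heart disease; no tumour was found."
    for term, parent, count in classify(sample):
        print(f"{parent} > {term}: {count} hit(s)")
```

Raising `min_hits` trades recall for precision, which is the kind of setting that the iterative testing described in the abstract would be used to tune against a stable test set.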
