Method for evaluation of stemming algorithms based on error counting
Article first published online: 7 DEC 1998
Copyright © 1996 John Wiley & Sons, Inc.
Journal of the American Society for Information Science
Volume 47, Issue 8, pages 632–649, August 1996
How to Cite
Paice, C. D. (1996), Method for evaluation of stemming algorithms based on error counting. J. Am. Soc. Inf. Sci., 47: 632–649. doi: 10.1002/(SICI)1097-4571(199608)47:8<632::AID-ASI8>3.0.CO;2-U
- Issue published online: 7 DEC 1998
- Manuscript Accepted: 6 SEP 1995
- Manuscript Revised: 8 MAR 1995
- Manuscript Received: 21 DEC 1993
- Lancaster University Computing Department
In most previous studies, the effectiveness of stemming algorithms has been compared by measuring retrieval performance on various experimental test collections. The present work instead assesses performance by counting the number of identifiable errors made when stemming the words of various text samples. This requires the words in each sample to be grouped manually; software has been developed to facilitate this. After grouping, the words are stemmed, and indices representing the rates of understemming and overstemming are computed. Results are presented for three stemmers (Lovins, Porter, and Paice/Husk), each applied to three distinct text samples. Although the results are not entirely clear-cut, the Lovins stemmer appears to be inferior to the other two in terms of general accuracy. The way in which the indices vary with the size of the text sample is also investigated.
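The abstract does not give the formulas for the indices. The sketch below assumes the definitions of Paice's method as they are commonly reported: for a manually formed concept group of N_g words, the desired merge total is N_g(N_g − 1)/2 and the unachieved merge total counts the pairs in the group that received different stems; the understemming index is UI = GUMT/GDMT. Overstemming is measured over the groups of words sharing a stem, with the wrongly merged total counting pairs from different concept groups, and OI = GWMT/GDNT, where GDNT is the number of pairs of words from different concept groups. The stemming weight SW = OI/UI is included on the same assumption. The full paper may define further variants (for example a local overstemming index), so check it before relying on these definitions. The function name, the toy word groups, and the truncate-to-five-letters "stemmer" are hypothetical stand-ins, not the Lovins, Porter, or Paice/Husk algorithms, and each word is assumed to belong to exactly one concept group.

```python
import math
from collections import Counter, defaultdict


def pair_count(n):
    """Number of unordered pairs among n items."""
    return n * (n - 1) / 2


def stemming_error_indices(groups, stem):
    """Sketch of understemming/overstemming indices (assumed definitions).

    groups: list of concept groups; each group is a list of words that
            ought to be conflated (each word assumed to appear only once).
    stem:   function mapping a word to its stem.
    Returns (UI, OI, SW).
    """
    total_words = sum(len(g) for g in groups)
    gdmt = gumt = gdnt = 0.0
    # For each stem, count how many of its words come from each concept group.
    stem_groups = defaultdict(Counter)

    for gid, group in enumerate(groups):
        n_g = len(group)
        gdmt += pair_count(n_g)                 # pairs that should be merged
        gdnt += n_g * (total_words - n_g) / 2   # pairs that should stay apart
        stems = Counter(stem(w) for w in group)
        # Pairs within this concept group that ended up with different stems.
        gumt += sum(u * (n_g - u) for u in stems.values()) / 2
        for s, u in stems.items():
            stem_groups[s][gid] += u

    gwmt = 0.0
    for counts in stem_groups.values():
        n_s = sum(counts.values())
        # Pairs sharing this stem that come from different concept groups.
        gwmt += sum(v * (n_s - v) for v in counts.values()) / 2

    ui = gumt / gdmt if gdmt else 0.0
    oi = gwmt / gdnt if gdnt else 0.0
    sw = oi / ui if ui else math.inf
    return ui, oi, sw


# Hypothetical toy data, not taken from the paper's text samples.
example_groups = [
    ["connect", "connected", "connection"],
    ["general", "generally"],
    ["generate", "generation"],
    ["run", "running"],
]
ui, oi, sw = stemming_error_indices(example_groups, lambda w: w[:5])
print(f"UI={ui:.3f} OI={oi:.3f} SW={sw:.2f}")  # UI=0.167 OI=0.133 SW=0.80
```

The truncation stemmer is used only because its errors are easy to trace by hand: it splits *run*/*running* (one unachieved merge out of six desired) and conflates *general* with *generate* (four wrong merges out of thirty desired non-merges).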