The effectiveness of stemming for natural-language access to Slovene textual data
Article first published online: 4 JAN 1999
Copyright © 1992 John Wiley & Sons, Inc.
Journal of the American Society for Information Science
Volume 43, Issue 5, pages 384–390, June 1992
How to Cite
Popovič, M. and Willett, P. (1992), The effectiveness of stemming for natural-language access to Slovene textual data. J. Am. Soc. Inf. Sci., 43: 384–390. doi: 10.1002/(SICI)1097-4571(199206)43:5<384::AID-ASI6>3.0.CO;2-L
- Issue published online: 4 JAN 1999
- Manuscript Revised: 18 NOV 1991
- Manuscript Accepted: 18 NOV 1991
- Manuscript Received: 11 SEP 1991
There have been several studies of the use of stemming algorithms for conflating morphological variants in free-text retrieval systems. Comparison of stemmed and nonconflated searches suggests that there are no significant increases in the effectiveness of retrieval when stemming is applied to English-language documents and queries. This article reports the use of stemming on Slovene-language documents and queries, and demonstrates that the use of an appropriate stemming algorithm results in a large, and statistically significant, increase in retrieval effectiveness when compared with nonconflated processing; similar comments apply to the use of manual, right-hand truncation. A comparison is made with stemming of English versions of the same documents and queries and it is concluded that the effectiveness of a stemming algorithm is determined by the morphological complexity of the language that it is designed to process.
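As a rough illustration of the two conflation techniques named in the abstract, the following minimal sketch contrasts suffix-stripping stemming with right-hand truncation on a few inflected forms of the Slovene noun "knjiga" (book). The suffix list, the function names `toy_stem` and `matches_truncation`, and the minimum stem length are all hypothetical choices made for this example; they are not the algorithm evaluated in the article.

```python
# Toy illustration only: this suffix list is a hypothetical sample,
# not the stemming algorithm evaluated in the article.
TOY_SUFFIXES = ("ami", "ah", "am", "a", "e", "i", "o")  # longest suffixes first


def toy_stem(word, min_stem=3):
    """Strip the first matching toy suffix, keeping at least min_stem characters."""
    word = word.lower()
    for suffix in TOY_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def matches_truncation(term, prefix):
    """Right-hand truncation: a query such as 'knjig*' matches terms starting with 'knjig'."""
    return term.lower().startswith(prefix.lower())


# Inflected forms of "knjiga" that an unconflated search would treat as distinct terms.
forms = ["knjiga", "knjige", "knjigi", "knjigo", "knjigami"]

# Stemming maps every form to one index term.
stems = {toy_stem(form) for form in forms}

# Manual truncation reaches the same forms from the query side.
truncation_hits = [form for form in forms if matches_truncation(form, "knjig")]
```

In this sketch both approaches group all five forms together, which is the kind of conflation whose effect on retrieval the article measures. A realistic stemmer for a highly inflected language would need a far larger rule set and safeguards against over-stemming.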