Research Article
On the complexity of Rocchio's similarity-based relevance feedback algorithm
Article first published online: 31 MAY 2007
DOI: 10.1002/asi.20612
Copyright © 2007 Wiley Periodicals, Inc., A Wiley Company

Journal of the American Society for Information Science and Technology
Volume 58, Issue 10, pages 1392–1400, August 2007
How to Cite
Chen, Z. and Fu, B. (2007), On the complexity of Rocchio's similarity-based relevance feedback algorithm. J. Am. Soc. Inf. Sci., 58: 1392–1400. doi: 10.1002/asi.20612
Publication History
- Issue published online: 19 JUL 2007
- Article first published online: 31 MAY 2007
- Manuscript Revised: 16 OCT 2006
- Manuscript Accepted: 16 OCT 2006
- Manuscript Received: 12 SEP 2005
Abstract
Rocchio's similarity-based relevance feedback algorithm, one of the most important query reformulation methods in information retrieval, is essentially an adaptive algorithm that learns from examples while searching for documents represented by a linear classifier. Despite its popularity in various applications, there is little rigorous analysis of its learning complexity in the literature. In this article, the authors prove for the first time that the learning complexity of Rocchio's algorithm is $O(d + d^2(\log d + \log n))$ over the discretized vector space $\{0,\ldots,n-1\}^d$ when the inner product similarity measure is used. For documents represented by a monotone linear classifier (with weight vector $q$) over $\{0,\ldots,n-1\}^d$, the upper bound on the learning complexity can be improved to at most $1 + 2k(n-1)(\log d - \log(n-1))$, where $k$ is the number of nonzero components in $q$. Several lower bounds on the learning complexity are also obtained for Rocchio's algorithm. For example, the authors prove that Rocchio's algorithm has a lower bound on its learning complexity over the Boolean vector space $\{0,1\}^d$.
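To make "learning complexity" concrete, the sketch below counts how many feedback updates an on-line, Rocchio-style learner makes before its query vector agrees with a target linear classifier under the inner product similarity. It is a simplified, perceptron-like assumption and not the authors' exact formulation: the function names `rocchio_feedback` and `dot`, the single-example update `q <- alpha*q + beta*x` (relevant) or `q <- alpha*q - gamma*x` (irrelevant), the zero threshold, and the pass limit `max_passes` are all hypothetical choices made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product similarity between a query vector and a document vector."""
    return sum(a * b for a, b in zip(u, v))


def rocchio_feedback(
    docs: Sequence[Sequence[int]],
    is_relevant: Callable[[Sequence[int]], bool],
    d: int,
    alpha: float = 1.0,
    beta: float = 1.0,
    gamma: float = 1.0,
    max_passes: int = 1000,
) -> Tuple[Vector, int]:
    """Hypothetical single-example Rocchio-style learner (an assumption, not the paper's model).

    A document x is judged relevant when dot(q, x) > 0. Each time the user's
    feedback disagrees with that judgment, q is moved toward x (if x is relevant)
    or away from x (if it is not). Returns the final query and the number of updates.
    """
    q: Vector = [0.0] * d  # start from the zero query
    updates = 0
    for _ in range(max_passes):
        made_mistake = False
        for x in docs:
            predicted = dot(q, x) > 0
            actual = is_relevant(x)
            if predicted == actual:
                continue  # feedback agrees with the current query; no update
            made_mistake = True
            updates += 1
            if actual:
                # Relevant document was missed: add it to the query.
                q = [alpha * qi + beta * xi for qi, xi in zip(q, x)]
            else:
                # Irrelevant document was retrieved: subtract it from the query.
                q = [alpha * qi - gamma * xi for qi, xi in zip(q, x)]
        if not made_mistake:
            break  # the query now classifies every document correctly
    return q, updates


if __name__ == "__main__":
    from itertools import product

    # Boolean documents in {0,1}^3; the (hypothetical) target marks a document
    # relevant when x0 + x1 - x2 > 0.
    documents = list(product([0, 1], repeat=3))
    query, num_updates = rocchio_feedback(documents, lambda x: x[0] + x[1] - x[2] > 0, 3)
    print(query, num_updates)
```

With `alpha = beta = gamma = 1` this update coincides with the classical perceptron rule, so it terminates whenever some query separates relevant from irrelevant documents strictly; the update count it reports is only an illustrative proxy for the learning complexity bounds stated in the abstract.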
