Research Article
Linear time series models for term weighting in information retrieval
Article first published online: 23 MAR 2010
DOI: 10.1002/asi.21315
© 2010 ASIS&T
Issue

Journal of the American Society for Information Science and Technology
Volume 61, Issue 7, pages 1299–1312, July 2010
Additional Information
How to Cite
Efron, M. (2010), Linear time series models for term weighting in information retrieval. Journal of the American Society for Information Science and Technology, 61: 1299–1312. doi: 10.1002/asi.21315
Publication History
- Issue published online: 14 JUN 2010
- Article first published online: 23 MAR 2010
- Manuscript Revised: 6 JAN 2010
- Manuscript Accepted: 6 JAN 2010
- Manuscript Received: 29 SEP 2009
Abstract
Common measures of term importance in information retrieval (IR) rely on counts of term frequency; rare terms receive higher weight in document ranking than common terms receive. However, realistic scenarios yield additional information about terms in a collection. Of interest in this article is the temporal behavior of terms as a collection changes over time. We propose capturing each term's collection frequency at discrete time intervals over the lifespan of a corpus and analyzing the resulting time series. We hypothesize the collection frequency of a weakly discriminative term x at time t is predictable by a linear model of the term's prior observations. On the other hand, a linear time series model for a strong discriminator's collection frequency will yield a poor fit to the data. Operationalizing this hypothesis, we induce three time-based measures of term importance and test these against state-of-the-art term weighting models.
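To make the hypothesis concrete, the sketch below fits a first-order autoregressive model, x_t = a + b * x_(t-1), to a term's collection-frequency series by ordinary least squares and reports how poorly the model fits. It is only an illustration under stated assumptions: the abstract does not specify the model order or define the article's three time-based measures, so the AR(1) form, the function names `fit_ar1` and `lack_of_fit`, the score 1 - R^2, and the two example series are all hypothetical choices, not the author's method or data.

```python
from typing import Sequence, Tuple


def fit_ar1(series: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of x_t = a + b * x_{t-1}.

    Returns (a, b, r_squared). The AR(1) form is an assumption made
    for illustration; the abstract only says "a linear model of the
    term's prior observations".
    """
    if len(series) < 3:
        raise ValueError("need at least three observations")
    x = list(series[:-1])  # lagged values x_{t-1}
    y = list(series[1:])   # targets x_t
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    b = sxy / sxx if sxx > 0 else 0.0
    a = mean_y - b * mean_x
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    # A constant target series is fit exactly by the intercept alone.
    r_squared = 1.0 - sse / syy if syy > 0 else 1.0
    return a, b, r_squared


def lack_of_fit(series: Sequence[float]) -> float:
    """Hypothetical importance score: 1 - R^2 of the AR(1) fit."""
    return 1.0 - fit_ar1(series)[2]


# Invented collection frequencies at eight consecutive time intervals.
steady_term = [100, 110, 120, 130, 140, 150, 160, 170]  # grows smoothly
bursty_term = [0, 0, 40, 0, 0, 0, 35, 0]                # sporadic spikes

steady_score = lack_of_fit(steady_term)
bursty_score = lack_of_fit(bursty_term)
```

Under these assumptions the smoothly growing series is predicted almost perfectly from its previous value, so its score is close to zero, while the bursty series is fit poorly and scores much higher. This matches the direction of the hypothesis: the term whose frequency a linear model cannot predict is treated as the stronger discriminator.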
