Research Article
Eigenvalue-based model selection during latent semantic indexing
Article first published online: 5 MAY 2005
DOI: 10.1002/asi.20188
Copyright © 2005 Wiley Periodicals, Inc.
Issue

Journal of the American Society for Information Science and Technology
Volume 56, Issue 9, pages 969–988, July 2005
Additional Information
How to Cite
Efron, M. (2005), Eigenvalue-based model selection during latent semantic indexing. J. Am. Soc. Inf. Sci., 56: 969–988. doi: 10.1002/asi.20188
Publication History
- Issue published online: 3 JUN 2005
- Article first published online: 5 MAY 2005
- Manuscript Revised: 8 JUL 2004
- Manuscript Accepted: 8 JUL 2004
- Manuscript Received: 31 OCT 2003
Abstract
This study describes amended parallel analysis (APA), a novel method for model selection in unsupervised learning problems such as information retrieval (IR). At issue is the selection of k, the number of dimensions retained under latent semantic indexing (LSI). Amended parallel analysis is an elaboration of Horn's parallel analysis, which advocates retaining eigenvalues larger than those expected under term independence. Amended parallel analysis operates by deriving confidence intervals on these “null” eigenvalues. The technique amounts to a series of nonparametric hypothesis tests on the eigenvalues of the correlation matrix. In the study, APA is tested along with four established dimensionality estimators on six standard IR test collections. These estimates are evaluated with regard to two IR performance metrics. Additionally, results from simulated data are reported. In both rounds of experimentation APA performs well, predicting the best value of k on 3 of 12 observations, offering good predictions on several others, and never giving the worst estimate of optimal dimensionality.
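The general idea behind parallel analysis can be sketched in Python with NumPy. This is an illustrative sketch only, not the article's APA procedure: the function name `parallel_analysis_k`, the use of column-wise permutation to simulate term independence, the 95th-percentile bound on each "null" eigenvalue, and the rule of stopping at the first eigenvalue that fails the comparison are all assumptions made for the example.

```python
import numpy as np


def parallel_analysis_k(X, n_perm=200, percentile=95.0, seed=0):
    """Estimate how many dimensions to retain from a documents-by-terms matrix.

    Hypothetical helper for illustration. Assumes no column of X is constant,
    since a constant column has an undefined correlation.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n_terms = X.shape[1]

    # Observed eigenvalues of the term-term correlation matrix, largest first.
    observed = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]

    # Build a reference distribution of eigenvalues under term independence.
    null = np.empty((n_perm, n_terms))
    for i in range(n_perm):
        # Shuffling each column separately keeps every term's marginal
        # distribution but destroys dependence between terms.
        shuffled = np.column_stack([rng.permutation(col) for col in X.T])
        null[i] = np.linalg.eigvalsh(np.corrcoef(shuffled, rowvar=False))[::-1]

    # Upper bound on the "null" eigenvalue at each rank (assumed: a percentile).
    threshold = np.percentile(null, percentile, axis=0)

    # Retain leading eigenvalues until one fails to exceed its null bound.
    exceeds = observed > threshold
    k = n_terms if exceeds.all() else int(np.argmin(exceeds))
    return k, observed, threshold
```

Called on a matrix whose rows are documents and whose columns are terms, the function returns the estimated k together with the observed eigenvalues and the per-rank thresholds, so the comparison can be inspected directly. Raising `percentile` makes the test more conservative and tends to retain fewer dimensions.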
