Assessment of a disease screener by hierarchical all-subset selection using area under the receiver operating characteristic curves
Article first published online: 15 APR 2011
Copyright © 2011 John Wiley & Sons, Ltd.
Statistics in Medicine
Volume 30, Issue 14, pages 1751–1760, 30 June 2011
How to Cite
Wang, Y., Chen, H., Schwartz, T., Duan, N., Parcesepe, A. and Lewis-Fernández, R. (2011), Assessment of a disease screener by hierarchical all-subset selection using area under the receiver operating characteristic curves. Statist. Med., 30: 1751–1760. doi: 10.1002/sim.4246
- Issue published online: 2 JUN 2011
- Manuscript Accepted: 15 FEB 2011
- Manuscript Received: 8 JUN 2010
Keywords
- disease diagnostic test
- early screening
- hierarchical variable selection
- best subset selection
A common problem in clinical settings is assessing the accuracy of a screening test for early detection of a disease. In this article, we develop hierarchical all-subset variable selection methods to assess and improve a psychosis screening test designed to detect psychotic patients in primary care clinics. We select items from an existing screener to achieve the best prediction accuracy against a gold-standard diagnosis of psychosis status. The existing screener has a hierarchical structure: the questions fall into five domains, and each domain has a root question followed by several stem questions. The statistical question is how to respect this hierarchical structure during variable selection, so that whenever a stem question is selected for the screener, its root question is selected as well. We develop an all-subset variable selection procedure that takes the hierarchical structure of the questionnaire into account. Enforcing the hierarchical rule reduces the dimensionality of the search space and thereby allows fast all-subset selection, which is usually computationally prohibitive. To focus on the prediction performance of a selected model, we use the area under the ROC curve as the criterion for ranking all admissible models. We compare the procedure with a logistic regression-based approach and with a stepwise regression that ignores the hierarchical structure. We use the procedure to construct a psychosis screening test for use at a primary care clinic that will optimally screen low-income Latino psychotic patients for further specialty referral.
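The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It enumerates only the subsets that obey the root-before-stem rule and ranks them by empirical AUC. The item names, the toy domain layout, and the use of a simple count of endorsed items as the screener score are all assumptions made for illustration; the abstract does not specify how a selected subset is scored.

```python
from itertools import combinations, product


def admissible_subsets(domains):
    """Yield every non-empty item subset obeying the hierarchical rule:
    a stem may be selected only if the root of its domain is selected.

    `domains` maps a root item name to the list of its stem item names
    (hypothetical names, for illustration only).
    """
    per_domain = []
    for root, stems in domains.items():
        choices = [()]  # root not selected, so none of its stems either
        for k in range(len(stems) + 1):
            for chosen in combinations(stems, k):
                choices.append((root,) + chosen)
        per_domain.append(choices)
    for combo in product(*per_domain):
        subset = tuple(item for part in combo for item in part)
        if subset:  # skip the empty screener
            yield subset


def auc(scores, labels):
    """Empirical AUC: probability that a case outscores a control,
    with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rank_subsets(rows, labels, domains):
    """Rank all admissible subsets by AUC, highest first; among equal
    AUCs, shorter screeners come first.

    Assumption: the screener score is the number of endorsed items.
    """
    ranked = []
    for subset in admissible_subsets(domains):
        scores = [sum(row[item] for item in subset) for row in rows]
        ranked.append((auc(scores, labels), subset))
    ranked.sort(key=lambda pair: (-pair[0], len(pair[1])))
    return ranked


# Toy layout: two domains, one root each (synthetic, not the real screener).
toy_domains = {"r1": ["s1a", "s1b"], "r2": ["s2a"]}
n_admissible = sum(1 for _ in admissible_subsets(toy_domains))  # 14
n_unrestricted = 2 ** 5 - 1  # 31 non-empty subsets of five items
```

In this toy layout a domain with one root and k stems contributes 1 + 2^k admissible choices instead of 2^(k+1), so five items give 5 × 3 − 1 = 14 admissible non-empty subsets rather than 31. That gap grows quickly as stems are added, which is the reduction in search space the abstract refers to.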