Assessment of a disease screener by hierarchical all-subset selection using area under the receiver operating characteristic curves



A common problem in many clinical settings is assessing the accuracy of a screening test for early detection of a disease. In this article, we develop hierarchical all-subset variable selection methods to assess and improve a psychosis screening test designed to detect psychotic patients in primary care clinics. We select items from an existing screener to achieve the best prediction accuracy against a gold standard diagnosis of psychosis status. The existing screener has a hierarchical structure: the questions fall into five domains, and each domain has a root question followed by several stem questions. The statistical challenge is to respect this hierarchy during variable selection: whenever a stem question is selected for the screener, its root question must also be selected. We develop an all-subset variable selection procedure that takes the hierarchical structure of the questionnaire into account. Enforcing this hierarchical rule shrinks the search space enough to make all-subset selection, which is usually computationally prohibitive, fast. To focus on the prediction performance of a selected model, we use the area under the ROC curve (AUC) as the criterion to rank all admissible models. We compare the procedure with a logistic regression-based approach and with a stepwise regression that ignores the hierarchical structure. We use the procedure to construct a psychosis screening test for a primary care clinic that will optimally screen low-income Latino psychotic patients for further specialty referral. Copyright © 2011 John Wiley & Sons, Ltd.
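The hierarchical rule and the AUC ranking can be made concrete with a short sketch. The Python below is an illustration under stated assumptions, not the authors' implementation. Item responses are assumed to be binary. Each candidate screener is scored by the unweighted sum of its selected items, a hypothetical scoring rule; the article also considers a logistic regression-based approach. The item names (`r1`, `s1a`, ...) and the toy data are invented. In a domain with one root and k stems, the rule leaves 1 + 2^k admissible choices: no items, or the root plus any subset of its stems. D domains therefore yield (1 + 2^k_1) x ... x (1 + 2^k_D) - 1 non-empty candidate screeners, compared with 2^p - 1 for p unrestricted items. For example, five hypothetical domains of one root and three stems each give 9^5 - 1 = 59,048 candidates versus 2^20 - 1 = 1,048,575.

```python
from itertools import combinations, product


def domain_options(root, stems):
    """Admissible item sets within one domain: empty, or the root plus any subset of its stems."""
    options = [()]
    for r in range(len(stems) + 1):
        for chosen in combinations(stems, r):
            options.append((root,) + chosen)
    return options


def admissible_subsets(domains):
    """Yield every non-empty item set obeying the rule 'stem selected => its root selected'."""
    per_domain = [domain_options(root, stems) for root, stems in domains]
    for combo in product(*per_domain):
        items = tuple(item for part in combo for item in part)
        if items:  # skip the empty screener
            yield items


def empirical_auc(scores, labels):
    """Mann-Whitney estimate of AUC: P(case score > control score), ties counted as 1/2."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one control")
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def best_subset(domains, responses, labels):
    """Exhaustively rank admissible subsets by the AUC of a sum score (hypothetical scoring rule).

    Ties in AUC are broken in favour of the screener with fewer items.
    """
    best_items, best_auc = None, -1.0
    for items in admissible_subsets(domains):
        scores = [sum(row[i] for i in items) for row in responses]
        auc = empirical_auc(scores, labels)
        if auc > best_auc or (auc == best_auc and len(items) < len(best_items)):
            best_items, best_auc = items, auc
    return best_items, best_auc


# Hypothetical toy questionnaire: (root, [stems]) per domain.
DOMAINS = [("r1", ["s1a", "s1b"]), ("r2", ["s2a"])]
# Hypothetical binary responses for four respondents and gold-standard labels (1 = case).
RESPONSES = [
    {"r1": 1, "s1a": 1, "s1b": 0, "r2": 0, "s2a": 0},
    {"r1": 1, "s1a": 1, "s1b": 0, "r2": 1, "s2a": 0},
    {"r1": 1, "s1a": 0, "s1b": 0, "r2": 1, "s2a": 1},
    {"r1": 1, "s1a": 0, "s1b": 0, "r2": 0, "s2a": 1},
]
LABELS = [1, 1, 0, 0]

if __name__ == "__main__":
    print(len(list(admissible_subsets(DOMAINS))))  # (1 + 4) * (1 + 2) - 1 = 14
    print(best_subset(DOMAINS, RESPONSES, LABELS))
```

With this toy data the domains admit 14 candidate screeners (versus 31 without the rule), and `best_subset` returns `(('r1', 's1a'), 1.0)`. Breaking AUC ties toward shorter screeners is a parsimony choice made here for illustration only.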