When does combining markers improve classification performance and what are implications for practice?


Supporting information may be found in the online version of this article.

Correspondence to: Aasthaa Bansal, Department of Biostatistics, University of Washington, Campus Mail Stop 359461, Seattle, WA 98195, U.S.A.

E-mail: abansal@uw.edu


When an existing standard marker does not have sufficient classification accuracy on its own, new markers are sought with the goal of yielding a combination with better performance. The primary criterion for selecting new markers is that they perform well on their own and, preferably, are uncorrelated with the standard. Linear combinations are most often considered. In this paper, we investigate the increment in performance that is possible by combining a novel continuous marker with a moderately performing standard continuous marker under a variety of biologically motivated models for their joint distribution. We find that an uncorrelated continuous marker with moderate performance on its own usually yields only minimally improved performance. We identify other settings that lead to large improvements, including (i) a novel marker that has very poor performance on its own but is highly correlated with the standard and (ii) a novel marker with poor to moderate performance that is highly correlated with the standard but only in one class category. These results suggest changing current strategies for identifying markers to be included in panels for possible combination. Using simulated and real datasets, we examine the merits of a broadened strategy that selects panels of markers as candidates on the basis of their joint performance with existing markers, compared with the standard strategy that selects markers on the basis of their marginal performance. We find that a broadened strategy can be fruitful but requires studies with large numbers of subjects. Copyright © 2013 John Wiley & Sons, Ltd.
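
The contrast between the uncorrelated and the highly correlated novel marker can be illustrated with a minimal sketch. It assumes one simple model only, not the full range of joint distributions studied in the paper: both markers are bivariate normal with unit variances and the same correlation `rho` in cases and controls, and performance is summarized by the area under the ROC curve (AUC) of the best linear combination. The function names `marginal_auc` and `combined_auc` and the numerical settings are hypothetical choices made for illustration.

```python
from math import sqrt
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal cumulative distribution function


def marginal_auc(d):
    """AUC of a single marker: N(0, 1) in controls, N(d, 1) in cases."""
    return _PHI(d / sqrt(2))


def combined_auc(d_std, d_new, rho):
    """AUC of the best linear combination of two markers.

    Assumed model (illustrative): bivariate normal, unit variances,
    correlation rho in both classes, and case-control mean differences
    d_std (standard marker) and d_new (novel marker).
    """
    if not -1 < rho < 1:
        raise ValueError("rho must lie strictly between -1 and 1")
    # Squared Mahalanobis distance between the two class means.
    m2 = (d_std**2 - 2 * rho * d_std * d_new + d_new**2) / (1 - rho**2)
    return _PHI(sqrt(m2 / 2))


if __name__ == "__main__":
    d_std = 1.0  # moderately performing standard marker (assumed value)
    print("standard alone:", round(marginal_auc(d_std), 3))
    # Novel marker with moderate performance, uncorrelated with the standard.
    print("uncorrelated, moderate:", round(combined_auc(d_std, 0.5, 0.0), 3))
    # Novel marker that is useless alone but highly correlated with the standard.
    print("useless alone, rho=0.9:", round(combined_auc(d_std, 0.0, 0.9), 3))
```

Under these assumptions, the uncorrelated marker adds only its own squared mean difference to the separation between classes, whereas a marker with no marginal signal inflates the separation by a factor of 1/(1 - rho^2), which is large when the correlation is strong. The second setting highlighted above, correlation in only one class category, violates the common-correlation assumption and is not covered by this sketch.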