Classification of samples into two or more ordered populations with application to a cancer trial

Authors


M. A. Fernández, Departamento de Estadística e Investigación Operativa. C/Prado de la Magdalena s/n. Universidad de Valladolid, 47005 Valladolid, Spain.

E-mail: miguelaf@eio.uva.es

Abstract

In many applications, especially in cancer treatment and diagnosis, investigators are interested in classifying patients into various diagnosis groups on the basis of molecular data such as gene expression or proteomic data. Often, some of the diagnosis groups are known to be related to higher or lower values of some of the predictors. The standard methods of classifying patients into various groups do not take into account the underlying order. This could potentially result in high misclassification rates, especially when the number of groups is larger than two.

In this article, we develop classification procedures that exploit the underlying order among the mean values of the predictor variables and the diagnostic groups by using ideas from order-restricted inference. We generalize the existing methodology on discrimination under restrictions and provide empirical evidence to demonstrate that the proposed methodology improves over the existing unrestricted methodology. The proposed methodology is applied to a bladder cancer data set where the researchers are interested in classifying patients into various groups. Copyright © 2012 John Wiley & Sons, Ltd.
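
To illustrate the general idea of using an order restriction in classification, the following minimal sketch treats the simplest possible case: a single predictor, groups assumed to have nondecreasing means, and a common variance with equal priors, so that the linear discriminant rule reduces to assigning an observation to the nearest group mean. It is not the procedure developed in the article; the function names (`pava`, `restricted_means`, `classify`) and the toy data are hypothetical and chosen only for illustration. The sample means are projected onto the ordered set with the weighted pool-adjacent-violators algorithm, a standard tool of order-restricted inference, and the restricted means replace the raw means in the classification rule.

```python
def pava(values, weights):
    """Weighted pool-adjacent-violators: nondecreasing least-squares fit."""
    blocks = []  # each block holds [pooled mean, total weight, number of groups]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Pool adjacent blocks while they violate the assumed order.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            pooled = (m1 * w1 + m2 * w2) / (w1 + w2)
            blocks.append([pooled, w1 + w2, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def restricted_means(samples_by_group):
    """Group means forced to be nondecreasing (assumed simple order)."""
    means = [sum(s) / len(s) for s in samples_by_group]
    sizes = [len(s) for s in samples_by_group]
    return pava(means, sizes)


def classify(x, means):
    """Nearest-mean rule; ties go to the lowest group index."""
    return min(range(len(means)), key=lambda k: abs(x - means[k]))


# Toy training data (hypothetical): the third group's sample mean
# falls below the second's, violating the assumed order.
groups = [[1.0, 2.0, 3.0], [4.0, 5.0], [3.0, 4.0, 2.0]]
mu = restricted_means(groups)
label = classify(1.5, mu)
```

In this toy example the second and third sample means violate the assumed order and are pooled into a common value, so the nearest-mean rule alone can no longer separate those two groups. A full procedure would need additional structure, such as several predictors or a rule for breaking ties, which this sketch does not attempt.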
