Microarray-based prediction of Parkinson's disease using clinical data as additional response variables



Recent discoveries and developments in the field of genomics have led to the commercialization of novel diagnostic devices for studying disease or estimating therapeutic outcomes in individual patients. As this field emerges, the emphasis is shifting toward integrating clinical research into product development. Data acquisition is the primary concern in the initial exploratory phase of product development, and during sample collection and data generation in clinical microarray studies, large amounts of additional information associated with the data, such as demographic, clinical, and study design variables, are often accumulated and made available. Including additional information in classification has been addressed in many different ways. However, previous studies have consistently treated the additional information as extra predictors, which can be a problem for future prediction if such information is not available or cannot be collected for the new samples. We instead propose to adopt a method called canonical partial least squares, which, for our purpose, uses the additional information only at the model-building stage to stabilize the construction of a classifier for disease status from microarray data. The canonical partial least squares method is compared with regular partial least squares for the classification of Parkinson's disease from gene expression in peripheral blood samples, and also through computer simulations. The present study showed that including clinical data in the model building produces simpler and more stable models for predicting Parkinson's disease from gene expression data. Copyright © 2012 John Wiley & Sons, Ltd.
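The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea that additional responses can shape the loading weights while prediction still needs the expression matrix alone. It assumes a single binary disease indicator `y`, a hypothetical matrix `Y_add` of clinical variables, and one common formulation of canonical partial least squares in which candidate weights come from the cross-product of the predictors with all responses and are then combined by a canonical correlation step against the primary response. The function name `first_weight` and the simulated data are illustrative assumptions, not the authors' implementation, and any standardization of the additional responses is omitted.

```python
import numpy as np

def first_weight(X, y, Y_add=None):
    """Return a unit-length first loading weight vector (illustrative sketch).

    Without Y_add this is the ordinary PLS1 weight, proportional to X'y.
    With Y_add, candidate weights W0 = X'[y, Y_add] are combined so that the
    resulting score is maximally correlated with the primary response y.
    """
    Xc = X - X.mean(axis=0)                      # center predictors
    yc = (y - y.mean()).reshape(-1, 1)           # center primary response
    if Y_add is None:
        w = Xc.T @ yc                            # regular PLS weight
    else:
        Yc = np.column_stack([yc, Y_add - Y_add.mean(axis=0)])
        W0 = Xc.T @ Yc                           # one candidate weight per response
        Z0 = Xc @ W0                             # candidate scores
        # With one primary response, the canonical weights for Z0 are
        # proportional to the least-squares coefficients of y on Z0.
        a, *_ = np.linalg.lstsq(Z0, yc, rcond=None)
        w = W0 @ a
    w = w.ravel()
    return w / np.linalg.norm(w)

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
n, p = 40, 200
y = np.repeat([0.0, 1.0], n // 2)                # hypothetical disease status
latent = y + 0.5 * rng.standard_normal(n)
X = np.outer(latent, rng.standard_normal(p)) + rng.standard_normal((n, p))
Y_add = np.column_stack([latent + 0.3 * rng.standard_normal(n),
                         rng.standard_normal(n)])  # hypothetical clinical data

w_pls = first_weight(X, y)
w_cpls = first_weight(X, y, Y_add)

# Scores for new samples need only X and the weights, never Y_add.
Xc = X - X.mean(axis=0)
r_pls = np.corrcoef(Xc @ w_pls, y)[0, 1]
r_cpls = np.corrcoef(Xc @ w_cpls, y)[0, 1]
```

In this sketch the additional variables enter only when the weight vector is computed; scoring a new sample is a plain product of its centered expression profile with the stored weights, so no clinical data is required at prediction time.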
