Evaluating surrogate variables for improving microarray multiple testing inference



The use of surrogate variables has been proposed as a means to capture, for a given observed set of data, sources driving the dependency structure among high-dimensional sets of features and remove the effects of those sources and their potential negative impact on simultaneous inference. In this article we illustrate the potential effects of latent variables on testing dependence and the resulting impact on multiple inference, we briefly review the method of surrogate variable analysis proposed by Leek and Storey (PNAS 2008; 105:18718–18723), and assess that method via simulations intended to mimic the complexity of feature dependence observed in real-world microarray data. The method is also assessed via application to a recent Merck microarray data set. Both simulation and case study results indicate that surrogate variable analysis can offer a viable strategy for tackling the multiple testing dependence problem when the features follow a potentially complex correlation structure, yielding improvements in the variability of false positive rates and increases in power. Copyright © 2010 John Wiley & Sons, Ltd.