Adaptive trimmed t-statistics for identifying predominantly high expression in a microarray experiment



Often, interesting candidate tumor markers are not only genes that show homogeneously higher expression (HHE) in tumor samples than in control samples, but also genes with only predominantly higher expression (PHE), i.e. genes that exhibit higher expression in at least 80 per cent of tumor samples. Standard parametric test statistics used in the analysis of microarray experiments may fail under PHE because the tumor group is then a mixture of distributions. As an alternative, we consider trimmed t-statistics, which compare group mean values after removing outliers in each group. The trimming proportion can be chosen adaptively, either based on a boxplot outlier detection rule or by optimization over a series of tests with varying trimming proportions. The trimmed t-statistics can be plugged into the ‘significance analysis of microarrays’ (SAM) procedure, yielding the modified boxplot rule test (modBox) and the modified optimization test (modOpt), respectively. Using simulated microarray experiments, we show that modOpt is superior to its contenders in detecting PHE, with only a small loss in efficiency under HHE compared to SAM. Analysis of a real microarray experiment revealed that, out of nearly 29 000 genes, about 417 genes exhibiting PHE were detected by modOpt but missed by SAM. Copyright © 2010 John Wiley & Sons, Ltd.
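
To illustrate the idea of a trimmed t-statistic with a boxplot outlier rule, the following is a minimal sketch for a single gene, not the authors' implementation. It rests on several assumptions that the abstract does not state: outliers are taken to be values outside the conventional 1.5 × IQR boxplot fences, the statistic has the Welch (unequal variance) form, and the function names `boxplot_trim`, `welch_t` and `trimmed_t` are hypothetical. The SAM embedding (variance offset and permutation-based inference) and the optimization over trimming proportions used by modOpt are not shown.

```python
import math
import statistics


def boxplot_trim(values, whisker=1.5):
    """Drop values outside the boxplot fences [Q1 - w*IQR, Q3 + w*IQR].

    Assumption: the usual 1.5 * IQR fences; the paper may define its rule differently.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if low <= v <= high]


def welch_t(a, b):
    """Welch-type t-statistic: difference of means over its standard error."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def trimmed_t(tumor, control, whisker=1.5):
    """t-statistic computed after boxplot trimming of each group separately."""
    return welch_t(boxplot_trim(tumor, whisker), boxplot_trim(control, whisker))


# Made-up expression values for one gene under PHE:
# 8 of 10 tumor samples are elevated, 2 stay at the control level.
tumor = [5.1, 4.9, 5.0, 5.2, 4.8, 5.0, 5.1, 4.9, 1.0, 1.1]
control = [1.0, 0.9, 1.1, 1.0, 1.2, 0.8, 1.0, 1.1, 0.9, 1.0]

plain = welch_t(tumor, control)
trimmed = trimmed_t(tumor, control)
print(f"untrimmed t = {plain:.2f}, trimmed t = {trimmed:.2f}")
```

In this toy example the two non-elevated tumor samples inflate the tumor-group variance and pull the untrimmed statistic down; after trimming they are removed and the statistic is considerably larger, which is the behaviour the abstract describes for PHE genes.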