A SATS algorithm for jointly identifying multiple differentially expressed gene sets



A gene set in a DNA microarray experiment is a group of genes that share a common biological function, chromosomal location, or regulation. This paper addresses the problem of jointly identifying multiple differentially expressed gene sets associated with a phenotype of interest from among many hundreds of pre-defined gene sets in a microarray experiment. We propose the null hypothesis that no group of gene sets from the experiment is differentially expressed. This hypothesis suits real microarray experiments, in which only a fraction of the gene sets examined are differentially expressed. To test it, we provide an algorithm called set association for tail strength (SATS). SATS assigns a tail-strength (TS) statistic to each gene set to measure its differential expression with respect to the phenotype of interest, combines these statistics into an overall association measure for multiple gene sets using a set-association method, and then calculates the significance of the overall measure by permuting samples. SATS thus performs a simultaneous significance test on several gene sets while controlling the Type I error rate. Because multiple gene sets contribute jointly to the significance, SATS can capture the correlations across gene sets that should be considered when assessing joint statistical significance. Copyright © 2011 John Wiley & Sons, Ltd.
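The three steps named in the abstract (a TS score per gene set, a set-association combination, and sample permutations) can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration: per-gene p-values come from a Welch t-statistic with a normal approximation, TS uses a commonly cited form, TS = (1/m) Σ_k [1 − p_(k)(m+1)/k], and the set-association step is modeled as the sum of the k largest TS scores with the minimum p-value over k = 1..max_sets. All function names, parameters, and the synthetic data are hypothetical.

```python
import math
import numpy as np

_erfc = np.vectorize(math.erfc)

def gene_pvalues(expr, labels):
    """Two-sided per-gene p-values (assumption: Welch t, normal approximation)."""
    a, b = expr[:, labels == 0], expr[:, labels == 1]
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1]
                 + b.var(axis=1, ddof=1) / b.shape[1])
    t = (a.mean(axis=1) - b.mean(axis=1)) / se
    return _erfc(np.abs(t) / math.sqrt(2.0))

def tail_strength(pvals):
    """TS = mean over k of 1 - p_(k) * (m + 1) / k, with p_(k) sorted ascending."""
    p = np.sort(np.asarray(pvals, dtype=float))
    m = p.size
    k = np.arange(1, m + 1)
    return float(np.mean(1.0 - p * (m + 1) / k))

def set_scores(expr, labels, gene_sets):
    """One TS score per gene set (each set is an array of gene row indices)."""
    p = gene_pvalues(expr, labels)
    return np.array([tail_strength(p[idx]) for idx in gene_sets])

def sats_sketch(expr, labels, gene_sets, max_sets=5, n_perm=200, seed=0):
    """Return (overall p-value, indices of the selected gene sets)."""
    rng = np.random.default_rng(seed)
    # Row 0 holds the observed scores; the other rows use permuted sample labels.
    rows = [set_scores(expr, labels, gene_sets)]
    for _ in range(n_perm):
        rows.append(set_scores(expr, rng.permutation(labels), gene_sets))
    scores = np.array(rows)
    # Sum of the k largest TS scores for k = 1..max_sets, per row.
    sums = np.cumsum(-np.sort(-scores, axis=1)[:, :max_sets], axis=1)
    # pk[i, j]: fraction of rows whose top-(j+1) sum is at least that of row i.
    pk = (sums[None, :, :] >= sums[:, None, :]).mean(axis=1)
    min_p = pk.min(axis=1)
    best_k = int(pk[0].argmin()) + 1
    # Significance of the observed minimum p-value against the permuted minima.
    overall_p = float((min_p <= min_p[0]).mean())
    selected = np.argsort(-scores[0])[:best_k]
    return overall_p, selected.tolist()

# Synthetic demo: 6 gene sets of 10 genes; sets 0 and 1 are shifted in group 1.
rng = np.random.default_rng(1)
labels = np.array([0] * 10 + [1] * 10)
expr = rng.normal(size=(60, 20))
expr[:20, labels == 1] += 3.0
gene_sets = [np.arange(i * 10, (i + 1) * 10) for i in range(6)]
overall_p, selected = sats_sketch(expr, labels, gene_sets)
print(overall_p, selected)
```

Reusing the same permutations for both the per-k p-values and the overall p-value keeps the sketch short. Permuting whole sample labels, rather than genes, preserves the correlation among genes and gene sets within each permuted data set, which is the property the abstract highlights.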