Estimating Effect Sizes of Differentially Expressed Genes for Power and Sample-Size Assessments in Microarray Experiments




Summary In microarray screening for differentially expressed genes using multiple testing, assessment of power or sample size is of particular importance to ensure that few relevant genes are removed from further consideration prematurely. In this assessment, adequate estimation of the effect sizes of differentially expressed genes is crucial because of its substantial impact on power and sample-size estimates. However, conventional methods using top genes with largest observed effect sizes would be subject to overestimation due to random variation. In this article, we propose a simple estimation method based on hierarchical mixture models with a nonparametric prior distribution to accommodate random variation and possible large diversity of effect sizes across differential genes, separated from nuisance, nondifferential genes. Based on empirical Bayes estimates of effect sizes, the power and false discovery rate (FDR) can be estimated to monitor them simultaneously in gene screening. We also propose a power index that concerns selection of top genes with largest effect sizes, called partial power. This new power index could provide a practical compromise for the difficulty in achieving high levels of usual overall power as confronted in many microarray experiments. Applications to two real datasets from cancer clinical studies are provided.