• Gene expression
• Microarrays
• Ranking
• Sample size
• Selection

Summary. We develop formulae for calculating the sample sizes needed to rank and select differentially expressed genes among different clinical subtypes or prognostic classes of disease in genome-wide screening studies with microarrays. The formulae aim to control the probability that a selected subset of genes of fixed size contains enough of the truly top-ranking informative genes; this probability can be assessed from the distribution of order statistics from independent genes. We provide strategies for conservative designs that cope with an unknown number of informative genes and an unknown correlation structure across genes. We apply the formulae to a clinical study of multiple myeloma.
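To make the design criterion concrete, the following is a minimal Monte Carlo sketch of the quantity being controlled, not the closed-form formulae developed in the paper. It rests on simplifying assumptions that are ours rather than the authors': genes are independent, each gene contributes a one-sided two-sample z-statistic with unit variance, and all informative genes share a single standardized effect size. The function names `selection_probability` and `smallest_sample_size`, and all default parameter values, are hypothetical.

```python
import math
import random


def selection_probability(n_per_group, n_genes=500, n_informative=20,
                          effect=1.0, n_selected=20, min_hits=15,
                          n_sim=200, seed=0):
    """Estimate P(top `n_selected` genes contain >= `min_hits` informative genes).

    Assumption (illustrative only): independent genes, unit-variance z-statistics,
    and a common standardized effect `effect` for every informative gene.
    """
    rng = random.Random(seed)
    # Mean of a two-sample z-statistic with n_per_group samples in each class.
    shift = effect * math.sqrt(n_per_group / 2.0)
    successes = 0
    for _ in range(n_sim):
        scored = [(rng.gauss(shift, 1.0), True) for _ in range(n_informative)]
        scored += [(rng.gauss(0.0, 1.0), False)
                   for _ in range(n_genes - n_informative)]
        # Rank genes by statistic and keep the fixed-size top subset.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        hits = sum(1 for _, informative in scored[:n_selected] if informative)
        if hits >= min_hits:
            successes += 1
    return successes / n_sim


def smallest_sample_size(target=0.8, candidates=range(2, 101), **kwargs):
    """Return the first per-group sample size whose estimated probability
    reaches `target`, or None if no candidate does."""
    for n in candidates:
        if selection_probability(n, **kwargs) >= target:
            return n
    return None


if __name__ == "__main__":
    n = smallest_sample_size(target=0.8, candidates=range(2, 40))
    print("per-group sample size under the sketch's assumptions:", n)
```

Because the estimate is simulated, it is noisy near the target probability; increasing `n_sim` tightens it at the cost of run time. Correlated genes or an unknown number of informative genes, which the conservative design strategies address, are outside the scope of this sketch.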