SEARCH

SEARCH BY CITATION

Keywords:

  • Chen–Stein theorem;
  • multiple testing;
  • rank test;
  • stochastic ordering;
  • U-statistic;
  • union-intersection test

Abstract

New statistical procedures are introduced to analyse typical microRNA expression data sets. For each separate microRNA expression, the null hypothesis to be tested is that there is no difference between the distributions of the expression in different groups. The test statistics are then constructed having certain type of alternatives in mind. To avoid strong (parametric) distributional assumptions, the alternatives are formulated using probabilities of different orders of pairs or triples of observations coming from different groups, and the test statistics are then constructed using corresponding several-sample U-statistics, natural estimates of these probabilities. Classical several-sample rank test statistics, such as the Kruskal–Wallis and Jonckheere–Terpstra tests, are special cases in our approach. Also, as the number of variables (microRNAs) is huge, we confront a serious simultaneous testing problem. Different approaches to control the family-wise error rate or the false discovery rate are shortly discussed, and it is shown how the Chen–Stein theorem can be used to show that family-wise error rate can be controlled for cluster-dependent microRNAs under weak assumptions. The theory is illustrated with an analysis of real data, a microRNA expression data set on Finnish (aggressive and non-aggressive) prostate cancer patients and their controls.