• Biological pathways
• Hierarchical Bayesian models
• Mixture priors

Summary. We propose a hierarchical Bayesian model for analysing gene expression data to identify pathways that differentiate between two biological states (e.g. cancer versus non-cancer). Finding such pathways can improve our understanding of normal and pathological processes and may lead to more effective treatments. Our method, Bayesian gene set analysis, evaluates the statistical significance of a specific pathway by using the posterior distribution of its corresponding hyperparameter. We apply Bayesian gene set analysis to a gene expression microarray data set on 50 cancer cell lines, of which 33 have a known p53 mutation and the remaining 17 are p53 wild type, to identify pathways that are associated with the mutational status of the gene p53. We identify several significant pathways with strong biological connections. We show that our approach provides a natural framework for incorporating prior biological information and that, compared with several alternative methods, it gives the best overall performance in correctly identifying significant pathways.
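The central idea of judging a pathway through the posterior distribution of its own hyperparameter can be illustrated with a toy model. The sketch below is not the model of this paper. It assumes a hypothetical normal-normal set-up in which each gene j in a pathway has an observed effect d_j ~ N(beta_j, noise_sd^2) and beta_j ~ N(0, tau^2), so that tau is the pathway-level hyperparameter; a flat prior for tau on [0, tau_max] is also assumed. The function names, the grid approximation and the threshold of 0.5 are all illustrative choices.

```python
import math


def log_normal_pdf(x, var):
    """Log density of N(0, var) evaluated at x."""
    return -0.5 * (math.log(2 * math.pi * var) + x * x / var)


def tau_posterior(effects, noise_sd=1.0, tau_max=3.0, n_grid=301):
    """Grid approximation to p(tau | effects) under a hypothetical toy model.

    Toy model (an assumption, not the paper's):
      d_j | beta_j ~ N(beta_j, noise_sd^2),  beta_j | tau ~ N(0, tau^2),
    so marginally d_j | tau ~ N(0, noise_sd^2 + tau^2).
    The prior on tau is flat on [0, tau_max].
    """
    grid = [tau_max * i / (n_grid - 1) for i in range(n_grid)]
    log_post = []
    for tau in grid:
        var = noise_sd ** 2 + tau ** 2
        # Flat prior: the log posterior equals the log likelihood up to a constant.
        log_post.append(sum(log_normal_pdf(d, var) for d in effects))
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_post)
    weights = [math.exp(lp - m) for lp in log_post]
    total = sum(weights)
    return grid, [w / total for w in weights]


def prob_tau_exceeds(effects, threshold=0.5, **kwargs):
    """Posterior probability that the pathway-level scale tau exceeds threshold."""
    grid, post = tau_posterior(effects, **kwargs)
    return sum(p for t, p in zip(grid, post) if t > threshold)


# Hypothetical gene-level effects for two made-up pathways.
quiet_pathway = [0.1, -0.3, 0.2, -0.1, 0.05]
active_pathway = [2.1, -1.8, 2.5, 1.9, -2.2]

p_quiet = prob_tau_exceeds(quiet_pathway)
p_active = prob_tau_exceeds(active_pathway)
print(f"P(tau > 0.5 | quiet)  = {p_quiet:.3f}")
print(f"P(tau > 0.5 | active) = {p_active:.3f}")
```

In this toy setting, a pathway whose genes show large effects (in either direction) pushes posterior mass for tau away from zero, so a large posterior probability P(tau > threshold) flags the pathway as potentially significant. The paper's actual model, priors and decision rule may differ.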