### Abstract

**Summary.** Genome-wide measurement of gene expression is a promising approach to the identification of subclasses of cancer that are currently not differentiable, but potentially biologically heterogeneous. This type of molecular classification gives hope for highly individualized and more effective prognosis and treatment of cancer. Statistically, the analysis of gene expression data from unclassified tumours is a complex hypothesis-generating activity, involving data exploration, modelling and expert elicitation. We propose a modelling framework that can be used to inform and organize the development of exploratory tools for classification. Our framework uses latent categories to provide both a statistical definition of differential expression and a precise, experiment-independent, definition of a molecular profile. It also generates natural similarity measures for traditional clustering and gives probabilistic statements about the assignment of tumours to molecular profiles.