Wilks' Λ Dissimilarity Measures for Gene Clustering: An Approach Based on the Identification of Transcription Modules

Authors

  • Alberto Roverato

    Corresponding author
    1. Department of Statistical Sciences, Università di Bologna, Bologna, Italy
      email: alberto.roverato@unibo.it
  • F. Marta L. Di Lascio

    Corresponding author
    1. Department of Statistical Sciences, Università di Bologna, Bologna, Italy
      email: francesca.dilascio@unibo.it


Abstract

Summary. Clustering methods are widely used in the analysis of microarray data for their ability to uncover coordinated expression profiles. One important goal of clustering is to discover coregulated genes, because it has been postulated that genes targeted by the same transcription factors tend to show similar expression patterns. We focus on agglomerative hierarchical clustering and consider the problem of choosing a dissimilarity measure on the basis of its ability to identify functional modules consisting of a transcription factor and its associated target genes. We first propose two criteria that constitute a theoretical framework for assessing the adequacy of dissimilarity measures and for comparing them. We show that the proposed criteria give insight into the behavior of dissimilarity measures and lead to a ranking of some of the most commonly used ones. Next, we introduce two dissimilarity measures based on the Wilks' Λ statistic and show that, according to the above criteria, they perform better than the other measures considered. The theoretical results are supported by an applied analysis of both simulated and real data.
