Keywords:

  • banded matrices;
  • seriation;
  • matrix re-ordering;
  • formal concept analysis;
  • biclustering;
  • coclustering

Abstract

Binary data arise frequently in real-world applications ranging from social networks to bioinformatics. Consequently, extracting patterns from binary data is a fundamental task in data mining. Recently, the utility of banded structures in binary matrices has been pointed out for applications such as paleontology, bioinformatics, and social networking. A binary matrix has a banded structure if both its rows and its columns can be permuted so that the 1s form a staircase pattern down the rows, along the leading diagonal. Natural interpretations of banded structures include overlapping communities in social networks, patterns of species occurring at spatially correlated sites, and overlapping roles of genes in various diseases. In this paper, we show the correspondence between formal concept analysis and banded structure; as a direct result of this correspondence, we present a novel framework for discovering banded structures. Using this framework, we develop the MMBS algorithm (mine maximally banded submatrices). The current state-of-the-art algorithm, MBS, discovers only a single band and assumes a fixed column permutation. In contrast, MMBS can discover multiple bands, which may overlap or be segmented. Our experimental results on both synthetic and real datasets clearly indicate the advantage of MMBS over MBS.

Copyright © 2010 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 3: 431-445, 2010
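To make the staircase definition concrete, the sketch below checks whether a 0/1 matrix, in a given row and column order, is banded. It is not the MBS or MMBS algorithm. It assumes one common formalization (a "fully banded" matrix): every row's 1s form a single contiguous run, and both the start and the end of that run are non-decreasing from each row to the next. Rows with no 1s are rejected as a simplifying assumption. The function names `is_fully_banded` and `find_banded_permutation` are hypothetical, and the permutation search is exhaustive, so it is only suitable for tiny matrices.

```python
from itertools import permutations


def is_fully_banded(matrix):
    """Hypothetical helper: True if the 0/1 matrix, in its current order, is fully banded.

    Assumed definition: each row's 1s occupy one contiguous interval [start, end],
    and start and end never decrease from one row to the next. Empty rows are
    rejected (a simplifying assumption of this sketch).
    """
    prev_start, prev_end = -1, -1
    for row in matrix:
        ones = [j for j, v in enumerate(row) if v == 1]
        if not ones:
            return False
        start, end = ones[0], ones[-1]
        # Contiguity: the span from start to end must contain only 1s.
        if end - start + 1 != len(ones):
            return False
        # Staircase: each row's interval may only shift rightward going down.
        if start < prev_start or end < prev_end:
            return False
        prev_start, prev_end = start, end
    return True


def find_banded_permutation(matrix):
    """Hypothetical brute-force search for row/column orders that make the matrix banded.

    Tries every row permutation and every column permutation, so it is only
    practical for very small matrices. Returns (row_order, col_order) or None.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    for row_order in permutations(range(n_rows)):
        for col_order in permutations(range(n_cols)):
            permuted = [[matrix[r][c] for c in col_order] for r in row_order]
            if is_fully_banded(permuted):
                return row_order, col_order
    return None


M = [
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]
print(is_fully_banded(M))            # False: row 1 starts left of row 0
print(find_banded_permutation(M))    # swapping the first two rows yields a staircase
```

The brute-force search only illustrates what "can be permuted" means; it grows factorially with matrix size, which is why dedicated algorithms such as MBS and MMBS are needed in practice.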