
Improving the performance of the iterative signature algorithm for the identification of relevant patterns

Authors

  • A. Freitas,

    Corresponding author
    1. Department of Mathematics, University of Aveiro, 3810-193 Aveiro, Portugal
    2. CIDMA, University of Aveiro, 3810-193 Aveiro, Portugal
  • V. Afreixo,

    1. Department of Mathematics, University of Aveiro, 3810-193 Aveiro, Portugal
    2. CIDMA, University of Aveiro, 3810-193 Aveiro, Portugal
  • M. Pinheiro,

    1. Department of Electronics, Telecommunications and Informatics, University of Aveiro, 3810-193 Aveiro, Portugal
    2. Biocant, Bioinformatics Unit, 3060-197 Cantanhede, Portugal
  • J. L. Oliveira,

    1. Department of Electronics, Telecommunications and Informatics, University of Aveiro, 3810-193 Aveiro, Portugal
    2. IEETA, University of Aveiro, 3810-193 Aveiro, Portugal
  • G. Moura,

    1. Department of Biology, University of Aveiro, 3810-193 Aveiro, Portugal
    2. CESAM, University of Aveiro, 3810-193 Aveiro, Portugal
  • M. Santos

    1. Department of Biology, University of Aveiro, 3810-193 Aveiro, Portugal
    2. CESAM, University of Aveiro, 3810-193 Aveiro, Portugal

Abstract

The iterative signature algorithm (ISA) has become an attractive tool for detecting co-regulated genes in microarray data matrices, and it can also help identify similar patterns in many other kinds of numerical data matrices. Nevertheless, its algorithmic strategy has some limitations: it relies on the statistical behavior of the average, and it uses averages weighted by scores that are not necessarily positive. We therefore propose taking the median instead of the average and using absolute scores in ISA's structure. In addition, a generalized function is introduced into the algorithm to improve its strategy for detecting high-value or low-value biclusters. The effects of these simple modifications on the performance of the biclustering algorithm are evaluated in an experimental comparative study involving synthetic data sets and real data from the organism Saccharomyces cerevisiae. The experimental results show that the proposed variations of ISA outperform the original version in many situations. Absolute scores are shown to be essential for the correct interpretation of the biclusters found by the algorithm, and using the median instead of the average makes the algorithm more resilient to outliers in the data sets. © 2011 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 4: 71–83, 2011
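To illustrate the two modifications the abstract describes, the Python sketch below computes an ISA-style condition score as a weighted centre of each data-matrix column. It uses the absolute values of the gene scores as weights and either the weighted mean or a weighted median as the centre. The function names `weighted_center` and `condition_scores`, and the choice of the lower weighted median, are hypothetical simplifications and not the authors' implementation. The sketch also omits ISA's normalization, thresholding and iteration, as well as the generalized function mentioned above.

```python
import numpy as np


def weighted_center(values, weights, center="mean"):
    """Weighted mean or lower weighted median of `values`.

    Assumption: weights are non-negative with a positive sum. A weighted
    median needs non-negative weights, which is one reason to use |scores|.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or weights.sum() <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    if center == "mean":
        return float(np.dot(weights, values) / weights.sum())
    # Lower weighted median: first sorted value whose cumulative weight
    # reaches half of the total weight.
    order = np.argsort(values)
    cum = np.cumsum(weights[order])
    idx = np.searchsorted(cum, 0.5 * cum[-1])
    return float(values[order][idx])


def condition_scores(E, gene_scores, center="mean"):
    """Hypothetical ISA-style step: score each column (condition) of E
    from its rows (genes), weighting rows by the absolute gene scores."""
    w = np.abs(gene_scores)
    return np.array([weighted_center(E[:, j], w, center) for j in range(E.shape[1])])


if __name__ == "__main__":
    # Toy data: 4 genes x 2 conditions; gene 4 is an outlier in condition 1.
    E = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [100.0, 3.0]])
    g = np.array([1.0, -1.0, 1.0, -1.0])  # signed scores sum to zero
    print(condition_scores(E, g, "mean"))    # values 26.5 and 2.25
    print(condition_scores(E, g, "median"))  # values 2.0 and 2.0
```

In the toy example, the outlier pulls the weighted mean of the first condition up to 26.5 while the weighted median stays at 2.0. With the signed scores used directly as weights, the weights would sum to zero and the weighted average would be undefined; taking absolute values avoids that case.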
