Biclustering scatter plots using data depth measures



Biclustering, which groups the rows and columns of a data matrix simultaneously, is often preferable to traditional one-dimensional clustering and has been broadly applied in domains such as bioinformatics and text mining. However, existing biclustering methods can only handle a data matrix of scalars. In this paper, we introduce a biclustering procedure that can handle a data matrix whose entries are scatter plots. To reflect the nature of the data more accurately, we introduce a dissimilarity statistic based on ‘data depth’ that measures the discrepancy between two bivariate distributions without oversimplifying the underlying pattern. We then combine hypothesis testing with a search algorithm to cluster the rows and columns of the data matrix of scatter plots simultaneously. We also propose novel painting metrics and construct heat maps to visualize the biclusters. We demonstrate the utility and power of the proposed biclustering method through simulation studies and an application to a microbe–host interaction study.

© 2012 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 6: 102–115, 2013
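To make the idea of a depth-based dissimilarity between two scatter plots concrete, the following is a minimal sketch under stated assumptions, not the statistic defined in the paper. It assumes Mahalanobis depth as the depth function and a Liu–Singh style quality index, Q(F, G) = P(D_F(X) ≤ D_F(Y)) with X drawn from F and Y from G, which is close to 0.5 when the two distributions agree. The function names (`mahalanobis_depth`, `quality_index`, `depth_dissimilarity`) and the symmetrization are illustrative choices.

```python
import numpy as np


def mahalanobis_depth(points, reference):
    """Mahalanobis depth of each row of `points` with respect to the
    sample `reference` (both arrays of shape (n, 2)).
    Assumption: this depth function is chosen for simplicity only."""
    mean = reference.mean(axis=0)
    cov = np.cov(reference, rowvar=False)
    diff = points - mean
    # Squared Mahalanobis distance of every point to the reference centre.
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    return 1.0 / (1.0 + d2)


def quality_index(x, y):
    """Empirical estimate of Q = P(D_x(X) <= D_x(Y)): the fraction of
    pairs (x_i, y_j) in which y_j is at least as deep as x_i, with depth
    measured relative to the sample x."""
    depth_x = mahalanobis_depth(x, x)
    depth_y = mahalanobis_depth(y, x)
    return np.mean(depth_x[:, None] <= depth_y[None, :])


def depth_dissimilarity(x, y):
    """Hypothetical symmetric dissimilarity: total deviation of the two
    quality indices from 0.5, the value expected when x and y come from
    the same distribution."""
    return abs(quality_index(x, y) - 0.5) + abs(quality_index(y, x) - 0.5)
```

Applied to every pair of cells in a matrix of scatter plots, a statistic of this kind would yield the pairwise dissimilarities that a row-and-column clustering step could consume; the hypothesis test and search algorithm described in the abstract are not reproduced here.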