• graph partitioning;
  • kernel k-means for cluster discovery and network segmentation;
  • recursive kernel trick;
  • sparsity;
  • computational saving


Kernel K-means has been a powerful unsupervised machine learning tool for cluster discovery. This paper explores its application towards network segmentation, a type of cluster discovery. Very commonly we encounter networks of extremely large size, making computational complexity a major issue in the algorithmic development. In this regard, the sparse structure of kernel matrices can be an invaluable asset. For partitioning nonvectorial data such as network graphs, we need to use a vector-free clustering criterion for K-means. The kernel matrix constructed from nonvectorial data usually needs to be preprocessed to assure Mercer's condition, which is required to guarantee monotonic convergence of the kernel K-means algorithm. This paper describes a centroid-free criterion for K-means so that it can be applied to nonvectorial data such as network segmentation. Such a criterion leads to our introduction of pattern-centroid similarity which ultimately leads to a kernel trick algorithm based on updating of the pattern-centroid similarity. Furthermore, by adopting a recursive updating scheme, the recursive kernel-trick allows a computational saving from O(N2/K) to O(N). For networks with high sparsity structure, the amount of computation required can be further reduced from O(N) to O(Ns), where Ns is the average number of nonzero elements per column in the kernel matrix. Copyright © 2011 John Wiley & Sons, Ltd.