A GRAPH-BASED APPROACH FOR SEMISUPERVISED CLUSTERING

Authors


Tetsuya Yoshida, Graduate School of Information Science and Technology, N-14 W-9, Sapporo 060-0814, Japan; e-mail: yoshida@meme.hokudai.ac.jp

Abstract

This paper proposes a graph-based approach to semisupervised clustering based on pairwise relations among instances. The entire data set is represented as an edge-weighted graph: each data element (instance) is mapped to a vertex, and instances are connected by edges weighted by their similarities. To reflect pairwise constraints in the clustering process, the graph is modified using contraction, as known from general graph theory, and the graph Laplacian from spectral graph theory. The graph representation allows pairwise constraints and pairwise similarities to be handled within a single unified representation. By exploiting both the constraints and the similarities among instances, the entire data set is projected onto a subspace via the modified graph, and clustering is conducted over the projected representation. The proposed approach is evaluated on several real-world data sets. The results are encouraging and indicate that the approach is worth pursuing further.
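
The pipeline outlined in the abstract (similarity graph, contraction, Laplacian-based projection, clustering) can be illustrated with a minimal sketch. It is not the paper's actual algorithm. The abstract does not specify these details, so the following are assumptions: a Gaussian similarity with bandwidth `sigma`, must-link constraints only (cannot-link constraints are not handled here), parallel edges summed after contraction, a symmetric normalized Laplacian with row-normalized eigenvectors, and a small deterministic k-means on the projected points. All function names are hypothetical.

```python
import numpy as np

def similarity_graph(X, sigma=1.0):
    """Dense Gaussian-similarity weight matrix with zero diagonal (assumed kernel)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def contract_must_links(W, must_links):
    """Merge each must-link pair into one super-vertex (edge contraction).

    Returns the contracted weight matrix and, for every original vertex,
    the index of the super-vertex that now contains it.
    """
    n = W.shape[0]
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in must_links:
        parent[find(i)] = find(j)

    roots = sorted({find(i) for i in range(n)})
    index = {r: k for k, r in enumerate(roots)}
    member = np.array([index[find(i)] for i in range(n)])

    # P[i, k] = 1 if original vertex i belongs to super-vertex k.
    P = np.zeros((n, len(roots)))
    P[np.arange(n), member] = 1.0
    Wc = P.T @ W @ P            # parallel edges are summed (assumption)
    np.fill_diagonal(Wc, 0.0)   # drop self-loops created by merging
    return Wc, member

def spectral_embedding(W, dim):
    """Project vertices onto the `dim` eigenvectors of the normalized
    Laplacian with the smallest eigenvalues, then normalize each row.
    Assumes every vertex has nonzero degree."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues are returned in ascending order
    Y = vecs[:, :dim]
    return Y / np.linalg.norm(Y, axis=1, keepdims=True)

def kmeans(Y, k, iters=50):
    """Plain Lloyd iterations with deterministic farthest-point initialization."""
    centers = [Y[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        sq = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(sq, axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = Y[labels == c].mean(axis=0)
    return labels

def constrained_spectral_clustering(X, must_links, k, sigma=1.0):
    """Cluster the contracted graph, then map labels back to the original instances."""
    W = similarity_graph(X, sigma)
    Wc, member = contract_must_links(W, must_links)
    Y = spectral_embedding(Wc, k)
    return kmeans(Y, k)[member]
```

For example, calling `constrained_spectral_clustering(X, [(0, 1)], k=2)` on an `(n, d)` array `X` returns one label per row of `X`. Because instances 0 and 1 are merged into a single vertex before the projection, they always receive the same label, which is how contraction enforces a must-link constraint in this sketch.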
