Unsupervised Data Labeling on Graphs by Self‐Assignment Flows

This paper extends the recently introduced assignment flow approach for supervised image labeling to unsupervised scenarios where no labels are given. The resulting self‐assignment flow takes a pairwise data affinity matrix as input and maximizes its correlation with a low‐rank matrix that is parametrized by the variables of the assignment flow, which entails an assignment of the data to themselves through the formation of latent labels (feature prototypes). A single user parameter, the neighborhood size for the geometric regularization of assignments, drives the entire process. By smooth geodesic interpolation between different normalizations of self‐assignment matrices on the positive definite matrix manifold, a one‐parameter family of self‐assignment flows is defined. Accordingly, our approach can be characterized from different viewpoints, e.g. as performing spatially regularized, rank‐constrained discrete optimal transport, or as computing spatially regularized normalized spectral cuts. Regarding combinatorial optimization, our approach successfully determines completely positive factorizations of self‐assignments in large‐scale scenarios, subject to spatial regularization. Various experiments, including the unsupervised learning of patch dictionaries using a locally invariant distance function, illustrate the properties of the approach.


Motivation and Overview
Assignment flows, as introduced in [1], constitute smooth dynamical systems for data labeling on graphs. To this end, the assignment of labels to data is encoded as a point on an elementary statistical manifold. A flow on this manifold evolves the unique unbiased initial assignment towards desired pixel-label decisions. See [2] for additional background, details and related work. In the supervised scenario, data are assumed to lie in a known metric space in which c prototypical points, each associated with a class label in [c] = {1, . . . , c}, are given. However, this constitutes strong prior knowledge which may not be available in applications. Therefore, an adaptation of the assignment flow framework to unsupervised scenarios was introduced in [3] where the feature space has a known Riemannian manifold structure and prototypes are dynamically adapted over the course of the labeling process.
In the present paper, we discuss a different and more general unsupervised assignment flow variant which does not require knowledge of feature space structure other than a metric. Additionally, prototypes are not sought explicitly but emerge from data self-assignment. The latter constitutes a shift in perspective away from assigning labels to data and towards grouping data into classes by deciding for each pair of data if they belong to the same class. The resulting procedure thus performs clustering of the data. Simultaneously, regularization is induced by underlying graph structure without introducing bias through feature augmentation. Many additional aspects are discussed in [4].

Clustering and Self-Assignment Matrices
For a graph with n nodes, matrices W ∈ 𝒲 = {W ∈ ℝ^{n×c} : W ≥ 0, W 1_c = 1_n} encode soft assignments of labels to nodes. Specifically, each row of W can be seen as a probability distribution over the set [c] of labels. In the context of clustering, the matrix C(W) := Diag(W^⊤ 1_n) = W^⊤ W carries the cardinalities of the c clusters on its diagonal if W is restricted to the subset 𝒲_c^* of integer assignment matrices with full rank. Let K_F ∈ ℝ^{n×n} be given, with entries (K_F)_{i,k} denoting similarities between vectors f_i, f_k in a feature space F. The combinatorially hard task of partitioning n data points f_i ∈ F into c clusters, such that within-cluster variance is minimized and between-cluster variance is maximized, can be written as

    max_{W ∈ 𝒲_c^*} ⟨K_F, W C(W)^{-1} W^⊤⟩.    (1)

If the integrality constraint in (1) is relaxed, the equality Diag(W^⊤ 1_n) = W^⊤ W no longer holds, leading to the consideration of a family of self-assignment matrices

    A_s(W) = W γ_s(W)^{-1} W^⊤,    s ∈ [0, 1],

where the normalizing matrix γ_s(W) is chosen on the geodesic between C(W) and W^⊤ W in the cone P of symmetric positive definite matrices, i.e. A_0(W) = W C(W)^{-1} W^⊤ and A_1(W) = W (W^⊤ W)^{-1} W^⊤.
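To make the geodesic normalization concrete, the following sketch (illustrative dimensions; NumPy assumed) computes γ_s(W) via the standard affine-invariant geodesic γ_s = A^{1/2} (A^{-1/2} B A^{-1/2})^s A^{1/2} between the SPD matrices A = C(W) and B = W^⊤ W, and checks the closed-form endpoints A_0(W) and A_1(W):

```python
import numpy as np

def spd_geodesic(A, B, s):
    """Affine-invariant geodesic between SPD matrices A and B:
    gamma_s = A^{1/2} (A^{-1/2} B A^{-1/2})^s A^{1/2}, gamma_0 = A, gamma_1 = B."""
    lam, U = np.linalg.eigh(A)
    A_h = U @ np.diag(np.sqrt(lam)) @ U.T    # A^{1/2}
    A_mh = U @ np.diag(lam ** -0.5) @ U.T    # A^{-1/2}
    mu, V = np.linalg.eigh(A_mh @ B @ A_mh)
    return A_h @ (V @ np.diag(mu ** s) @ V.T) @ A_h

def self_assignment(W, s):
    """A_s(W) = W gamma_s(W)^{-1} W^T, with gamma_s(W) on the geodesic
    between C(W) = Diag(W^T 1_n) and W^T W."""
    C = np.diag(W.sum(axis=0))
    G = spd_geodesic(C, W.T @ W, s)
    return W @ np.linalg.inv(G) @ W.T

rng = np.random.default_rng(0)
W = rng.random((6, 3))
W /= W.sum(axis=1, keepdims=True)          # each row: label distribution
A0 = self_assignment(W, 0.0)
A1 = self_assignment(W, 1.0)
# endpoints agree with the closed forms A_0 = W C(W)^{-1} W^T, A_1 = W (W^T W)^{-1} W^T
assert np.allclose(A0, W @ np.linalg.inv(np.diag(W.sum(axis=0))) @ W.T)
assert np.allclose(A1, W @ np.linalg.inv(W.T @ W) @ W.T)
```

Note that A_1(W) is the orthogonal projection onto the range of W, while A_0(W) admits the probabilistic interpretation discussed below.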

Section 21: Mathematical signal and image processing

Self-Assignment Flows
In [1], data is incorporated into the assignment flow vector field by lifting the gradient of an objective to be maximized onto 𝒲. Analogously, a family of self-assignment flows is defined by lifting the gradient of the relaxed clustering objective E_s(W) = ⟨K_F, A_s(W)⟩ and otherwise proceeding as in the supervised scenario, without further modification. In particular, existing schemes for numerical integration of the resulting flows [5] still apply.
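As a minimal illustration of the lifting, the sketch below runs a multiplicative (mirror-ascent-style) Euler step on E_0, which keeps each row of W a strictly positive probability vector. The finite-difference gradient, step size, and toy data are illustrative stand-ins, not the analytic gradient or the geometric integration schemes of [5]:

```python
import numpy as np

def E0(W, K):
    """Relaxed clustering objective E_0(W) = <K_F, A_0(W)>, A_0 = W C(W)^{-1} W^T."""
    C_inv = np.diag(1.0 / W.sum(axis=0))
    return float(np.sum(K * (W @ C_inv @ W.T)))

def num_grad(f, W, eps=1e-6):
    """Central finite-difference gradient (stand-in for the analytic gradient)."""
    G = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        E = np.zeros_like(W); E[idx] = eps
        G[idx] = (f(W + E) - f(W - E)) / (2 * eps)
    return G

def lifted_euler_step(W, K, h=0.5):
    """Multiplicative update: rows remain strictly positive distributions."""
    Wn = W * np.exp(h * num_grad(lambda X: E0(X, K), W))
    return Wn / Wn.sum(axis=1, keepdims=True)

# two well-separated 2D point clouds; similarity matrix K_F from pairwise distances
rng = np.random.default_rng(1)
F = np.vstack([rng.normal(0.0, 0.1, (5, 2)), rng.normal(3.0, 0.1, (5, 2))])
K = np.exp(-np.linalg.norm(F[:, None] - F[None, :], axis=-1))
W = np.full((10, 2), 0.5) + 1e-3 * rng.standard_normal((10, 2))  # perturbed barycenter
W /= W.sum(axis=1, keepdims=True)
E_start = E0(W, K)
for _ in range(100):
    W = lifted_euler_step(W, K)
```

The slight perturbation of the barycenter initialization breaks the symmetry between labels, mirroring the symmetry-breaking initialization described below.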
Looking at A_0 in the relaxation of (1) clarifies the term self-assignment. By viewing the entries of a matrix W ∈ 𝒲 as posterior probabilities W_{i,j} = P(j|i) of assigning label j to node i, we obtain the label distribution P(j) = Σ_{i∈[n]} P(j|i) P(i) = (1/n) C(W)_{j,j} by marginalizing with the uniform prior P(i) = 1/n. By invoking Bayes' rule, we find P(i|j) = P(j|i) P(i) / P(j) = W_{i,j} / C(W)_{j,j}, such that the probability of nodes i and k having the same label can be seen as the self-assignment (i ↔ k) probability

    P(i ↔ k) = Σ_{j∈[c]} P(j|i) P(k|j) = A_0(W)_{i,k}.

This makes clear that the objective E_s(W) is to maximize agreement between similarity in F and (self-)assignment to the same label, for each pair of nodes.

To break symmetry, the unbiased barycenter initialization was slightly perturbed, as informed by a preliminary k-center clustering. Once the flow has converged to a low-entropy assignment state, latent prototypes can in this case be recovered by computing means in F, yielding a learned patch dictionary of prototypes. Reconstruction of the image from the assigned patches illustrates that the majority of the image structure is clearly captured by the learned patch dictionary.

In large-scale scenarios, the matrix K_F may require a prohibitive amount of memory, which motivates replacing it by a low-rank factorization. The analysis in [4] shows that this can lead to very similar results while using less than 1% of the otherwise required memory. The whole process is driven by a single parameter: the neighborhood size inducing graph connectivity. In particular, choosing large neighborhoods, i.e. strong assignment regularization, leads to prototypes dying out, highlighting the plug-and-play nature of self-assignment flows.

We conclude that self-assignment flows provide a flexible mathematical framework for clustering under additional regularization induced by a graph structure underlying the data. This is achieved in a geometrically natural way, without the need for many user parameters and without feature augmentation.
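The memory saving can be sketched with the cyclic trace identity: for a factorization K_F = ΦΦ^⊤ with Φ ∈ ℝ^{n×r}, the objective ⟨K_F, A_0(W)⟩ = tr(Φ^⊤ W C(W)^{-1} W^⊤ Φ) can be evaluated without materializing any n×n matrix. The dimensions below (and the Gaussian factor Φ) are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n, c, r = 1000, 8, 16
Phi = rng.standard_normal((n, r))           # low-rank factor: K_F = Phi Phi^T
W = rng.random((n, c))
W /= W.sum(axis=1, keepdims=True)
C_inv = np.diag(1.0 / W.sum(axis=0))

# naive evaluation: form the n x n matrices K_F and A_0(W) explicitly
K = Phi @ Phi.T
full = np.trace(K @ W @ C_inv @ W.T)

# factored evaluation: tr(Phi^T W C^{-1} W^T Phi), only r x c intermediates
M = Phi.T @ W                                # r x c
fact = np.trace(M @ C_inv @ M.T)
```

Both evaluations agree, but the factored one touches only n×r and r×c arrays, which is where the memory reduction reported in [4] comes from.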
A number of additional aspects are discussed in [4].