Sharp kernel clustering algorithms and their associated Grothendieck inequalities



In the kernel clustering problem we are given a (large) $n \times n$ symmetric positive semidefinite matrix $A = (a_{ij})$ with $\sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij} = 0$ and a (small) $k \times k$ symmetric positive semidefinite matrix $B = (b_{ij})$. The goal is to find a partition $\{S_1,\ldots,S_k\}$ of $\{1,\ldots,n\}$ which maximizes $\sum_{i=1}^{k}\sum_{j=1}^{k}\bigl(\sum_{(p,q)\in S_i\times S_j} a_{pq}\bigr) b_{ij}$. We design a polynomial time approximation algorithm that achieves an approximation ratio of $\frac{R(B)^2}{C(B)}$, where $R(B)$ and $C(B)$ are geometric parameters that depend only on the matrix $B$, defined as follows: if $b_{ij} = \langle v_i, v_j\rangle$ is the Gram matrix representation of $B$ for some $v_1,\ldots,v_k \in \mathbb{R}^k$, then $R(B)$ is the minimum radius of a Euclidean ball containing the points $\{v_1,\ldots,v_k\}$. The parameter $C(B)$ is defined as the maximum over all measurable partitions $\{A_1,\ldots,A_k\}$ of $\mathbb{R}^{k-1}$ of the quantity $\sum_{i=1}^{k}\sum_{j=1}^{k} b_{ij}\langle z_i, z_j\rangle$, where for $i \in \{1,\ldots,k\}$ the vector $z_i \in \mathbb{R}^{k-1}$ is the Gaussian moment of $A_i$, i.e., $z_i = \frac{1}{(2\pi)^{(k-1)/2}}\int_{A_i} x\, e^{-\|x\|_2^2/2}\, dx$. We also show that for every $\varepsilon > 0$, achieving an approximation guarantee of $(1-\varepsilon)\frac{R(B)^2}{C(B)}$ is Unique Games hard. © 2012 Wiley Periodicals, Inc. Random Struct. Alg., 2013
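
To make the objective concrete, the following is a minimal sketch that evaluates the kernel clustering value of a partition and finds the optimum of a toy instance by exhaustive search. It assumes the objective is the sum over all pairs (p, q) of a_pq times b_ij, where i and j are the clusters containing p and q. The function names `clustering_value` and `brute_force_best` and the toy matrices are illustrative assumptions. The exhaustive search takes k^n steps and is not the polynomial time approximation algorithm of the paper.

```python
from itertools import product


def clustering_value(A, B, labels):
    """Value of the partition encoded by labels, where labels[p] is the
    index of the cluster containing p: sum over p, q of
    A[p][q] * B[labels[p]][labels[q]]."""
    n = len(A)
    return sum(
        A[p][q] * B[labels[p]][labels[q]]
        for p in range(n)
        for q in range(n)
    )


def brute_force_best(A, B):
    """Try all k^n labelings (empty clusters allowed) and return the first
    maximizer together with its value. Only feasible for tiny n."""
    n, k = len(A), len(B)
    best = max(
        product(range(k), repeat=n),
        key=lambda labels: clustering_value(A, B, labels),
    )
    return list(best), clustering_value(A, B, best)


# Toy instance (an assumption for illustration): A is the Gram matrix of
# four points on a line whose coordinates sum to zero, so A is positive
# semidefinite and its entries sum to zero. B is the 2 x 2 identity.
x = [-2, -1, 1, 2]
A = [[xp * xq for xq in x] for xp in x]
B = [[1, 0], [0, 1]]

labels, value = brute_force_best(A, B)
print(labels, value)  # [0, 0, 1, 1] 18
```

With B equal to the identity, the value reduces to the sum over clusters of the squared sum of the coordinates in that cluster, so the best split separates the negative points from the positive ones.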