Get access

How to deal with malicious users in privacy-preserving distributed data mining


  • Part of the work was performed when the first author was a Ph.D. student at Computer Science Division, University of California, Berkeley.


A major problem in current privacy-preserving data-mining research is the lack of practical mechanisms to deal with malicious users who may submit bogus data to bias the computation. In this paper we explore private computation built on vector addition and its applications in privacy-preserving data mining. We show that such a paradigm not only supports a large number of popular data-mining algorithms (including both linear and nonlinear ones such as SVD, k-means, ID3, machine-learning algorithms based on Expectation-Maximization (EM), etc., and all algorithms in the statistical query model) with efficiency comparable to that of a regular, nonprivate implementation, but also admits extremely efficient zero-knowledge (ZK) protocols for verifying properties of user data. These ZK protocols are based on random projection and use a linear number of inexpensive small field (e.g. 32 or 64 bits) operations, and only a logarithmic number of large-field (1024 bits or more) cryptographic operations, achieving orders of magnitude reduction in running time over standard techniques (from hours to seconds) for large-scale problems. Such ZK tools thus provide practical solutions for handling the malicious users problem for many real-world applications. © 2009 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 1: 18-33, 2009

Get access to the full text of this article