A major problem in current privacy-preserving data-mining research is the lack of practical mechanisms to deal with malicious users who may submit bogus data to bias the computation. In this paper we explore private computation built on vector addition and its applications in privacy-preserving data mining. We show that such a paradigm not only supports a large number of popular data-mining algorithms (including both linear and nonlinear ones such as SVD, k-means, ID3, machine-learning algorithms based on Expectation-Maximization (EM), etc., and all algorithms in the statistical query model) with efficiency comparable to that of a regular, nonprivate implementation, but also admits extremely efficient zero-knowledge (ZK) protocols for verifying properties of user data. These ZK protocols are based on random projection and use a linear number of inexpensive small field (e.g. 32 or 64 bits) operations, and only a logarithmic number of large-field (1024 bits or more) cryptographic operations, achieving orders of magnitude reduction in running time over standard techniques (from hours to seconds) for large-scale problems. Such ZK tools thus provide practical solutions for handling the malicious users problem for many real-world applications. © 2009 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 1: 18-33, 2009
If you can't find a tool you're looking for, please click the link at the top of the page to "Go to old article view". Alternatively, view our Knowledge Base articles for additional help. Your feedback is important to us, so please let us know if you have comments or ideas for improvement.