P4P: A Practical Framework for Privacy-Preserving Distributed Computation

Project Overview

Modern networked applications and services, such as e-voting, e-commerce, e-bidding, data mining etc., often call for collaborative computation on shared user data. Privacy concerns have become a major obstacle hindering the acceptance/success of such applications. Existing solutions suffer from prohibitive cost and are not practical. P4P (Peers for Privacy) is a framework for carrying out such computations with adequate performance on today's commodity hardware at reasonable, realistically large scale. P4P leverages the natural incentives on the user side. Most users want better privacy, but don't understand the risks and generally are not willing to pay much for better protection. On the other hand, user communities always contain a fraction of altruistic users who provide resources to the rest of the community - this is how most peer-to-peer computing actually works. Our approach uses resources from a few of these users which allows a very simple, efficient solution that places little or no additional demands on either the server or the other users.

The P4P framework features a unique hybrid architecture combining P2P and existing client-server paradigm, and takes advantage of players' heterogeneity that exists in real world systems. The bulk of the computation and storage are still based on the server, but a (small) subset of users, denoted Privacy Peers, participate in the computation, removing data control from the server. The privacy peers do not gain control of the data, and need not be trusted - they only provide compute cycles. This arrangement allows us to take advantage of the different resources and protections offered by different players. In terms of security and privacy, P4P relies on the server, who is typically protected behind firewalls and maintained by professional administrators, for defending against outside attacks and uses the privacy peers, who are mostly client machines that are maintained by regular users, to protect user privacy against a curious server. Performance-wise, this architecture allows us to use a verifiable secret sharing (VSS) paradigm for computation. The advantage of a VSS private computation scheme is that it works with any sized field. And arithmetic operations with small field elements are extremely efficient, when an element fits in a single memory cell (in contrast, arithmetic operations with large field numbers, as is required by almost all existing MPC protocols and ZKPs, are typically 10^6 times slower). This means private computation in P4P is almost as efficient as the centralized implementation if the computations are composed mostly of additions. In a sense, for these applications, P4P can provide privacy almost for free (in terms of server's computation overhead). Addition-based algorithms are more general than would appear, and include non-linear gradient approaches such as Singular-Value Decomposition and many data mining algorithms based on EM (Expectation Maximization), etc. Thus P4P provides efficient solutions for a large number of real world applications.

The P4P framework also supports a very efficient user data validation protocol that verifies, in zero-knowledge, that the L2 norm of the vector a user inputs into the computation is bounded by a predefined limit L. This prevents a malicious user from exerting too much influence on the computation, which is a serious threat in any realistic application but generally lacks scalable solutions. The protocol uses only a logarithmic number of large field (1024 bits or more) operations. In experiments with our implementation, we show that verification of a million-element vector (e.g. during an SVD calculation) takes a few seconds of server or client time on commodity PCs as shown by the following figure (in contrast, using standard techniques takes hours). Overall, the protocol is dominated by the (linear) time to do small field operations that one has to pay even when the computation is done directly on user data without privacy. This makes privacy protection almost free from the vendor's point of view, which is essential for wide adoption. In addition, P4P can also deal with malicious privacy peers who participate in the computation. However, this is done without resorting to expensive ZKPs or homomorphism. Instead, we introduce a new VSS that takes advantage of the existence of honest majority among the players and relies on consensus for identifying correct behavior. The resulting scheme preserves the feature of "keeping the number of large integer operations small".

Figure 1: (a) Verifier and (b) prover times in seconds for the validation protocol where (from top to bottom) L (the required bound) has 40, 20, or 10 bits. The x-axis is the vector length.

P4P also provides a set of tools that can be used to compose diverse applications. Such tools include data protection scheme, secure multicast, etc. All together, P4P provides a comprehensive and realistic solution for preserving privacy in real-world applications.

The basic P4P tools are being actively developed. Most components are ready. If you are interested in testing out the code, please send me an email: duan AT cs DOT berkeley DOT edu.




last updated 01/04/2008