Institute for Software Research International
School of Computer Science, Carnegie Mellon University


Secure Computation of k-Anonymous Distributed Data

Bradley Malin

May 2004

Data Privacy Laboratory

Keywords: Privacy, confidentiality, security, re-identification, k-anonymity, multiparty computation, quasi-commutative encryption, privacy protocols

In a distributed environment, such as the World Wide Web, an individual leaves behind personal data at many different locations. To protect the privacy of an individual's sensitive information, locations make separate releases of identifiable data (e.g. name or social security number), and sensitive data (e.g. visitor's IP address). To the releasing location the data appears unlinkable, however, links can be established when multiple locations releases are available. This problem, known as trail re-identification, manifests when an individual's location-visit patterns are reconstructed from, and linked between, sensitive and identifiable releases. In this paper, we present a protocol that enables locations to prevent trail re-identification without revealing identified or sensitive data. Instead, locations communicate encrypted versions of their datasets, such that decrypted data is never revealed until completion of the protocol. Via the protocol, every piece of sensitive data, released from any set of locations, is guaranteed to be equally relatable to at least k identities, or is k-anonymous.

16 pages

Return to: SCS Technical Report Collection
School of Computer Science homepage

This page maintained by