CMU-ISRI-04-115
Institute for Software Research International
School of Computer Science, Carnegie Mellon University
How (Not) to Protect Genomic Data Privacy in a Distributed Network:
Using Trail Re-identification to Evaluate and Design Privacy Protection
Systems
Bradley Malin, Latanya Sweeney
May 2004
Keywords: Privacy, anonymity, re-identification, genomics, DNA databases
The increasing integration of patient-specific genomic data
into clinical practice and research raises serious privacy
concerns. Various systems have been proposed that protect
privacy by removing explicitly identifying information, such
as name or social security number, or by encrypting it into
pseudonyms. Though these systems claim to protect identity
from being disclosed, they lack formal proofs. In this paper,
we study the erosion of privacy when genomic data, either
pseudonymous or believed to be anonymous, is released
into a distributed healthcare environment. Several algorithms
are introduced, collectively called RE-Identification of Data
In Trails (REIDIT), which link genomic data to named individuals
in publicly available records by leveraging unique features in
patient-location visit patterns. We develop algorithmic proofs
of re-identification and demonstrate, with
experiments on real-world data, that susceptibility to
re-identification is neither trivial nor the result of
bizarre isolated occurrences. We propose that such
techniques can be applied as system tests of privacy
protection capabilities.
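
The trail-linkage idea described above can be illustrated with a minimal
sketch. This is not the REIDIT algorithms themselves, which this abstract
does not specify; it assumes a simplified setting in which each location
releases one list of patient names and a separate list of pseudonymous DNA
sample IDs, and it links a sample to a name only when exactly one sample
and exactly one name share the same set of visited locations. All function
names, location labels, and records below are hypothetical.

```python
from collections import defaultdict


def build_trails(releases):
    """Map each entity to the frozenset of locations where it appears.

    `releases` maps a location label to the entities it released
    (hypothetical input format, assumed for this sketch).
    """
    trails = defaultdict(set)
    for location, entities in releases.items():
        for entity in entities:
            trails[entity].add(location)
    return {entity: frozenset(locs) for entity, locs in trails.items()}


def link_unique_trails(identified_releases, dna_releases):
    """Link a DNA sample to a name when their trails match uniquely.

    A link is made only if exactly one name and exactly one DNA sample
    share the same set of locations; ambiguous trails are left unlinked.
    """
    names_by_trail = defaultdict(list)
    for name, trail in build_trails(identified_releases).items():
        names_by_trail[trail].append(name)

    dna_by_trail = defaultdict(list)
    for sample, trail in build_trails(dna_releases).items():
        dna_by_trail[trail].append(sample)

    links = {}
    for trail, samples in dna_by_trail.items():
        names = names_by_trail.get(trail, [])
        if len(samples) == 1 and len(names) == 1:
            links[samples[0]] = names[0]
    return links


# Hypothetical releases from three locations.
identified = {
    "H1": ["Alice", "Bob"],
    "H2": ["Alice", "Carol"],
    "H3": ["Bob", "Carol", "Dan", "Eve"],
}
dna = {
    "H1": ["d1", "d2"],
    "H2": ["d1", "d3"],
    "H3": ["d2", "d3", "d4", "d5"],
}

links = link_unique_trails(identified, dna)
```

In this toy data, samples d1, d2, and d3 each have a trail shared with
exactly one name and are linked, while d4 and d5 both visited only H3, as
did Dan and Eve, so they stay unlinked. The sketch only covers the case
where every location releases both kinds of data for each patient; handling
incomplete trails would need a different matching rule.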
17 pages