CMU-HCII-22-105
Human-Computer Interaction Institute
School of Computer Science, Carnegie Mellon University




Modular Privacy Flows: A Design Pattern for Data Minimization

Haojian Jin

September 2022

Ph.D. Thesis



Keywords: Data privacy, data overaccess, design pattern, software architecture, data minimization, human-computer interaction, ubiquitous computing, smart home, smart city, privacy engineering, mobile privacy, software engineering


Computing systems often allow developers to access more data than needed, violating the principle of data minimization: a data controller should limit data collection to only what is necessary to fulfill a specific purpose. For example, most calendar applications only let users grant third parties either full access to all their calendar events or none at all, even though third parties often need only a small portion. Conventional wisdom is to ask data controllers to offer a fine-grained data access option for every potential use case. However, this would lead to an explosion of access options that would be onerous for system builders to implement, unwieldy for users to configure, and complex for developers to learn.

This dissertation introduces a new design pattern, called Modular Privacy Flows (MPF), for designing systems that allow developers to collect data on a need-to-know basis. MPF combines three simple ideas. First, instead of enumerating all-or-nothing fine-grained data access, system builders offer a small and fixed set of stateless operators to developers. Second, developers declare intended data access by authoring a Unix-like pipeline using these operators and save the pipeline representation in a text-based manifest. Third, given a manifest, a trusted runtime assembles a data transformation executable using pre-loaded open-source operator implementations, which relays data flows in a structured and enforceable manner.
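The three ideas can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the dissertation: the operator names (`filter`, `select`), the one-line manifest syntax, and the calendar records are all hypothetical, and the real MPF architectures may define their operators and manifests quite differently.

```python
from typing import Callable, Dict, List, Tuple

Event = Dict[str, str]
Operator = Callable[[List[Event], str], List[Event]]


def op_filter(events: List[Event], arg: str) -> List[Event]:
    """Hypothetical operator: keep records whose field equals a value ("field=value")."""
    field, value = arg.split("=", 1)
    return [e for e in events if e.get(field) == value]


def op_select(events: List[Event], arg: str) -> List[Event]:
    """Hypothetical operator: keep only the listed fields ("f1,f2")."""
    fields = arg.split(",")
    return [{f: e[f] for f in fields if f in e} for e in events]


# Idea 1: a small, fixed set of stateless operators offered by the system builder.
OPERATORS: Dict[str, Operator] = {"filter": op_filter, "select": op_select}


def assemble(manifest: str) -> Callable[[List[Event]], List[Event]]:
    """Idea 3: a runtime builds the transformation from a text manifest,
    using only the pre-loaded operator implementations."""
    stages: List[Tuple[Operator, str]] = []
    for raw in manifest.split("|"):
        name, _, arg = raw.strip().partition(" ")
        if name not in OPERATORS:
            raise ValueError(f"unknown operator: {name}")
        stages.append((OPERATORS[name], arg.strip()))

    def run(events: List[Event]) -> List[Event]:
        for op, arg in stages:
            events = op(events, arg)
        return events

    return run


# Idea 2: the developer declares the intended access as a Unix-like pipeline.
MANIFEST = "filter category=work | select start,end"

calendar = [
    {"title": "Standup", "category": "work", "start": "09:00", "end": "09:15"},
    {"title": "Dentist", "category": "personal", "start": "13:00", "end": "14:00"},
]
result = assemble(MANIFEST)(calendar)
```

In this sketch the developer receives only the start and end times of work events. A manifest that names an operator outside the fixed set is rejected before any data flows.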

MPF offers a few important advantages over the conventional all-or-nothing permission approach and other relevant approaches. First, system builders can support numerous fine-grained APIs by implementing a small set of reusable operators. Second, developers only need to learn the semantics of a few operators to customize their data access. Third, since the operators have clearly defined semantics and the manifests are non-proprietary, MPF can facilitate many independent privacy features that help users manage their privacy in a centralized and unified manner. MPF also allows third-party privacy advocates (e.g., Consumer Reports) to analyze manifests programmatically and alert users to bad practices.
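As a rough illustration of programmatic manifest analysis, the sketch below audits a manifest without running it. It assumes a hypothetical manifest syntax in which `operator argument` stages are separated by `|`, and two hypothetical operators, `filter` and `select`. The warning rules are invented for the example and are not the dissertation's.

```python
from typing import List


def operators_in(manifest: str) -> List[str]:
    """Return the operator name of each pipeline stage, in order."""
    return [
        stage.strip().split(" ", 1)[0]
        for stage in manifest.split("|")
        if stage.strip()
    ]


def audit(manifest: str) -> List[str]:
    """Flag manifests that appear to pass data through unreduced.

    Both rules are illustrative assumptions, not an established policy.
    """
    ops = operators_in(manifest)
    warnings: List[str] = []
    if "filter" not in ops:
        warnings.append("no 'filter' stage: every record is shared")
    if "select" not in ops:
        warnings.append("no 'select' stage: every field is shared")
    return warnings
```

Because the check reads only the manifest text, a third party could in principle apply it across many applications without access to their source code or to user data.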

This dissertation has three main parts. The first part includes three empirical studies that characterize developers' data collection behaviors, illustrating that most developers only need partial or derived data rather than raw data. The second part introduces two MPF software architectures (Peekaboo and MapAggregate) that can reduce developers' data collection and demonstrate the advantages described above. Finally, the third part presents two design methods to help developers navigate the design space of data minimization, including data collection decision-making and designing independent privacy features through MPF. Combined, these parts scaffold the future development of data minimization.

268 pages

Thesis Committee:
Jason Hong (Co-chair)
Swarun Kumar (Co-chair, ECE/HCII)
Yuvraj Agarwal (HCII/ISR)
Laura Dabbish
Ben Y. Zhao (University of Chicago)

Jodi Forlizzi, Head, Human-Computer Interaction Institute
Martial Hebert, Dean, School of Computer Science


