Institute for Software Research
School of Computer Science, Carnegie Mellon University
Helping Users Understand Privacy Notices with
Kanthashree Mysore Sathyendra, Abhilasha Ravichander,
Also appears as Language Technologies Technical Report
Privacy notices are the default mechanism used to inform users about the data collection and use practices of technologies (e.g., websites, mobile apps, Internet of Things devices) and processes with which they interact. The length of these notices and their often convoluted language have been shown to discourage most users from reading them. Recent progress in natural language processing and machine learning has opened the door to the development of technologies that are capable of automatically extracting statements (or "annotations") from the text of privacy policies. These technologies could help users quickly identify those elements of a privacy notice they care about, without requiring them to read the full text of the notice.
In this article, we review the requirements associated with the development of Query Answering functionality that would enable users to ask questions about specific aspects of privacy notices (e.g., "Does this app share my location with third parties?", "Am I able to review the information this website collects about me?", "Can I delete my account?", "For how long is my information going to be retained by this company?"). We discuss different possible approaches to supporting such functionality and how they relate to recent advances in automatically annotating privacy notices. We present initial results obtained with different machine learning and natural language processing techniques, which suggest that Query Answering functionality could be a particularly promising approach to informing users about privacy practices. In particular, in contrast to automated annotation techniques that aim to extract detailed statements from the text of privacy notices, Query Answering functionality could be configured to return short text fragments extracted from privacy notices and rely on the user (rather than the computer) to interpret some of the finer nuances of the text found in these fragments. Such an approach could potentially prove more robust than fully automated annotation techniques, which, at least at this time, struggle with the interpretation of finer nuances.
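To make the idea of returning short text fragments concrete, the following is a minimal sketch of fragment retrieval, assuming a simple TF-IDF-weighted word-overlap score. It is not the approach evaluated in this report; the function names (`tokenize`, `split_fragments`, `rank_fragments`), the sentence-level fragment granularity, and the sample policy text are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def split_fragments(policy_text):
    """Split a policy into sentence-level fragments (naive punctuation split)."""
    parts = re.split(r"(?<=[.!?])\s+", policy_text.strip())
    return [p for p in parts if p]


def rank_fragments(question, fragments, top_k=1):
    """Return up to top_k fragments that share weighted vocabulary with the question.

    Assumption: relevance is approximated by TF-IDF-weighted word overlap,
    normalized by fragment length. Fragments with no overlap are dropped.
    """
    tokenized = [tokenize(f) for f in fragments]
    n = len(fragments)

    # Document frequency: in how many fragments does each word appear?
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

    q_tokens = set(tokenize(question))
    scored = []
    for frag, toks in zip(fragments, tokenized):
        counts = Counter(toks)
        score = sum(counts[t] * idf[t] for t in q_tokens if t in counts)
        # Length normalization so long fragments are not favored automatically.
        norm = math.sqrt(sum((c * idf[t]) ** 2 for t, c in counts.items())) or 1.0
        scored.append((score / norm, frag))

    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [frag for s, frag in scored[:top_k] if s > 0]


# Hypothetical policy text, invented for this example.
SAMPLE_POLICY = (
    "We collect your email address when you register. "
    "We share your location with advertising partners. "
    "You may delete your account at any time in settings."
)

if __name__ == "__main__":
    frags = split_fragments(SAMPLE_POLICY)
    print(rank_fragments("Does this app share my location with third parties?", frags))
    print(rank_fragments("Can I delete my account?", frags))
```

In this sketch the system only locates a candidate fragment; deciding whether "advertising partners" counts as the "third parties" the user is concerned about is left to the reader of the fragment, which mirrors the division of labor described above. A lexical score of this kind would miss paraphrases, so it should be read as a baseline illustration rather than a recommended design.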
This article also includes a brief discussion of opportunities and challenges associated with possible extensions of Query Answering functionality in the form of privacy assistants capable of engaging in dialogues with users to clarify some of their questions and help them understand to what extent their concerns are explicitly addressed (or not) by the text of privacy notices. Such functionality could provide yet greater robustness and usability than fully automated annotation techniques, and could eventually also leverage models of what the user already knows and/or cares about.