Institute for Software Research
School of Computer Science, Carnegie Mellon University


Automatic Categorization of Privacy Policies:
A Pilot Study

Waleed Ammar*, Shomir Wilson**, Norman Sadeh**, Noah A. Smith*

December 2012


This report also appears as Language Technologies Institute
Technical Report CMU-LTI-12-019.

Keywords: Privacy, Natural Language Processing, Machine Learning

Privacy policies are a nearly ubiquitous feature of websites and online services, and the contents of such policies are legally binding for users. However, the opaque language and sheer length of most privacy policies tend to discourage users from reading them. We describe a pilot experiment that uses automatic text categorization to answer simple categorical questions about privacy policies, as a first step toward developing automated or semi-automated methods to retrieve salient features from these policies. Our results tentatively demonstrate the feasibility of this approach for answering selected questions about privacy policies, suggesting that further work toward user-oriented analysis of these policies could be fruitful.

11 pages

*Language Technologies Institute
**Institute for Software Research
