Institute for Software Research International
School of Computer Science, Carnegie Mellon University
Helping Everyday Users Find Anomalies in Data Feeds
Ph.D. Thesis - Software Engineering
Also appears as Computer Science Department
It is particularly difficult to evaluate the dependability of data feeds. The specifications of data feeds are often even sketchier than the specifications of software components, the data feeds may be changed by their proprietors without notice to users, and everyday users of data feeds have only enough knowledge about the application domain to support their own usage. These factors inhibit many dependability enhancement techniques, which require a model of proper behavior for failure detection, preferably in the form of specifications.
The research presented here addresses this problem by providing CUES (Checking User Expectations about Semantics). CUES is a method and a prototype implementation for making user expectations precise and for checking these precise expectations. CUES treats the precise expectations as a proxy for missing specifications. It checks the precise expectations to detect semantic anomalies, that is, data feed behavior that does not adhere to these expectations. Three case studies and a validation study, all with real-world data, provide evidence of the practicality and usefulness of CUES. Together, they indicate that a user of CUES gets substantial benefit for a modest investment of time and effort. In addition to automated detection of anomalies, the benefit often includes a better understanding of the user's own expectations, of the data feeds, and of existing and missing documentation.
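To make the idea of checking precise expectations concrete, the following minimal sketch expresses expectations as named predicates over feed records and reports every record that violates one. It is an illustration only, not the CUES prototype: the `Expectation` class, the `find_anomalies` function, the record fields (`price`, `low`, `high`), and the sample data are all hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A feed record is modeled as a mapping from field name to numeric value.
# (Hypothetical representation, chosen only for this illustration.)
Record = Dict[str, float]


@dataclass
class Expectation:
    """A user expectation made precise: a name plus a predicate on one record."""
    name: str
    holds: Callable[[Record], bool]


def find_anomalies(
    feed: List[Record], expectations: List[Expectation]
) -> List[Tuple[int, str]]:
    """Return (record index, expectation name) for every violated expectation."""
    anomalies = []
    for index, record in enumerate(feed):
        for expectation in expectations:
            if not expectation.holds(record):
                anomalies.append((index, expectation.name))
    return anomalies


# Hypothetical expectations a user might hold about a stock-quote-like feed.
expectations = [
    Expectation("price is positive", lambda r: r["price"] > 0),
    Expectation("low <= price <= high", lambda r: r["low"] <= r["price"] <= r["high"]),
]

# Made-up sample records; the second and third each break an expectation.
feed = [
    {"price": 10.0, "low": 9.5, "high": 10.5},
    {"price": -1.0, "low": 9.0, "high": 10.0},
    {"price": 12.0, "low": 9.0, "high": 11.0},
]

anomalies = find_anomalies(feed, expectations)
for index, name in anomalies:
    print(f"record {index}: violates '{name}'")
```

In this sketch the expectations stand in for the missing specification: the checker needs no knowledge of how the feed is produced, only of what the user expects its values to look like.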