Institute for Software Research
School of Computer Science, Carnegie Mellon University
Applying Autonomic Diagnosis at Samsung Electronics
Paulo Casanova, Bradley Schmerl, David Garlan, Rui Abreu*, Jungsik Ahn**
An increasingly essential aspect of many critical software systems is the ability to quickly diagnose and locate faults so that appropriate corrective measures can be taken. Large, complex software systems fail unpredictably, and pinpointing the source of a failure is a challenging task. In this paper we explore how our recently developed technique for automatic diagnosis performs in the automatic detection of failures and fault localization in a critical manufacturing control system of Samsung Electronics, where failures can result in large financial and schedule losses. We show how our approach scales to such systems to diagnose intermittent faults, connectivity problems, protocol violations, and timing failures. We propose a set of measures of accuracy and performance that can be used to evaluate run-time diagnosis. We present lessons learned from this work, including how instrumentation limitations may impair diagnostic accuracy: without overcoming these, there is a limit to the kinds of faults that can be detected.