CMU-ISR-13-111
Institute for Software Research, School of Computer Science, Carnegie Mellon University
Applying Autonomic Diagnosis at Samsung Electronics
Paulo Casanova, Bradley Schmerl, David Garlan, Rui Abreu*, Jungsik Ahn**
September 2013
An increasingly essential aspect of many critical software systems is the ability to quickly diagnose and locate faults so that appropriate corrective measures can be taken. Large, complex software systems fail unpredictably, and pinpointing the source of a failure is a challenging task. In this paper we explore how our recently developed technique for automatic diagnosis performs at detecting failures and localizing faults in a critical manufacturing control system at Samsung Electronics, where failures can result in large financial and schedule losses. We show how our approach scales to such systems to diagnose intermittent faults, connectivity problems, protocol violations, and timing failures. We propose a set of accuracy and performance measures that can be used to evaluate run-time diagnosis. We present lessons learned from this work, including how instrumentation limitations may impair diagnostic accuracy: unless these limitations are overcome, the kinds of faults that can be detected remain restricted.
25 pages