CMU-S3D-23-110
Software and Societal Systems Department
School of Computer Science, Carnegie Mellon University
Meta-Management of Collections of Autonomic Systems
Thomas J. Glazier
December 2023
Ph.D. Thesis
To meet the demands of high availability and optimal performance in dynamic environments, modern systems deploy autonomic, or self-adaptation, mechanisms. Increasingly, however, today's enterprise systems are compositions of many subsystems, each of which is itself an adaptive system. Currently, each autonomic manager operates to maintain locally defined quality-of-service (QoS) objectives, but their independent actions often lead to globally sub-optimal results. Commonly, human administrators handle situations in which the collection of autonomic systems is behaving sub-optimally. Generating a plan to change the configurations of the constituent autonomic managers is already a complex and challenging task in the management of a single autonomous system, and the challenge is exacerbated when there are tradeoffs in how to balance configuration options across the collection of autonomic subsystems. These challenges can be addressed by introducing an automated approach, referred to as meta-management, that provides a formal basis for reasoning about changes to the configurations of autonomic subsystems. The automated approach to meta-management is then established as part of a framework that can be used to instantiate a higher-level autonomic manager, referred to as a meta-manager, that provides assurance about, and improves the performance of, a collection of autonomic systems. This approach and framework include a MAPE-K control loop specialized to the needs of meta-management; a domain-specific language, SEAM, that enables the practical specification of adaptation policies; and a taxonomy of strategy synthesis techniques. The practicality, effectiveness, and applicability of the approach are then evaluated against three case studies. The first is an AWS Shopping Cart system in which a meta-manager is established to manage a collection of autonomic systems represented by a front-end user interface, a middleware services tier, and a database services tier.
This case study was selected to evaluate the ability of the meta-manager to improve the homeostatic operations of the collection of autonomic systems on a popular architectural pattern, code base, and operations platform that are in wide industrial use.
The second is the Google Control plane, in which a meta-manager was established to manage a collection of autonomic systems that suffered a significant outage. This case study was selected because it presented a well-documented and specific failure scenario that occurred during the period of this thesis research, the cause of which was, in part, human-centric management of a collection of autonomic systems.
Finally, the third is a simulation of an electrical grid cascade failure that represents the Northeast Blackout of 2003. This case study was selected because it presents an exhaustively documented example of a failure of human-centric management of a collection of autonomic systems, one that occurred in a context outside of information technology and cloud-based providers. This lends credibility to the applicability claim of the thesis.
210 pages
James D. Herbsleb, Head, Software and Societal Systems Department