Computer Science Department
School of Computer Science, Carnegie Mellon University
SNC-Meister: Admitting More Tenants
Timothy Zhu, Daniel S. Berger, Mor Harchol-Balter
Meeting tail latency Service Level Objectives (SLOs) in shared cloud networks is an important and challenging problem. The main challenge is determining limits on multitenancy such that SLOs are met. This requires calculating latency guarantees, which is difficult, especially when tenants exhibit bursty behavior, as is common in production environments. Nevertheless, papers from the past two years (Silo, QJump, and PriorityMeister) show techniques for calculating latency based on a branch of mathematical modeling called Deterministic Network Calculus (DNC). DNC theory is designed for adversarial worst-case conditions, which is sometimes necessary but often overly conservative. Typical tenants do not require strict worst-case guarantees; they only need SLOs at lower percentiles (e.g., 99th, 99.9th). This paper describes SNC-Meister, a new admission control system for tail latency SLOs. SNC-Meister improves upon the state-of-the-art DNC-based systems by using a new theory, Stochastic Network Calculus (SNC), which is designed for tail latency percentiles. Focusing on tail latency percentiles, rather than the adversarial worst-case DNC latency, allows SNC-Meister to pack together many more tenants: in experiments with production traces, SNC-Meister supports 75% more tenants than the state of the art. We are the first to bring SNC to practice in a real computer system.