Computer Science Department
School of Computer Science, Carnegie Mellon University
Many storage workloads do not need the level of performance afforded by a dedicated storage system, but do need the degree of predictability and controllability that comes from one. The benefits of consolidation, such as reduced waste, motivate the move to shared storage, but these benefits can be lost if the storage system is not shared effectively and efficiently among workloads. Unfortunately, inter-workload interference, such as a reduction of locality when multiple request streams are interleaved, can result in a dramatic loss of efficiency and performance.
Performance insulation is a system property where each workload sharing the system is assigned a fraction of resources (such as disk time) and receives nearly that fraction of its standalone (dedicated system) performance. Because sharing usually causes some overhead, there could be a drop in efficiency; but a system providing performance insulation bounds the efficiency loss at all times, and this bound is called the R-value. We have built a storage server called Argon to confirm that performance insulation can be achieved in practice for R-values of 0.8-0.9. This means that, running together with other workloads on Argon, workloads lose at most 10-20% of the efficiency they receive on a dedicated system.
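The arithmetic behind the R-value can be illustrated with a minimal sketch. It assumes one reading of the definition above: a workload assigned a fraction of the resources should receive at least the R-value times that fraction of its standalone performance. The function names, parameters, and example numbers are hypothetical and chosen only for illustration; they are not taken from Argon.

```python
def insulated_lower_bound(standalone_perf: float, share: float, r_value: float) -> float:
    """Lowest performance a workload should see under performance insulation.

    Assumption (illustrative reading of the definition): the workload is
    promised at least r_value * share * standalone_perf.
    """
    if not 0.0 < share <= 1.0:
        raise ValueError("share must be in (0, 1]")
    if not 0.0 < r_value <= 1.0:
        raise ValueError("r_value must be in (0, 1]")
    return standalone_perf * share * r_value


def max_efficiency_loss(r_value: float) -> float:
    """Largest fraction of standalone efficiency that may be lost to sharing."""
    if not 0.0 < r_value <= 1.0:
        raise ValueError("r_value must be in (0, 1]")
    return 1.0 - r_value


if __name__ == "__main__":
    # Hypothetical workload: 100 MB/s on a dedicated system, assigned half
    # of the disk time, on a server configured with an R-value of 0.9.
    bound = insulated_lower_bound(standalone_perf=100.0, share=0.5, r_value=0.9)
    print(f"guaranteed throughput: {bound:.1f} MB/s")
    print(f"maximum efficiency loss: {max_efficiency_loss(0.9):.0%}")
```

Under these assumptions, the hypothetical workload would be promised roughly 45 MB/s rather than the ideal 50 MB/s, which corresponds to the 10% loss bound for an R-value of 0.9.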
When storage systems are built from a cluster of modest servers rather than a single, monolithic server, techniques used to maintain efficiency do not necessarily compose across the servers. The resulting efficiency may actually be lower than the level achieved if no effort were made to preserve efficiency. We identify the causes of this effect and determine the level of coordination among servers needed to avoid this degradation. With appropriate care, efficiency can be maintained on a clustered storage system as well as it can be maintained on a single server.
While performance insulation provides a useful limit on loss of efficiency, many storage workloads also need performance guarantees. It may be significantly more straightforward to express a workload's requirements directly as performance guarantees rather than indirectly as efficiency guarantees. To ensure performance guarantees are consistently met, however, the appropriate allocation of resources needs to be determined and reserved, and later reevaluated if the workload changes in behavior or if the interference between workloads affects their ability to use resources effectively. If the resources assigned to a workload need to be increased to maintain its guarantee, but adequate resources are not available, violations will result.
Though intrinsic workload variability is fundamental, storage systems with the property of performance insulation strictly limit inter-workload interference, another source of variability in resource requirements. Such interference is the major source of "artificial" complexity in maintaining performance guarantees. We design and evaluate a storage system called Cesium that limits interference and thus avoids the class of guarantee violations arising from it. Workloads running on Cesium suffer only from those violations caused by their own variability and not those due to the activities of other workloads. As a result, realistic and challenging workloads may experience an order of magnitude fewer violations running under Cesium than under other quality-of-service systems proposed in the literature that do not explicitly manage efficiency. Performance insulation thus results in more reliable and efficient bandwidth guarantees.