CMU-CS-22-145 Computer Science Department School of Computer Science, Carnegie Mellon University
Programmable, Energy-minimal Computer Architectures Graham Gobieski Ph.D. Thesis August 2022
Ultra-low-power (ULP) sensor devices are increasingly being deployed for a variety of use-cases that require sophisticated processing of sensed data. Regardless of the deployment, energy-efficiency is critical; for battery-powered devices, energy-efficiency determines device lifetime, while for energy-harvesting devices, energy-efficiency determines performance by dictating the frequency of recharging. Unfortunately, existing devices pay a severe energy tax for their programmability, wasting energy in instruction-fetch/decode, pipeline-control and data supply. Further, offloading computation from an edge-device to the cloud is not practical as communication costs an order-of-magnitude more energy than local compute. The solution is to redesign the ULP sensor system stack to increase the energy-efficiency of on-board compute and enable sophisticated processing of sensed data. This thesis proposes such a stack – from software to silicon – that leverages new execution models to reduce the tax of programmability and achieve extreme energy-efficiency. Specifically it contributes 1) SONIC, a software framework that enables machine inference on intermittently-operating, energy-harvesting devices, 2) MANIC, a vector-dataflow co-processor (and corresponding silicon prototype), 3) SNAFU, an ULP coarse-grain-reconfigurable-array (CGRA) generation framework and architecture, and 4) RIPTIDE, a co-designed dataflow compiler and energy-minimal CGRA. SONIC was the first demonstration of machine inference on a commercial, intermittently-operating device, but also exposed the flaws of such devices. MANIC fixed these problems by combining vector execution to amortize instruction fetch with dataflow execution to minimize data supply energy by forwarding intermediate values directly from producers to consumers. SNAFU extended MANIC's vector-dataflow to further reduce energy by minimizing the toggling of shared pipeline resources. Its generated CGRAs implement spatial-vector-dataflow execution that lays out computation across a fabric of PEs, keeping each PE configured in the same way throughout kernel execution. Finally, RIPTIDE improves overall system efficiency by compiling and offloading to its CGRA, programs written in C with complex control-flow and irregular memory accesses. Together these contributions form the basis of a new ULP sensor system stack that is > 2 orders-of-magnitude more efficient than existing systems, enabling new emerging applications that require intelligence "beyond-the-edge."
132 pages
Thesis Committee:
Srinivasan Seshan, Head, Computer Science Department
| |
Return to:
SCS Technical Report Collection This page maintained by reports@cs.cmu.edu |