Computer Science Department
School of Computer Science, Carnegie Mellon University
The first part of this document describes Pegasus, the internal representation of CASH, and a series of novel program transformations performed by CASH. The most notable of these are a new optimal register-promotion algorithm and partial redundancy elimination for memory accesses based on predicate manipulation.
The second part of this document evaluates the performance of the generated circuits using simulation. Using media processing benchmarks, we show that for the domain of embedded computation, the circuits generated by CASH can sustain high levels of instruction level parallelism, due to the effective use of dataflow software pipelining. A comparison of Spatial Computation and superscalar processors highlights some of the weaknesses of our model of computation, such as the lack of branch prediction and register renaming. Low-level simulation however suggests that the energy efficiency of Application-Specific Hardware is three orders of magnitude better than superscalar processors, one order of magnitude better than low-power digital signal processors and asynchronous processors, and approaching custom hardware chips.
The results presented in this document can be applied in several domains: (1) most of the compiler optimizations are applicable to traditional compilers for high-level languages; (2) CASH itself can be used as a hardware synthesis tool for very fast system-on-a-chip prototyping directly from C sources; (3) the compilation framework we describe can be applied to the translation of imperative languages to dataflow machines; (4) we have extended the dataflow machine model to encompass predication, data-speculation and control-speculation; and (5) the tool-chain described and some specific optimizations, such as lenient execution and pipeline balancing, can be used for synthesis and optimization of asynchronous hardware.