Computer Science Department
School of Computer Science, Carnegie Mellon University
Memory Requirements for Balanced Computer Architectures
In this paper, a processing element PE is characterized by its computation bandwidth, I/O bandwidth, and the size of its local memory. In carrying out a computation, a PE is said to be balanced if the computing time equals the I/O time. Consider a balanced PE for some computation. Suppose that the computation bandwidth of the PE is increased by a factor of computation the PE will be imbalanced, i.e., it will have to wait for I/O. A standard method to avoid this I/O bottleneck is to reduce the overall I/O requirement of the PE by increasing the size of its local memory. This paper addresses the question of by how much the PE's local memory must be enlarged in order to restore balance.
The following results are shown: For matrix computations such as matrix multiplication and Gaussian elimination, the size of the local memory must relaxation on a k -dimensional grid, the local memory must be enlarged by a factor of a @+ k . For some other computations such as the FFT and sorting, the increase is exponential, i.e., the size of the new these results indicate that to design a balanced PE, the size of its local memory must be increased much more rapidly than its computation bandwidth. This phenomenon seems to be common for many computations where an output may depend on a large subset of the inputs.
Implications of these results for some parallel computer architectures are discussed. One particular result is that to balance an array of p linearly connected PEs for performing matrix computations such as matrix multiplication and matrix triangularization, the size of each PE's local memory must grow linearly with p . Thus, the larger the array is, the larger each PE's local memory must be.