Computer Science Department
School of Computer Science, Carnegie Mellon University
When Message Passing Meets Shared Memory
Thomas R. Stricker
Keywords: Parallel compilers, high performance Fortran, direct deposit
message passing, deposit model, postal model, rendezvous model, shared memory,
remote store, remote load, memory system performance, massively parallel
multiprocessors, Cray T3E, Cray T3D, Intel iWarp, Intel Paragon
Neither a pure coherent shared memory architecture nor a pure, coarse-grain
message passing distributed memory architecture wins the contest for the most
efficient data transfer services, that is, the best services with the least
amount of hardware support. Looking at end-to-end transfers, the optimum lies
between the two extremes. Fine-grain data transfer mechanisms that rely on
noncoherent remote loads and stores in a global address space are highly
useful. New models of communication that separate control transfer from data
transfer are required to link the properties of those mechanisms to the
properties of parallel programs and to their correctness. My deposit and
fetch model accomplishes exactly this.
The evaluation of several implementations of direct deposit indicates that
direct deposit yields a major win (a factor of three on a Cray T3D) for large
data transfers with complex communication or memory access patterns, and that
the benefit is largely due to a reduction of data copies in the internals of
the communication system.
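The separation of control transfer from data transfer can be sketched in a few lines. The following is an illustration, not code from the thesis: two threads stand in for sender and receiver, a preallocated buffer with precomputed offsets stands in for remote stores into a global address space, and a single flag stands in for the control transfer. The point is that data lands directly at its final addresses, so no intermediate copy or rendezvous is needed.

```python
import threading

# Receiver preallocates its buffer; the final offsets are computed in
# advance, so the sender can deposit data at its ultimate destination.
recv_buf = bytearray(16)
done = threading.Event()  # control transfer, kept separate from the data

def sender(data, offsets):
    # Data transfer: deposit each element directly at its final offset,
    # analogous to noncoherent remote stores into the receiver's memory.
    for off, byte in zip(offsets, data):
        recv_buf[off] = byte
    done.set()  # control transfer: one flag update, no data copied

t = threading.Thread(target=sender, args=(b"abcd", [0, 4, 8, 12]))
t.start()
done.wait()  # the receiver blocks only on the control signal
t.join()
```

After the control signal arrives, `recv_buf` already holds the data in place; the receiver never copies it out of a message buffer.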
The search for the optimal performance in message passing systems can be
approached from two ends. First, the performance of a full function messaging
library can be analyzed and the costly operations can be carefully eliminated.
Second, an implementor can start from the most efficient low level primitives
and add functionality until a reasonable programming model is offered.
I have worked from both ends and, in both cases, arrived at direct deposit
message passing.