CMU-CS-98-128Computer Science Department School of Computer Science, Carnegie Mellon University
CMU-CS-98-128
Marco Zagha May 1998 Ph.D. Thesis Unavailable Electronically
Keywords: Parallel algorithms, latency hiding, PRACm QUQW, BSP,
Cray C90, Cray J90, vector multipocessor, memory contention,
interleaved memories, memory banks, benchmarks, sorting, merging,
sparse matrices, graph algorithms
Instead, this thesis considers whether it is feasible to map PRAM algorithms onto a class of realistic parallel machines with only a subset of the features of an ideal PRAM. Our goal is to show that general-purpose PRAM algorithms can be implemented efficiently and systematically on a class of high-bandwidth shared-memory parallel machines that
These features are currently provided on vector multiprocessors, and might be provided in future parallel architectures. To model these machines, we extend Valiant's Bulk Synchronous Parallel (BSP) model to include machines that compensate for slow memory banks by having more memory banks than processors. We evaluate this extended model on the Cray C90 and Cray J90 and explore experimentally how various PRAM variants can be emulated in this model, with particular emphasis on the Queued-Read Queued-Write PRAM. The major focus of the thesis is a practical study of techniques for implementing and modeling irregular PRAM algorithms on vector multiprocessors. The thesis describes a set of case studies on fundamental PRAM algorithms, including sorting, merging, sparse matrix multiplication, and graph connectivity. Each case study describes the selection of a practical PRAM algorithm, the process of mapping the algorithm onto the machine model, a performance model for predicting running time, and practical issues and tradeoffs in developing fast, general-purpose implementations for the Cray C90 and Cray J90. 146 pages
| |

Return to:
SCS Technical Report Collection This page maintained by reports@cs.cmu.edu |