COMPUTER SCIENCE TECHNICAL REPORT ABSTRACTS

CMU-CS-98-127
Computer Science Department
School of Computer Science, Carnegie Mellon University

CMU-CS-98-127

Experiments with Parallel Pointer-Based Algorithms

Margaret Reid-Miller

May 1998

Ph.D. Thesis

Unavailable Electronically

Keywords: PRAM, algorithms, list ranking, dictionaries, balanced trees, treaps, pipelining, set operations, pointer-based implementation

Although parallel pointer-based algorithms have been studied extensively by the research community, implementations have met with marginal success, even for the simplest applications. In a careful implementation study of parallel list ranking (and the related list-scan operation) and parallel dynamic balanced trees operations, I show that indeed parallel pointer-based algorithms can have substantial speedup over fast workstations. List ranking is over 200 times faster on a Cray C90 than on a high-end DEC Alpha workstation, and the balance tree algorithms are 6.3 to 6.8 times faster on eight processors than on one of a SGI Power Challenge, and 4.1 to 4.4 times faster on five processors than on one of a SUN Ultra Enterprise 3000. List ranking is a primitive in many databases, providing ordered set operations, and index searching. The parallel algorithms for list ranking and balanced trees are new; the key to their success is that they are both work optimal and very simple. In fact, the algorithms for set operations seem simpler than any previous sequential algorithms with the same work bounds, and might, therefore, be useful in a sequential context.

The algorithms as implemented, however, do not have optimal depth (parallel time). This dissertation shows how to reduce the depth of the tree algorithms to make them optimal by using pipelining. Pipelining has been used previously, but the method used here allows for asynchronous execution and for pipeline delays that are data dependent and dynamic. Rather than making the coding more difficult, the method lets the user write the algorithms using futures (a parallel language construct) and leaves the management of the pipelining to an efficient runtime system. To determine the runtime of algorithms, I first analyze the algorithms in a language-based cost model in terms of work and depth of the computations, and then show universal bounds for implementing the language on various machines.

100 pages

Return to: SCS Technical Report Collection
School of Computer Science homepage

This page maintained by reports@cs.cmu.edu