|
CMU-CS-98-127
Computer Science Department
School of Computer Science, Carnegie Mellon University
CMU-CS-98-127
Experiments with Parallel Pointer-Based Algorithms
Margaret Reid-Miller
May 1998
Ph.D. Thesis
Unavailable Electronically
Keywords: PRAM, algorithms, list ranking, dictionaries, balanced
trees, treaps, pipelining, set operations, pointer-based implementation
Although parallel pointer-based algorithms have been studied
extensively by the research community, implementations have met with
marginal success, even for the simplest applications. In a careful
implementation study of parallel list ranking (and the related list-scan
operation) and parallel dynamic balanced trees operations, I show that
indeed parallel pointer-based algorithms can have substantial speedup over
fast workstations. List ranking is over 200 times faster on a Cray C90
than on a high-end DEC Alpha workstation, and the balance tree algorithms
are 6.3 to 6.8 times faster on eight processors than on one of a SGI
Power Challenge, and 4.1 to 4.4 times faster on five processors than on
one of a SUN Ultra Enterprise 3000. List ranking is a primitive in many
databases, providing ordered set operations, and index searching. The
parallel algorithms for list ranking and balanced trees are new; the key
to their success is that they are both work optimal and very simple. In
fact, the algorithms for set operations seem simpler than any previous
sequential algorithms with the same work bounds, and might, therefore,
be useful in a sequential context.
The algorithms as implemented, however, do not have optimal depth
(parallel time). This dissertation shows how to reduce the depth of the
tree algorithms to make them optimal by using pipelining. Pipelining has been
used previously, but the method used here allows for asynchronous execution
and for pipeline delays that are data dependent and dynamic. Rather than
making the coding more difficult, the method lets the user write the algorithms
using futures (a parallel language construct) and leaves the management of
the pipelining to an efficient runtime system. To determine the runtime
of algorithms, I first analyze the algorithms in a language-based cost model
in terms of work and depth of the computations, and then show universal
bounds for implementing the language on various machines.
100 pages
|