CMU-CS-19-128
Computer Science Department
School of Computer Science, Carnegie Mellon University



CMU-CS-19-128

Join-based Parallel Balanced Binary Trees

Yihan Sun

Ph.D. Thesis

November 2019

CMU-CS-19-128.pdf


Keywords: Parallel, balanced binary tree, binary tree, key-value store, map, ordered set, augmented map, index, computational geometry

As one of the most fundamental data structures in algorithm design and programming, trees play an important role in almost every area in computer science. An effective design of trees must consider new challenges imposed by today’s applications. First of all, the large scale of data requires trees to exploit parallelism and theoretical efficiency. Secondly, the comprehensiveness in applications necessitates a wide range of functions for trees. Lastly, each individual application using trees demonstrates its particularity. As a result, some generic frameworks on trees is useful in enabling simplicity and reusability for algorithms and code.

To tackle these challenges, this thesis studies balanced binary trees in the parallel setting. This thesis proposes novel algorithms, frameworks, implementation techniques, and libraries, that are simple and efficient for a variety of large-scale applications. The core of this thesis is an algorithmic framework called join-based algorithms, which bases all tree algorithms and functionalities on a single primitive join. The join function captures the essence for rebalancing, augmentation, and persistence (multi-versioning). Based on join, a variety of simple and efficient tree algorithms work generically across multiple balancing schemes, across multiple abstract augmentations, as well as for both in-place and persistent updates.

Theoretically, this thesis proposes a wide range of parallel tree algorithms. They all have optimal work and poly-logarithmic span. By placing conditions on the join function, we abstract out the preferred properties of a balancing scheme that ensures the optimal cost bounds of join-based algorithms. This also leads to theoretically efficient parallel solutions to some studied applications, such as the geometry data structures and the sweepline paradigm. In practice, the thesis work leads to an implementation of the join-based parallel trees, called P-Trees, in a C++ library called PAM. The library supports a complete interface for sequences, ordered sets, ordered maps, and augmented maps (formally dened in this thesis). Applying the library leads to concise and high-performance implementation of a variety of real-world applications, such as 2D range-based searches and hybrid transactional and analytical processing (HTAP) database management systems.

Experiments show that P-Trees achieve speedups ranging from 40 to 100 across different applications on 72 cores with 2-way hyperthreading. P-Trees’s performance can be up to magnitudes better than existing solutions specific to these applications, in addition to being more concise.

257 pages

Thesis Committee:
Guy E. Blelloch (Chair)
Andrew Pavlo
Daniel D.K. Sleator
Michael T. Goodrich (University of California, Irvine)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science


Return to: SCS Technical Report Collection
School of Computer Science

This page maintained by reports@cs.cmu.edu