CMU-CS-99-121
Computer Science Department
School of Computer Science, Carnegie Mellon University



Scheduling Threads for Low Space Requirements
and Good Locality

Girija J. Narlikar

May 1999

This technical report is an extended version of a paper that appears in the Proceedings of the Eleventh ACM Symposium on Parallel Algorithms and Architectures (SPAA), June 1999.

CMU-CS-99-121.ps
CMU-CS-99-121.pdf


Keywords: Multithreading, space efficiency, work stealing, dynamic scheduling, nested parallelism, dynamic dags


The running time and memory requirement of a parallel program with dynamic, lightweight threads depend heavily on the underlying thread scheduler. In this paper, we present a simple, asynchronous, space-efficient scheduling algorithm for shared memory machines that combines the low scheduling overheads and good locality of work stealing with the low space requirements of depth-first schedulers. For a nested-parallel program with depth D and serial space requirement S1, we show that the expected space requirement is S1 + O(K · p · D) on p processors. Here, K is a user-adjustable runtime parameter that provides a trade-off between running time and space requirement. Our algorithm achieves good locality and low scheduling overheads by automatically increasing the granularity of the work scheduled on each processor.
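
As a purely hypothetical illustration of this bound (the numbers below are assumptions chosen for this example, not measurements from the report): for a program with serial space requirement S1 = 10 MB and depth D = 100, run on p = 8 processors with K set to 1000 bytes, the additive term is

    K · p · D = 1000 · 8 · 100 = 800,000 bytes,

so the expected space requirement stays within roughly 0.8 MB of S1. Doubling K doubles this additive term but allows coarser-grained work to be scheduled on each processor, trading space for lower scheduling overhead.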

We have implemented the new scheduling algorithm in the context of a native, user-level implementation of POSIX standard threads (Pthreads), and evaluated its performance using a set of C-based benchmarks with dynamic or irregular parallelism. We compare the performance of our scheduler with that of two previous schedulers: the thread library's original scheduler (which uses a FIFO queue), and a provably space-efficient depth-first scheduler. At a fine thread granularity, our scheduler outperforms both of these previous schedulers, but requires marginally more memory than the depth-first scheduler.

We also present simulation results on synthetic benchmarks to compare our scheduler with space-efficient versions of both a work-stealing scheduler and a depth-first scheduler. The results indicate that, unlike these previous approaches, the new algorithm covers a range of scheduling granularities and space requirements, and allows the user to trade off a program's space requirement against its scheduling granularity.

23 pages

