Computer Science Department
School of Computer Science, Carnegie Mellon University


A Parallel, Multithreaded Decision Tree Builder

Girija J. Narlikar

December 1998

Keywords: Decision tree building, multithreading, parallel C4.5, scalable data mining, lightweight Pthreads, space efficiency, dynamic scheduling

Parallelization has become a popular mechanism to speed up data classification tasks that deal with large amounts of data. This paper describes a high-level, fine-grained parallel formulation of a decision tree-based classifier for memory-resident datasets on SMPs. We exploit two levels of divide-and-conquer parallelism in the tree builder: at the outer level across the tree nodes, and at the inner level within each tree node. Lightweight Pthreads are used to express this highly irregular and dynamic parallelism in a natural manner. The task of scheduling the threads and balancing the load is left to a space-efficent Pthreads scheduler. Experimental results on large datasets indicate that the space and time performance of the tree builder scales well with both the data size and number of processors.

14 pages

Return to: SCS Technical Report Collection
School of Computer Science homepage

This page maintained by