|
CMU-CS-98-184
Computer Science Department
School of Computer Science, Carnegie Mellon University
CMU-CS-98-184
A Parallel, Multithreaded Decision Tree Builder
Girija J. Narlikar
December 1998
CMU-CS-98-184.ps
CMU-CS-98-184.pdf
Keywords: Decision tree building, multithreading, parallel C4.5,
scalable data mining, lightweight Pthreads, space efficiency, dynamic
scheduling
Parallelization has become a popular mechanism to speed up data
classification tasks that deal with large amounts of data. This paper
describes a high-level, fine-grained parallel formulation of a
decision tree-based classifier for memory-resident datasets on SMPs.
We exploit two levels of divide-and-conquer parallelism in the tree
builder: at the outer level across the tree nodes, and at the inner
level within each tree node. Lightweight Pthreads are used to express
this highly irregular and dynamic parallelism in a natural manner. The
task of scheduling the threads and balancing the load is left to a
space-efficent Pthreads scheduler. Experimental results on large
datasets indicate that the space and time performance of the tree
builder scales well with both the data size and number of processors.
14 pages
|