|   | CMU-CS-98-184 Computer Science Department
 School of Computer Science, Carnegie Mellon University
 
    
     
 CMU-CS-98-184
 
A Parallel, Multithreaded Decision Tree Builder 
Girija J. Narlikar 
December 1998  
CMU-CS-98-184.psCMU-CS-98-184.pdf
 Keywords: Decision tree building, multithreading, parallel C4.5,
scalable data mining, lightweight Pthreads, space efficiency, dynamic
scheduling
 Parallelization has become a popular mechanism to speed up data
classification tasks that deal with large amounts of data.  This paper
describes a high-level, fine-grained parallel formulation of a
decision tree-based classifier for memory-resident datasets on SMPs.
We exploit two levels of divide-and-conquer parallelism in the tree
builder: at the outer level across the tree nodes, and at the inner
level within each tree node. Lightweight Pthreads are used to express
this highly irregular and dynamic parallelism in a natural manner. The 
task of scheduling the threads and balancing the load is left to a 
space-efficent Pthreads scheduler.  Experimental results on large 
datasets indicate that the space and time performance of the tree 
builder scales well with both the data size and number of processors.
 
14 pages 
 |