
CMUCS02116
Computer Science Department
School of Computer Science, Carnegie Mellon University
CMUCS02116
Using Tarjan's Red Rule for Fast Dependency Tree Construction
Dan Pelleg, Andrew Moore
February 2002
CMUCS02116.ps
CMUCS02116.pdf
Keywords: Machine learning, Bayes' networks, dependency
trees, Hoeffding races, scalable datamining
We focus on the problem of efficient learning of dependency trees. It is
wellknown that given the pairwise mutual information coefficients, a
minimumweight spanning tree algorithm solves this problem exactly and in
polynomial time. However, for large datasets it is the construction of
the correlation matrix that dominates the running time. We have developed
a new spanningtree algorithm which is capable of exploiting partial
knowledge about edge weights. The partial knowledge we maintain is a
probabilistic confidence interval on the coefficients, which we derive by
examining just a small sample of the data. The algorithm is able to flag
the need to shrink an interval, which translates to inspection of more
data for the particular attribute pair. Experimental results show
significant improvement in running time, without loss in accuracy of the
generated trees. Interestingly, our spanningtree algorithm is based
solely on Tarjan's rededge rule, which is generally considered a
guaranteed recipe for bad performance.
10 pages
