Computer Science Department
School of Computer Science, Carnegie Mellon University


Learning Evaluation Functions for Global Optimization

Justin Andrew Boyan

August 1998

Ph.D. Thesis

Keywords: Machine learning, combinatorial optimization, reinforcement learning, temporal difference learning, evaluation functions, local search, heuristic search, simulated annealing, value function approximation, neuro-dynamic programming, Boolean satisfiability, radiotherapy treatment planning, channel routing, bin-packing, Bayes network learning, production scheduling

In complex sequential decision problems such as scheduling factory production, planning medical treatments, and playing backgammon, optimal decision policies are in general unknown, and it is often difficult, even for human domain experts, to manually encode good decision policies in software. The reinforcement-learning methodology of "value function approximation" (VFA) offers an alternative: systems can learn effective decision policies autonomously, simply by simulating the task and keeping statistics on which decisions lead to good ultimate performance and which do not. This thesis advances the state of the art in VFA in two ways.

First, it presents three new VFA algorithms, which apply to three different restricted classes of sequential decision problems: Grow-Support for deterministric problems, ROUT for acyclic stochastic problems, and Least-Square TD for fixed-policy prediction problems. Each is designed to gain robustness and efficiency over current approaches by exploiting the restricted problem structure to which it applies.

Second, it introduces STAGE, a new search algorithm for general combinatorial optimization tasks. STAGE learns a problem-specific heuristic evaluation function as it searches. The heuristic is trained by supervised linear regression or Least-Squares TF (lambda) to predict, from features of states along the search trajectory, how well a fast local search method such as hillclimbing will perform starting from each state. Search proceeds by alternating between two stages: performing the fast search to gather new training data, and following the learned heuristic to identify a promising new state state.

STAGE has produced good results (in some cases, the best results known) on a variety of combinatorial optimization domains, including VLSI channel routing, Bayes net structure-finding, bin-packing, Boolean satisfiability, radiotherapy treatment planning, and geographic cartogram design. This thesis describes the results in detail, analyzes the reasons for and conditions of STAGE's success, and places STAGE in the context of four decades of research in local search and evaluation function learning. It provides strong evidence that reinforcement learning methods can be efficient and effective on large-scale decision problems.

218 pages

Return to: SCS Technical Report Collection
School of Computer Science homepage

This page maintained by