Computer Science Department
School of Computer Science, Carnegie Mellon University


H. Brendan McMahan, Geoffrey J. Gordon

May 2005

Keywords: Planning under uncertainty, Markov Decision Processes, Prioritized Sweeping, Dijsktra s Algorithm, Gaussian Elimination, Value Iteration, Improved Prioritized Sweeping (IPS), Prioritized Policy Iteration (PPI), Gauss-Dijsktra Elimination (GDE)

We study the problem of computing the optimal value function for a Markov decision process with positive costs. Computing this function quickly and accurately is a basic step in many schemes for deciding how to act in stochastic environments. There are efficient algorithms which compute value functions for special types of MDPs: for deterministic MDPs with S states and A actions, Dijkstra's algorithm runs in time O(AS log S). And, in single-action MDPs (Markov chains), standard linear-algebraic algorithms find the value function in time O(S3), or faster by taking advantage of sparsity or good conditioning. Algorithms for solving general MDPs can take much longer: we are not aware of any speed guarantees better than those for comparably-sized linear programs. We present a family of algorithms which reduce to Dijkstra's algorithm when applied to deterministic MDPs, and to standard techniques for solving linear equations when applied to Markov chains. More importantly, we demonstrate experimentally that these algorithms perform well when applied to MDPs which "almost" have the required special structure.

24 pages

Return to: SCS Technical Report Collection
School of Computer Science homepage

This page maintained by