CMU-CS-06-166
Computer Science Department
School of Computer Science, Carnegie Mellon University



CMU-CS-06-166

Robust Planning in Domains with Stochastic
Outcomes, Adversaries, and Partial Observability

Hugh Brendan McMahan

December 2006

Ph.D. Thesis

CMU-CS-06-166.pdf


Keywords: Planning, Game Theory, Markov Decision Processes, Extensive-form Games, Convex Games, Algorithms

Real-world planning problems often feature multiple sources of uncertainty, including randomness in outcomes, the presence of adversarial agents, and lack of complete knowledge of the world state. This thesis describes algorithms for four related formal models that can address multiple types of uncertainty: Markov decision processes, MDPs with adversarial costs, extensiveform games, and a new class of games that includes both extensive-form games and MDPs as special cases.

Markov decision processes can represent problems where actions have stochastic outcomes. We describe several new algorithms for MDPs, and then show how MDPs can be generalized to model the presence of an adversary that has some control over costs. Extensive-form games can model games with random events and partial observability. In the zero-sum perfect-recall case, a minimax solution can be found in time polynomial in the size of the game tree. However, the game tree must ârememberâ all past actions and random outcomes, and so the size of the game tree grows exponentially in the length of the game. This thesis introduces a new generalization of extensive-form games that relaxes this need to remember all past actions exactly, producing exponentially smaller representations for interesting problems. Further, this formulation unifies extensive-form games with MDP planning.

We present a new class of fast anytime algorithms for the off-line computation of minimax equilibria in both traditional and generalized extensive-form games. Experimental results demonstrate their effectiveness on an adversarial MDP problem and on a large abstracted poker game. We also present a new algorithm for playing repeated extensive-form games that can be used when only the total payoff of the game is observed on each round.

207 pages


Return to: SCS Technical Report Collection
School of Computer Science

This page maintained by reports@cs.cmu.edu