CMU-CS-05-173
Computer Science Department
School of Computer Science, Carnegie Mellon University




Probabilistic Reuse of Past Policies

Fernando Fernández, Manuela Veloso

July 2005



Keywords: Reinforcement Learning, Policy Reuse, Transfer Learning


A past policy provides a bias to guide the exploration of the environment and speed up the learning of a new action policy. The success of this bias depends on whether the past policy is similar to the actual policy or not. In this report we describe a new algorithm, PRQ-Learning, that reuses a set of past policies to bias the learning of a new one. The past policies are ranked according to a similarity metric that estimates how useful it is to reuse each of them. This ranking provides a probabilistic bias for exploration in the new learning process. Several experiments demonstrate that PRQ-Learning finds a balance between exploitation of the policy currently being learned, exploration of random actions, and exploration toward the past policies.
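
The three-way balance described in the abstract can be illustrated with a minimal Python sketch. It is not the PRQ-Learning algorithm as specified in the report: the softmax ranking, the function names (softmax_probs, choose_action), and the parameters temperature, psi, and epsilon are all assumptions introduced here for illustration only.

```python
import math
import random


def softmax_probs(scores, temperature=1.0):
    """Map similarity scores to selection probabilities.

    Assumption: a softmax is used here only as one plausible way to turn
    a ranking into a probabilistic bias; the abstract does not name one.
    """
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def choose_action(state, q_values, past_policy, n_actions, psi, epsilon, rng):
    """Pick an action by mixing three behaviours (hypothetical parameters).

    - with probability psi: follow the selected past policy
    - otherwise, with probability epsilon: take a random action
    - otherwise: exploit the policy currently being learned (greedy on Q)
    """
    if past_policy is not None and rng.random() < psi:
        return past_policy(state)
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    row = q_values[state]
    return max(range(n_actions), key=lambda a: row[a])


if __name__ == "__main__":
    rng = random.Random(0)
    # Two hypothetical past policies with made-up similarity scores.
    past_policies = [lambda s: 0, lambda s: 2]
    probs = softmax_probs([0.2, 0.8], temperature=0.5)
    chosen = rng.choices(past_policies, weights=probs, k=1)[0]
    q_values = {0: [0.1, 0.9, 0.3]}
    action = choose_action(0, q_values, chosen, 3, psi=0.5, epsilon=0.1, rng=rng)
    print(probs, action)
```

In this sketch, setting psi to 0 reduces the behaviour to ordinary epsilon-greedy exploration, while a larger psi shifts exploration toward the selected past policy.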

15 pages

