Computer Science Department
School of Computer Science, Carnegie Mellon University


Exploration and Policy Reuse

Fernando Fernández, Manuela Veloso

July 2005


Keywords: Reinforcement Learning, Policy Reuse, Exploration Strategies

We define Policy Reuse as a learning technique guided by past policies. It poses the challenge of balancing three choices: exploiting the policy currently being learned, exploring random actions, and exploring in the direction of past policies. In this work we introduce a new exploration strategy, π-reuse, which acts as an intelligent bias for reusing a past policy when learning a new one. This strategy also yields a similarity metric between each policy in a set of past policies and the new one, so we define a π-reuse-based similarity metric between policies. We introduce a new algorithm that combines the selection and reuse of past policies using this similarity metric. Empirical results demonstrate the usefulness of π-reuse as an exploration bias for reusing past policies and its effectiveness in defining similarity between policies.
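
The three-way balance described above can be illustrated with a minimal sketch of action selection. The abstract does not specify how the choices are weighted, so the probabilities psi (follow the past policy) and epsilon (act randomly), the function name, and the dictionary-based Q-table are all assumptions made for illustration, not the report's definition of π-reuse.

```python
import random

def pi_reuse_action(state, past_policy, q_values, actions, psi, epsilon, rng=random):
    """Hypothetical sketch: choose among the three options named in the abstract.

    psi and epsilon are assumed parameters, not values taken from the report.
    """
    # Exploration towards the past policy: follow it with probability psi.
    if rng.random() < psi:
        return past_policy(state)
    # Exploration of random actions: with probability epsilon, act uniformly at random.
    if rng.random() < epsilon:
        return rng.choice(actions)
    # Exploitation of the policy being learned: act greedily on current Q estimates.
    return max(actions, key=lambda a: q_values.get((state, a), 0.0))
```

With psi = 1 the learner simply replays the past policy, and with psi = 0 the rule reduces to ordinary epsilon-greedy selection on the new policy; intermediate values bias exploration toward the past policy without committing to it.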

16 pages
