CMU-CS-05-172
Computer Science Department
School of Computer Science, Carnegie Mellon University
Exploration and Policy Reuse
Fernando Fernández, Manuela Veloso
July 2005
Keywords: Reinforcement Learning, Policy Reuse, Exploration Strategies
We define Policy Reuse as a learning technique guided by past policies,
which poses the challenge of balancing three choices: exploitation
of the policy currently being learned, exploration of random actions, and
exploration towards the past policies. In this work we introduce a new
exploration strategy, π-reuse, as an intelligent bias to reuse a past
policy when learning a new one. Interestingly, this strategy also
provides a similarity metric between each of a set of past policies and the new
one. We therefore define a π-reuse based similarity metric between
policies. We introduce a new algorithm that combines the selection
and reuse of past policies using this similarity metric. We show
empirical results that demonstrate the usefulness of our exploration
strategy, π-reuse, as an intelligent bias to reuse past policies,
and also its effectiveness in defining similarity between policies.
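The abstract describes π-reuse only at a high level. The Python sketch below shows one plausible reading of the three-way choice it names (exploit the new policy, explore randomly, or explore towards a past policy), plus one way to pick which past policy to reuse from similarity scores. Everything here is an assumption for illustration: the function names (`pi_reuse_action`, `decayed_psi`, `softmax_select`), the parameters ψ (probability of following the past policy), ε (random-action rate), υ (per-step decay of ψ) and τ (softmax temperature), and the use of a Boltzmann (softmax) rule over per-policy reuse gains as the selection mechanism. It is not presented as the report's exact algorithm.

```python
import math
import random


def pi_reuse_action(q_values, past_action, psi, epsilon, rng):
    """Choose an action with a pi-reuse style exploration rule (hypothetical sketch).

    q_values:    Q-value estimates for the current state under the new policy.
    past_action: action the past policy would take in this state.
    psi:         assumed probability of following the past policy.
    epsilon:     assumed probability of a random action otherwise.
    """
    if rng.random() < psi:
        # Explore towards the past policy.
        return past_action
    if rng.random() < epsilon:
        # Random exploration over all actions.
        return rng.randrange(len(q_values))
    # Exploit the policy being learned: greedy w.r.t. current Q-values.
    return q_values.index(max(q_values))


def decayed_psi(psi0, upsilon, step):
    """Assumed geometric decay of psi within an episode: psi_h = psi0 * upsilon**h."""
    return psi0 * upsilon ** step


def softmax_select(gains, tau, rng):
    """Pick index i with probability proportional to exp(tau * gains[i]).

    gains are hypothetical per-policy reuse gains (e.g. average reward obtained
    while reusing each past policy), used here as a similarity score.
    """
    m = max(gains)  # subtract the max for numerical stability
    weights = [math.exp(tau * (g - m)) for g in gains]
    r = rng.random() * sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r < acc:
            return i
    return len(gains) - 1  # guard against floating-point round-off
```

With ψ decaying within an episode, early steps lean on the past policy and later steps shift towards ε-greedy exploitation of the new Q-values. A large τ makes `softmax_select` nearly greedy over the gains, while τ = 0 selects uniformly at random.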
16 pages