Computer Science Department
School of Computer Science, Carnegie Mellon University
Probabilistic Reuse of Past Policies
Fernando Fernández, Manuela Veloso
Keywords: Reinforcement Learning, Policy Reuse, Transfer Learning
A past policy provides a bias to guide the exploration of the environment
and speed up the learning of a new action policy. The success of this
bias depends on how similar the past policy is to the policy being
learned. In this report we describe a new algorithm, PRQ-Learning,
that reuses a set of past policies to bias the learning of a new one.
The past policies are ranked according to a similarity metric that
estimates how useful it is to reuse each of those past policies. This
ranking provides a probabilistic bias for the exploration in the new
learning process. Several experiments demonstrate that PRQ-Learning
finds a balance between exploitation of the policy being learned,
exploration of random actions, and exploration toward the past policies.
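
The three-way balance described above can be illustrated with a minimal Python sketch. It is not the report's definition of PRQ-Learning: the softmax ranking, the `temperature`, `psi`, and `epsilon` parameters, and the function names `policy_probabilities` and `choose_action` are all assumptions introduced here for illustration. The first function turns hypothetical usefulness scores for each past policy into selection probabilities; the second picks an action by following a past policy, exploring at random, or exploiting the current Q-values.

```python
import math
import random


def policy_probabilities(scores, temperature=1.0):
    """Map usefulness scores to selection probabilities (assumed softmax).

    `scores` holds one hypothetical usefulness estimate per past policy.
    A higher `temperature` flattens the distribution toward uniform.
    """
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp((s - top) / temperature) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def choose_action(state, past_policy, q_values, actions, psi, epsilon, rng):
    """Pick an action from one of three sources (illustrative assumption).

    With probability `psi`, follow the past policy. Otherwise, with
    probability `epsilon`, take a random action; else act greedily on
    the Q-values of the policy being learned.
    """
    if rng.random() < psi:
        return past_policy(state)
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_values.get((state, a), 0.0))


if __name__ == "__main__":
    rng = random.Random(0)
    probs = policy_probabilities([0.2, 0.9, 0.5])
    # Sample which past policy to reuse in the next episode.
    chosen = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    print(probs, chosen)
```

In this sketch, setting `psi` to 0 reduces the action choice to ordinary epsilon-greedy exploration, while setting it to 1 replays the past policy unchanged; intermediate values mix the two.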