Computer Science Department
School of Computer Science, Carnegie Mellon University
CMU-CS-99-122
Learning State Features from Policies
to Bias Exploration in Reinforcement Learning
Bryan Singer, Manuela Veloso
April 1999
CMU-CS-99-122.ps
CMU-CS-99-122.pdf
Keywords: Machine learning, reinforcement learning
When given several problems to solve in some domain, a standard
reinforcement learner learns an optimal policy from scratch for each
problem. If the domain has particular characteristics that are goal-
and problem-independent, the learner might be able to take advantage
of previously solved problems. Unfortunately, it is generally
infeasible to directly apply a learned policy to new problems. This
paper presents a method to bias exploration through previous problem
solutions, which is shown to speed up learning on new problems. We
first allow a Q-learner to learn the optimal policies for several
problems. We describe each state in terms of local features, assuming
that these state features together with the learned policies can be
used to abstract out the domain characteristics from the specific
layout of states and rewards in a particular problem. We then use a
classifier to learn this abstraction from training examples
extracted from each learned Q-table. The trained classifier maps state
features to the potentially goal-independent successful actions in the
domain. Given a new problem, we include the output of the classifier
as an exploration bias to improve the rate of convergence of the
reinforcement learner. We have validated our approach empirically. In
this paper, we report results in Sokoban, a complex domain that we
introduce.
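
As a rough illustration of the pipeline described above, the sketch below combines tabular Q-learning, extraction of (state features, greedy action) training examples from learned Q-tables, and a classifier whose predictions bias exploration on a new problem. It is a minimal sketch, not the report's implementation: the environment interface (reset, step, features), the majority-vote classifier standing in for any supervised learner, and all hyperparameters are illustrative assumptions.

# Hypothetical sketch of policy-biased exploration; the env interface is assumed.
import random
from collections import defaultdict

ACTIONS = ["up", "down", "left", "right"]

def q_learn(env, episodes=200, alpha=0.5, gamma=0.95, epsilon=0.1, bias=None):
    """Tabular Q-learning; `bias` maps local state features to a suggested action."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            if random.random() < epsilon:
                # Exploration step: prefer the classifier's suggestion when available.
                suggestion = bias(env.features(s)) if bias else None
                a = suggestion if suggestion is not None else random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda a_: Q[(s, a_)])
            s2, r, done = env.step(a)          # assumed to return (next state, reward, done)
            best_next = max(Q[(s2, a_)] for a_ in ACTIONS)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q

def extract_examples(env, Q):
    """One (local features, greedy action) example per state visited in a learned Q-table."""
    examples = []
    for s in {s for (s, _) in Q}:
        greedy = max(ACTIONS, key=lambda a_: Q[(s, a_)])
        examples.append((env.features(s), greedy))
    return examples

def train_classifier(examples):
    """Majority-vote table over feature tuples; a stand-in for any supervised classifier."""
    votes = defaultdict(lambda: defaultdict(int))
    for feats, action in examples:
        votes[tuple(feats)][action] += 1
    def classify(feats):
        counts = votes.get(tuple(feats))
        return max(counts, key=counts.get) if counts else None
    return classify

# Usage (assuming `problems` is a list of training environments and `new_env` a new problem):
#   examples = []
#   for env in problems:
#       examples += extract_examples(env, q_learn(env))
#   classify = train_classifier(examples)
#   Q_new = q_learn(new_env, bias=classify)   # exploration biased by prior solutions
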
17 pages