Center for Automated Learning and Discovery
School of Computer Science, Carnegie Mellon University
No-Regret Algorithms for Structured Prediction Problems
Geoffrey J. Gordon
Keywords: Lagrangian Hedging algorithms, online learning,
multiagent learning, online convex programming, structured
prediction problems, extensive-form games, poker, Blackwell
No-regret algorithms are a popular class of online learning rules.
Unfortunately, most no-regret algorithms assume that the set γ
of allowable hypotheses is small and discrete. We consider instead
prediction problems where γ has internal structure: γ
might be the set of strategies in a game like poker, the set of
paths in a graph, or the set of configurations of a data structure
like a rebalancing binary search tree; or γ might be a given
convex set (the "online convex programming" problem) or, in general, an
arbitrary bounded set. We derive a family of no-regret learning rules,
called Lagrangian Hedging algorithms, to take advantage of this structure.
Our algorithms are a direct generalization of known no-regret learning
rules like weighted majority and external-regret matching. In addition to
proving regret bounds, we demonstrate one
of our algorithms learning to play one-card poker.
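For readers unfamiliar with the discrete setting these algorithms generalize, the following is a minimal sketch of the weighted-majority (Hedge) update mentioned in the abstract, applied to a small finite set of experts. The function name `hedge`, the learning-rate parameter `eta`, and the loss encoding are illustrative choices, not notation from the paper.

```python
import math

def hedge(losses, eta=0.5):
    """Weighted-majority (Hedge) sketch over a finite expert set.

    losses: list of rounds, each a list of per-expert losses in [0, 1].
    Returns (algorithm's total expected loss, best single expert's loss).
    """
    n = len(losses[0])
    weights = [1.0] * n
    total = 0.0
    for round_losses in losses:
        z = sum(weights)
        probs = [w / z for w in weights]
        # Expected loss of sampling an expert from the current distribution.
        total += sum(p * l for p, l in zip(probs, round_losses))
        # Multiplicative update: exponentially downweight lossy experts.
        weights = [w * math.exp(-eta * l)
                   for w, l in zip(weights, round_losses)]
    best = min(sum(l[i] for l in losses) for i in range(n))
    return total, best
```

On a long loss sequence, the gap between the algorithm's total loss and the best expert's loss stays bounded (sublinear regret); the Lagrangian Hedging algorithms extend this guarantee from finite expert sets to the structured hypothesis sets described above.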