
CMUCS06110
Computer Science Department
School of Computer Science, Carnegie Mellon University
CMUCS06110
An Asymptotically Optimal Algorithm for the
Max kArmed Bandit Problem
Matthew J. Streeter, Stephen F. Smith
February 2006
CMUCS06110.ps
CMUCS06110.pdf
Keywords: Multiarmed bandit problem, PAC bounds
We present an asymptotically optimal algorithm for the max variant
of the karmed bandit problem. Given a set of k slot
machines, each yielding payoff from a fixed (but unknown) distribution,
we wish to allocate trials to the machines so as to maximize the
expected maximum payoff received over a series of n trials.
Subject to certain distributional assumptions, we show that
O(ln(1⁄δ) ln(n)^{2}⁄∈^{2}) trials
are sufficient to identify, with
probability at least 1 − δ , a machine whose expected
maximum payoff is within ∈ of optimal. This result leads
to a strategy for solving the problem that is asymptotically
optimal in the following sense: the gap between the expected maximum
payoff obtained by using our strategy for n trials and that obtained
by pulling the single best arm for all n trials approaches
zero as n → ∞.
18 pages
