Bounded regret in stochastic multi-armed bandits
Author(s): Bubeck, Sébastien; Perchet, Vianney; Rigollet, Philippe
To refer to this page use:
http://arks.princeton.edu/ark:/88435/pr16j5x
Abstract: | We study the stochastic multi-armed bandit problem when one knows the value $\mu^{(\star)}$ of an optimal arm, as well as a positive lower bound on the smallest positive gap $\Delta$. We propose a new randomized policy that attains a regret {\em uniformly bounded over time} in this setting. We also prove several lower bounds, which show in particular that bounded regret is not possible if one only knows $\Delta$, and bounded regret of order $1/\Delta$ is not possible if one only knows $\mu^{(\star)}$. |
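To make the setting in the abstract concrete, the following is a minimal illustrative sketch of a bandit policy that exploits knowledge of $\mu^{(\star)}$ and a known lower bound $\epsilon \le \Delta$: explore arms in round-robin and commit to an arm once its empirical mean is provably close to $\mu^{(\star)}$. This is a hypothetical sketch written for this page, not the randomized policy proposed in the paper; the function name `play_known_mu_star`, the Bernoulli reward model, and the sample-size threshold are all assumptions made purely for illustration.

```python
import numpy as np


def play_known_mu_star(arm_means, mu_star, eps, horizon, rng=None):
    """Illustrative sketch only (not the paper's policy): round-robin
    exploration, then commit to an arm whose empirical mean is within
    eps/2 of the known optimal value mu_star, where eps <= Delta is a
    known lower bound on the smallest positive gap."""
    rng = np.random.default_rng() if rng is None else rng
    k = len(arm_means)
    counts = np.zeros(k, dtype=int)
    sums = np.zeros(k)
    committed = None          # index of the arm we have committed to, if any
    regret = 0.0
    # Heuristic sample-size threshold before testing closeness to mu_star
    # (an assumption for this sketch, not a tuned constant from the paper).
    min_samples = int(np.ceil(8.0 * np.log(1.0 / eps) / eps ** 2))
    for t in range(horizon):
        arm = committed if committed is not None else t % k
        reward = rng.binomial(1, arm_means[arm])   # Bernoulli rewards (assumption)
        counts[arm] += 1
        sums[arm] += reward
        regret += mu_star - arm_means[arm]
        if committed is None and counts[arm] >= min_samples:
            # Commit when the empirical mean is within eps/2 of mu_star.
            if sums[arm] / counts[arm] >= mu_star - eps / 2.0:
                committed = arm
    return regret


# Example: two Bernoulli arms with gap Delta = 0.2 and known mu_star = 0.7.
print(play_known_mu_star([0.7, 0.5], mu_star=0.7, eps=0.2, horizon=10_000))
```

In this sketch the cumulative regret stops growing once the policy commits to the optimal arm, which is the qualitative behavior (regret uniformly bounded over time) that the paper establishes rigorously for its randomized policy.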
Publication Date: | Feb-2013 |
Citation: | Bubeck, Sébastien, Perchet, Vianney, Rigollet, Philippe. (2013). Bounded regret in stochastic multi-armed bandits. https://arxiv.org/abs/1302.1611v2 |
Pages: | 1 - 14 |
Type of Material: | Journal Article |
Journal/Proceeding Title: | Journal of Machine Learning Research |
Version: | Author's manuscript |
Items in OAR@Princeton are protected by copyright, with all rights reserved, unless otherwise indicated.