PG-TS: Improved Thompson Sampling for Logistic Contextual Bandits

Author(s): Dumitrascu, Bianca; Feng, Karen; Engelhardt, Barbara

Abstract: We address the problem of regret minimization in logistic contextual bandits, where a learner decides among sequential actions or arms given their respective contexts to maximize binary rewards. Using a fast inference procedure with Polya-Gamma distributed augmentation variables, we propose an improved version of Thompson Sampling, a Bayesian formulation of contextual bandits with near-optimal performance. Our approach, Polya-Gamma augmented Thompson Sampling (PG-TS), achieves state-of-the-art performance on simulated and real data. PG-TS explores the action space efficiently and exploits high-reward arms, quickly converging to solutions of low regret. Its explicit estimation of the posterior distribution of the context feature covariance leads to substantial empirical gains over approximate approaches. PG-TS is the first approach to demonstrate the benefits of Polya-Gamma augmentation in bandits and to propose an efficient Gibbs sampler for approximating the analytically unsolvable integral of logistic contextual bandits.
Publication Date: 2018
Citation: Dumitrascu, Bianca, Karen Feng, and Barbara E. Engelhardt. "PG-TS: Improved Thompson Sampling for Logistic Contextual Bandits." In Advances in Neural Information Processing Systems 31 (2018).
ISSN: 1049-5258
Type of Material: Conference Article
Journal/Proceeding Title: Advances in Neural Information Processing Systems
Version: Final published version. Article is made available in OAR by the publisher's permission or policy.

Items in OAR@Princeton are protected by copyright, with all rights reserved, unless otherwise indicated.