PG-TS: Improved Thompson Sampling for Logistic Contextual Bandits
Author(s): Dumitrascu, Bianca; Feng, Karen; Engelhardt, Barbara
To refer to this page use:
http://arks.princeton.edu/ark:/88435/pr1t838
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Dumitrascu, Bianca | - |
dc.contributor.author | Feng, Karen | - |
dc.contributor.author | Engelhardt, Barbara | - |
dc.date.accessioned | 2021-10-08T19:49:45Z | - |
dc.date.available | 2021-10-08T19:49:45Z | - |
dc.date.issued | 2018 | en_US |
dc.identifier.citation | Dumitrascu, Bianca, Karen Feng, and Barbara E. Engelhardt. "PG-TS: Improved Thompson Sampling for Logistic Contextual Bandits." In Advances in Neural Information Processing Systems 31 (2018). | en_US |
dc.identifier.issn | 1049-5258 | - |
dc.identifier.uri | https://papers.neurips.cc/paper/2018/file/ce6c92303f38d297e263c7180f03d402-Paper.pdf | - |
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/pr1t838 | - |
dc.description.abstract | We address the problem of regret minimization in logistic contextual bandits, where a learner decides among sequential actions or arms given their respective contexts to maximize binary rewards. Using a fast inference procedure with Polya-Gamma distributed augmentation variables, we propose an improved version of Thompson Sampling, a Bayesian formulation of contextual bandits with near-optimal performance. Our approach, Polya-Gamma augmented Thompson Sampling (PG-TS), achieves state-of-the-art performance on simulated and real data. PG-TS explores the action space efficiently and exploits high-reward arms, quickly converging to solutions of low regret. Its explicit estimation of the posterior distribution of the context feature covariance leads to substantial empirical gains over approximate approaches. PG-TS is the first approach to demonstrate the benefits of Polya-Gamma augmentation in bandits and to propose an efficient Gibbs sampler for approximating the analytically unsolvable integral of logistic contextual bandits. | en_US |
dc.language.iso | en_US | en_US |
dc.relation.ispartof | Advances in Neural Information Processing Systems | en_US |
dc.rights | Final published version. Article is made available in OAR by the publisher's permission or policy. | en_US |
dc.title | PG-TS: Improved Thompson Sampling for Logistic Contextual Bandits | en_US |
dc.type | Conference Article | en_US |
pu.type.symplectic | http://www.symplectic.co.uk/publications/atom-terms/1.0/conference-proceeding | en_US |
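
The abstract describes Thompson Sampling in which the posterior of a logistic reward model is sampled with a Gibbs sampler built on Polya-Gamma augmentation variables. This record does not reproduce the paper's algorithm, so the following is only a minimal sketch of the standard Polya-Gamma Gibbs update for Bayesian logistic regression, wrapped in a single Thompson Sampling arm choice. The function names (`sample_pg_approx`, `gibbs_step`, `choose_arm`), the Gaussian prior, the truncated-series Polya-Gamma sampler, and the default of 20 Gibbs iterations are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def sample_pg_approx(b, c, rng, n_terms=200):
    """Approximate draws from PG(b, c_i), one per entry of c.

    Assumption: uses the truncated sum-of-gammas series
    PG(b, c) = (1 / (2 pi^2)) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4 pi^2)),
    with g_k ~ Gamma(b, 1). Truncation makes this approximate, not exact.
    """
    c = np.atleast_1d(np.asarray(c, dtype=float))
    k = np.arange(1, n_terms + 1)
    g = rng.gamma(shape=b, scale=1.0, size=(c.size, n_terms))
    denom = (k - 0.5) ** 2 + (c[:, None] ** 2) / (4.0 * np.pi ** 2)
    return (g / denom).sum(axis=1) / (2.0 * np.pi ** 2)


def gibbs_step(theta, X, y, prior_mean, prior_prec, rng):
    """One Gibbs sweep: draw augmentation variables, then the weights."""
    # Augmentation variables, one per observed (context, reward) pair.
    omega = sample_pg_approx(1.0, X @ theta, rng)
    kappa = y - 0.5
    # Conditional on omega, the weights have a Gaussian posterior.
    precision = X.T @ (omega[:, None] * X) + prior_prec
    cov = np.linalg.inv(precision)
    cov = 0.5 * (cov + cov.T)  # symmetrize against round-off
    mean = cov @ (X.T @ kappa + prior_prec @ prior_mean)
    return rng.multivariate_normal(mean, cov)


def choose_arm(contexts, X, y, prior_mean, prior_prec, rng, n_gibbs=20):
    """Thompson step: sample weights, then play the arm they favor.

    contexts: (n_arms, d) feature vectors for the current round.
    X, y: past contexts (n, d) and binary rewards (n,) of the arms played.
    """
    theta = prior_mean.copy()
    for _ in range(n_gibbs):
        theta = gibbs_step(theta, X, y, prior_mean, prior_prec, rng)
    # The logistic link is monotone, so the largest linear score wins.
    return int(np.argmax(contexts @ theta)), theta
```

In a bandit loop, one would call `choose_arm` each round, observe the binary reward of the chosen arm, and append that arm's context and reward to `X` and `y` before the next round. An exact Polya-Gamma sampler could replace the truncated series without changing the rest of the sketch.
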
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
ContextualBandits.pdf | | 1.9 MB | Adobe PDF | View/Download |
Items in OAR@Princeton are protected by copyright, with all rights reserved, unless otherwise indicated.