
PG-TS: Improved Thompson Sampling for Logistic Contextual Bandits

Author(s): Dumitrascu, Bianca; Feng, Karen; Engelhardt, Barbara

To refer to this page use: http://arks.princeton.edu/ark:/88435/pr1t838
Full metadata record
DC Field | Value | Language
dc.contributor.author | Dumitrascu, Bianca | -
dc.contributor.author | Feng, Karen | -
dc.contributor.author | Engelhardt, Barbara | -
dc.date.accessioned | 2021-10-08T19:49:45Z | -
dc.date.available | 2021-10-08T19:49:45Z | -
dc.date.issued | 2018 | en_US
dc.identifier.citation | Dumitrascu, Bianca, Karen Feng, and Barbara E. Engelhardt. "PG-TS: Improved Thompson Sampling for Logistic Contextual Bandits." In Advances in Neural Information Processing Systems 31 (2018). | en_US
dc.identifier.issn | 1049-5258 | -
dc.identifier.uri | https://papers.neurips.cc/paper/2018/file/ce6c92303f38d297e263c7180f03d402-Paper.pdf | -
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/pr1t838 | -
dc.description.abstract | We address the problem of regret minimization in logistic contextual bandits, where a learner decides among sequential actions or arms given their respective contexts to maximize binary rewards. Using a fast inference procedure with Polya-Gamma distributed augmentation variables, we propose an improved version of Thompson Sampling, a Bayesian formulation of contextual bandits with near-optimal performance. Our approach, Polya-Gamma augmented Thompson Sampling (PG-TS), achieves state-of-the-art performance on simulated and real data. PG-TS explores the action space efficiently and exploits high-reward arms, quickly converging to solutions of low regret. Its explicit estimation of the posterior distribution of the context feature covariance leads to substantial empirical gains over approximate approaches. PG-TS is the first approach to demonstrate the benefits of Polya-Gamma augmentation in bandits and to propose an efficient Gibbs sampler for approximating the analytically unsolvable integral of logistic contextual bandits. | en_US
dc.language.iso | en_US | en_US
dc.relation.ispartof | Advances in Neural Information Processing Systems | en_US
dc.rights | Final published version. Article is made available in OAR by the publisher's permission or policy. | en_US
dc.title | PG-TS: Improved Thompson Sampling for Logistic Contextual Bandits | en_US
dc.type | Conference Article | en_US
pu.type.symplectic | http://www.symplectic.co.uk/publications/atom-terms/1.0/conference-proceeding | en_US
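The abstract's core idea can be illustrated in a few lines: at each round, draw a posterior sample of the logistic-regression weights with a Pólya-Gamma augmented Gibbs sampler, then pull the arm whose context scores highest under that sample. With binary rewards y_i, κ_i = y_i − 1/2 and a Gaussian prior, the two Gibbs conditionals are ω_i | θ ~ PG(1, x_iᵀθ) and θ | ω, y ~ N(P⁻¹Xᵀκ, P⁻¹) with P = XᵀΩX + I/prior_var. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the N(0, prior_var·I) prior, fixed arm contexts, simulated Bernoulli rewards, one warm-started Gibbs sweep per round (`burn_in`), and the truncated sum-of-gammas PG(1, c) sampler (the representation of Polson, Scott and Windle, 2013) are choices made here, and all function names are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sample_pg1(c, rng, trunc=100):
    """Approximate PG(1, c) draw: (1 / 2pi^2) * sum_k g_k / ((k - 1/2)^2 + c^2 / 4pi^2),
    g_k ~ Gamma(1, 1) = Exp(1), truncated after `trunc` terms (slightly biased low)."""
    shift = c * c / (4.0 * math.pi ** 2)
    total = sum(rng.expovariate(1.0) / ((k - 0.5) ** 2 + shift)
                for k in range(1, trunc + 1))
    return total / (2.0 * math.pi ** 2)


def cholesky(A):
    """Lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    """Solve L x = b by forward substitution."""
    x = [0.0] * len(b)
    for i in range(len(b)):
        x[i] = (b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i]
    return x


def solve_upper_t(L, b):
    """Solve L^T x = b by back substitution (L is lower-triangular)."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gibbs_step(theta, X, y, prior_var, rng):
    """One PG-augmented Gibbs sweep for logistic regression, prior N(0, prior_var * I)."""
    d = len(theta)
    # omega_i | theta ~ PG(1, x_i^T theta)
    omegas = [sample_pg1(dot(x, theta), rng) for x in X]
    # theta | omega, y ~ N(P^{-1} h, P^{-1}), P = X^T Omega X + I / prior_var, h = X^T kappa
    P = [[(1.0 / prior_var if i == j else 0.0) for j in range(d)] for i in range(d)]
    h = [0.0] * d
    for x, yi, w in zip(X, y, omegas):
        for i in range(d):
            h[i] += (yi - 0.5) * x[i]  # kappa_i = y_i - 1/2
            for j in range(d):
                P[i][j] += w * x[i] * x[j]
    L = cholesky(P)
    mean = solve_upper_t(L, solve_lower(L, h))
    # L^{-T} eps has covariance (L L^T)^{-1} = P^{-1}
    noise = solve_upper_t(L, [rng.gauss(0.0, 1.0) for _ in range(d)])
    return [m + e for m, e in zip(mean, noise)]


def pg_ts(arms, true_theta, horizon, burn_in=1, prior_var=1.0, seed=0):
    """Thompson sampling over fixed arm contexts with simulated Bernoulli rewards."""
    rng = random.Random(seed)
    theta = [0.0] * len(true_theta)
    X, y = [], []
    best = max(sigmoid(dot(a, true_theta)) for a in arms)
    regret = 0.0
    for t in range(horizon):
        for _ in range(burn_in):  # refresh the posterior sample, warm-started
            theta = gibbs_step(theta, X, y, prior_var, rng)
        k = max(range(len(arms)), key=lambda i: dot(arms[i], theta))
        p = sigmoid(dot(arms[k], true_theta))
        X.append(arms[k])
        y.append(1 if rng.random() < p else 0)
        regret += best - p  # expected (pseudo-)regret of this pull
    return regret, theta


if __name__ == "__main__":
    arms = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    total_regret, last_theta = pg_ts(arms, true_theta=[1.5, -1.0], horizon=100)
    print(round(total_regret, 2), last_theta)
```

Warm-starting each round's Gibbs chain from the previous sample keeps the per-round cost low, since the posterior changes only by one observation per round; a longer `burn_in` trades speed for a sample closer to the exact posterior.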

Files in This Item:
File | Description | Size | Format
ContextualBandits.pdf | | 1.9 MB | Adobe PDF


Items in OAR@Princeton are protected by copyright, with all rights reserved, unless otherwise indicated.