PG-TS: Improved Thompson Sampling for Logistic Contextual Bandits

Dumitrascu, Bianca; Feng, Karen; Engelhardt, Barbara

To refer to this page use: http://arks.princeton.edu/ark:/88435/pr1t838

Full metadata record

DC Field	Value	Language
dc.contributor.author	Dumitrascu, Bianca	-
dc.contributor.author	Feng, Karen	-
dc.contributor.author	Engelhardt, Barbara	-
dc.date.accessioned	2021-10-08T19:49:45Z	-
dc.date.available	2021-10-08T19:49:45Z	-
dc.date.issued	2018	en_US
dc.identifier.citation	Dumitrascu, Bianca, Karen Feng, and Barbara E. Engelhardt. "PG-TS: Improved Thompson Sampling for Logistic Contextual Bandits." In Advances in Neural Information Processing Systems 31 (2018).	en_US
dc.identifier.issn	1049-5258	-
dc.identifier.uri	https://papers.neurips.cc/paper/2018/file/ce6c92303f38d297e263c7180f03d402-Paper.pdf	-
dc.identifier.uri	http://arks.princeton.edu/ark:/88435/pr1t838	-
dc.description.abstract	We address the problem of regret minimization in logistic contextual bandits, where a learner decides among sequential actions or arms given their respective contexts to maximize binary rewards. Using a fast inference procedure with Polya-Gamma distributed augmentation variables, we propose an improved version of Thompson Sampling, a Bayesian formulation of contextual bandits with near-optimal performance. Our approach, Polya-Gamma augmented Thompson Sampling (PG-TS), achieves state-of-the-art performance on simulated and real data. PG-TS explores the action space efficiently and exploits high-reward arms, quickly converging to solutions of low regret. Its explicit estimation of the posterior distribution of the context feature covariance leads to substantial empirical gains over approximate approaches. PG-TS is the first approach to demonstrate the benefits of Polya-Gamma augmentation in bandits and to propose an efficient Gibbs sampler for approximating the analytically unsolvable integral of logistic contextual bandits.	en_US
dc.language.iso	en_US	en_US
dc.relation.ispartof	Advances in Neural Information Processing Systems	en_US
dc.rights	Final published version. Article is made available in OAR by the publisher's permission or policy.	en_US
dc.title	PG-TS: Improved Thompson Sampling for Logistic Contextual Bandits	en_US
dc.type	Conference Article	en_US
pu.type.symplectic	http://www.symplectic.co.uk/publications/atom-terms/1.0/conference-proceeding	en_US
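The abstract describes PG-TS only at a high level. The following is a minimal Python sketch of the two ingredients it names, assuming NumPy. The first is a Gibbs sampler for Bayesian logistic regression that uses Polya-Gamma augmentation variables. The second is a Thompson sampling loop that draws a parameter vector from that sampler each round and plays the arm whose context gives the highest score. This is not the authors' implementation. Several choices here are hypothetical and made for illustration only:
- the PG(1, c) draws use a truncated infinite-sum approximation instead of an exact sampler;
- the N(0, I) prior, the number of Gibbs steps per round (n_gibbs), and the warm start from the previous draw are assumptions;
- the simulated environment uses Gaussian contexts and Bernoulli rewards;
- the function names sample_pg1, gibbs_logistic and run_pg_ts are made up for this sketch.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sample_pg1(c, rng, n_terms=100):
    """Approximate PG(1, c) draws, one per entry of c.

    Truncated sum representation of the Polya-Gamma distribution:
        w = 1/(2 pi^2) * sum_{k>=1} g_k / ((k - 1/2)^2 + c^2 / (4 pi^2)),
    with g_k ~ Gamma(1, 1) = Exp(1). Truncating at n_terms slightly
    underestimates w; an exact sampler would remove this bias.
    """
    c = np.asarray(c, dtype=float)
    k = np.arange(1, n_terms + 1)
    g = rng.standard_exponential(size=c.shape + (n_terms,))
    denom = (k - 0.5) ** 2 + c[..., None] ** 2 / (4 * np.pi ** 2)
    return (g / denom).sum(axis=-1) / (2 * np.pi ** 2)


def gibbs_logistic(X, y, theta, rng, n_iter, prior_mean, prior_cov):
    """Polya-Gamma Gibbs sampler for Bayesian logistic regression.

    Alternates  w_i | theta ~ PG(1, x_i^T theta)  and
    theta | w, y ~ N(m, V) with V = (X^T diag(w) X + B^-1)^-1 and
    m = V (X^T (y - 1/2) + B^-1 b), where N(b, B) is the prior.
    Starts from `theta` and returns the last draw.
    """
    B_inv = np.linalg.inv(prior_cov)
    kappa = y - 0.5
    for _ in range(n_iter):
        w = sample_pg1(X @ theta, rng)
        V = np.linalg.inv(X.T @ (w[:, None] * X) + B_inv)
        V = 0.5 * (V + V.T)  # symmetrize against round-off
        m = V @ (X.T @ kappa + B_inv @ prior_mean)
        theta = rng.multivariate_normal(m, V)
    return theta


def run_pg_ts(theta_star, n_arms, T, rng, n_gibbs=10):
    """Thompson sampling with PG-augmented posterior draws on a simulated bandit.

    Hypothetical environment: each round, n_arms contexts are drawn from
    N(0, I) and the chosen arm pays Bernoulli(sigmoid(x^T theta_star)).
    Returns per-round expected regret and the last posterior draw.
    """
    d = theta_star.shape[0]
    prior_mean, prior_cov = np.zeros(d), np.eye(d)
    theta = rng.multivariate_normal(prior_mean, prior_cov)  # first draw from prior
    X_hist, y_hist, regret = [], [], []
    for _ in range(T):
        arms = rng.normal(size=(n_arms, d))
        if X_hist:  # refresh the posterior draw, warm-started at the previous one
            theta = gibbs_logistic(np.array(X_hist), np.array(y_hist), theta,
                                   rng, n_gibbs, prior_mean, prior_cov)
        a = int(np.argmax(arms @ theta))  # act greedily w.r.t. the sampled theta
        p = sigmoid(arms @ theta_star)
        y_hist.append(float(rng.random() < p[a]))
        X_hist.append(arms[a])
        regret.append(p.max() - p[a])
    return np.array(regret), theta
```

Given the augmentation variables w, the conditional of theta in gibbs_logistic is exactly Gaussian. Each step therefore needs only a linear solve, and it samples the full posterior covariance of the context-feature weights rather than approximating it. This is the property the abstract credits for the gains over approximate approaches.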

Files in This Item:

File	Description	Size	Format
ContextualBandits.pdf		1.9 MB	Adobe PDF