Contextual Bandit Learning with Predictable Rewards

Agarwal, Alekh; Dudík, Miroslav; Kale, Satyen; Langford, John; Schapire, Robert E

To refer to this page use: http://arks.princeton.edu/ark:/88435/pr1dj87

Full metadata record

DC Field	Value	Language
dc.contributor.author	Agarwal, Alekh	-
dc.contributor.author	Dudík, Miroslav	-
dc.contributor.author	Kale, Satyen	-
dc.contributor.author	Langford, John	-
dc.contributor.author	Schapire, Robert E	-
dc.date.accessioned	2021-10-08T19:47:21Z	-
dc.date.available	2021-10-08T19:47:21Z	-
dc.date.issued	2012	en_US
dc.identifier.citation	Agarwal, Alekh; Dudík, Miroslav; Kale, Satyen; Langford, John; Schapire, Robert E. (2012). Contextual Bandit Learning with Predictable Rewards. 15th International Conference on Artificial Intelligence and Statistics (AISTATS).	en_US
dc.identifier.uri	http://arks.princeton.edu/ark:/88435/pr1dj87	-
dc.description.abstract	Contextual bandit learning is a reinforcement learning problem in which the learner repeatedly receives a set of features (the context), takes an action, and receives a reward that depends on the action and the context. We consider this problem under a realizability assumption: there exists a function in a (known) function class that correctly predicts the expected reward for every action and context. Under this assumption, we show three things. First, we present a new algorithm, Regressor Elimination, whose regret is similar to that achievable in the agnostic setting (i.e., in the absence of the realizability assumption). Second, we prove a new lower bound showing that no algorithm can achieve superior performance in the worst case, even under the realizability assumption. Third, in contrast to this worst case, we show that for any set of policies (mappings from contexts to actions), there is a distribution over rewards (given the context) such that our new algorithm has constant regret, unlike previous approaches.	en_US
dc.language.iso	en_US	en_US
dc.relation.ispartof	15th International Conference on Artificial Intelligence and Statistics (AISTATS) 2012	en_US
dc.rights	Final published version. This is an open access article.	en_US
dc.title	Contextual Bandit Learning with Predictable Rewards	en_US
dc.type	Conference Article	en_US
pu.type.symplectic	http://www.symplectic.co.uk/publications/atom-terms/1.0/journal-article	en_US
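The setting described in the abstract (observe a context, choose an action, receive a reward whose mean is given by some function in a known class) can be illustrated with a small simulation. This is a simplified sketch under stated assumptions, not the paper's Regressor Elimination: contexts and actions are small integers, the function class is a hypothetical finite set of lookup tables, rewards carry bounded uniform noise, and actions are chosen uniformly among the greedy actions of the surviving regressors rather than by the carefully constructed action distribution the paper uses. The helper names (`table_regressor`, `greedy_action`, `run_elimination`) and the `threshold` and `noise` parameters are illustrative only.

```python
import random
from typing import Callable, List, Sequence, Tuple

# A regressor maps (context x, action a) to a predicted expected reward.
Regressor = Callable[[int, int], float]


def table_regressor(table: Sequence[Sequence[float]]) -> Regressor:
    """Hypothetical helper: a regressor backed by a lookup table table[x][a]."""
    return lambda x, a: table[x][a]


def greedy_action(f: Regressor, x: int, n_actions: int) -> int:
    """Policy induced by f: the action with the largest predicted reward (first on ties)."""
    return max(range(n_actions), key=lambda a: f(x, a))


def run_elimination(
    regressors: List[Regressor],
    true_index: int,
    n_contexts: int,
    n_actions: int,
    rounds: int,
    threshold: float = 1.0,
    noise: float = 0.1,
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Simplified elimination loop; NOT the paper's Regressor Elimination.

    Returns the indices of surviving regressors and the cumulative
    expected regret against the best policy, which under realizability
    is the greedy policy of the true regressor.
    """
    rng = random.Random(seed)
    f_star = regressors[true_index]  # realizability: the true mean reward is in the class
    loss = [0.0] * len(regressors)   # cumulative squared error of each regressor
    alive = list(range(len(regressors)))
    regret = 0.0
    for _ in range(rounds):
        x = rng.randrange(n_contexts)  # 1. observe a context
        # 2. choose uniformly among actions that some surviving regressor deems best
        candidates = sorted({greedy_action(regressors[i], x, n_actions) for i in alive})
        a = rng.choice(candidates)
        # 3. receive a noisy reward whose mean is f_star(x, a)
        r = f_star(x, a) + rng.uniform(-noise, noise)
        regret += f_star(x, greedy_action(f_star, x, n_actions)) - f_star(x, a)
        # 4. score every regressor on the new observation, then drop survivors
        #    whose loss exceeds the best surviving loss by more than the threshold
        for i, f in enumerate(regressors):
            loss[i] += (r - f(x, a)) ** 2
        best_loss = min(loss[i] for i in alive)
        alive = [i for i in alive if loss[i] <= best_loss + threshold]
    return alive, regret


if __name__ == "__main__":
    # Entries come from {0.2, 0.5, 0.8}; index 0 plays the role of the true regressor.
    F = [table_regressor(t) for t in (
        [[0.8, 0.2], [0.2, 0.8], [0.5, 0.2]],
        [[0.2, 0.8], [0.2, 0.8], [0.5, 0.2]],
        [[0.8, 0.2], [0.8, 0.2], [0.5, 0.2]],
        [[0.8, 0.2], [0.2, 0.8], [0.2, 0.5]],
    )]
    alive, regret = run_elimination(F, true_index=0, n_contexts=3, n_actions=2, rounds=2000)
    print("surviving regressors:", alive, "cumulative regret:", round(regret, 2))
```

In this toy class every regressor differs from the true one by either 0 or at least 0.3 on each context-action pair, and the noise is bounded by 0.1, so each informative observation raises a wrong regressor's squared loss above the true one's. The true regressor therefore always has the smallest cumulative loss and is never eliminated; this is the role the realizability assumption plays in the sketch.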

Files in This Item:

File	Description	Size	Format
ContextualBanditLearningPredictableRewards.pdf		281.53 kB	Adobe PDF