Provably Efficient Maximum Entropy Exploration

Hazan, Elad; Kakade, Sham; Singh, Karan; van Soest, Abby

To refer to this page use: http://arks.princeton.edu/ark:/88435/pr10v73

Full metadata record

DC Field	Value	Language
dc.contributor.author	Hazan, Elad	-
dc.contributor.author	Kakade, Sham	-
dc.contributor.author	Singh, Karan	-
dc.contributor.author	van Soest, Abby	-
dc.date.accessioned	2021-10-08T19:49:53Z	-
dc.date.available	2021-10-08T19:49:53Z	-
dc.date.issued	2019	en_US
dc.identifier.citation	Hazan, Elad, Sham Kakade, Karan Singh, and Abby Van Soest. "Provably Efficient Maximum Entropy Exploration." In Proceedings of the 36th International Conference on Machine Learning (2019): pp. 2681-2691.	en_US
dc.identifier.issn	2640-3498	-
dc.identifier.uri	http://proceedings.mlr.press/v97/hazan19a/hazan19a.pdf	-
dc.identifier.uri	http://arks.princeton.edu/ark:/88435/pr10v73	-
dc.description.abstract	Suppose an agent is in a (possibly unknown) Markov Decision Process in the absence of a reward signal: what might we hope that the agent can efficiently learn to do? This work studies a broad class of objectives that are defined solely as functions of the state-visitation frequencies that are induced by how the agent behaves. For example, one natural, intrinsically defined objective is for the agent to learn a policy which induces a distribution over state space that is as uniform as possible, which can be measured in an entropic sense. We provide an efficient algorithm to optimize such intrinsically defined objectives, when given access to a black-box planning oracle (which is robust to function approximation). Furthermore, when restricted to the tabular setting where we have sample-based access to the MDP, our proposed algorithm is provably efficient, both in terms of its sample and computational complexities. Key to our algorithmic methodology is the conditional gradient method (a.k.a. the Frank-Wolfe algorithm), which utilizes an approximate MDP solver.	en_US
dc.format.extent	2681 - 2691	en_US
dc.language.iso	en_US	en_US
dc.relation.ispartof	Proceedings of the 36th International Conference on Machine Learning	en_US
dc.rights	Final published version. Article is made available in OAR by the publisher's permission or policy.	en_US
dc.title	Provably Efficient Maximum Entropy Exploration	en_US
dc.type	Conference Article	en_US
pu.type.symplectic	http://www.symplectic.co.uk/publications/atom-terms/1.0/conference-proceeding	en_US
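
The idea summarized in the abstract (Frank-Wolfe steps over state-visitation distributions, each step calling a planning oracle) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a small tabular MDP whose transition tensor is fully known, uses exact value iteration as a stand-in for the planning oracle, smooths the entropy with a hypothetical parameter `sigma`, and uses the textbook step size 2/(t+2) rather than any schedule from the paper. All function names (`ring_mdp`, `state_distribution`, `plan`, `entropy`, `max_entropy_frank_wolfe`) are invented for illustration.

```python
import numpy as np

def ring_mdp(n):
    """Hypothetical toy MDP: action 0 stays put, action 1 moves to the next state."""
    P = np.zeros((n, 2, n))  # P[s, a, s'] = transition probability
    for s in range(n):
        P[s, 0, s] = 1.0
        P[s, 1, (s + 1) % n] = 1.0
    return P

def state_distribution(P, policy, d0, gamma):
    """Discounted state-visitation distribution of a deterministic policy."""
    n = P.shape[0]
    P_pi = P[np.arange(n), policy]  # (S, S) transitions under the policy
    # Solves d = (1 - gamma) * d0 + gamma * P_pi^T d
    return (1 - gamma) * np.linalg.solve(np.eye(n) - gamma * P_pi.T, d0)

def plan(P, reward, gamma, iters=500):
    """Stand-in planning oracle: value iteration for a state-based reward."""
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        Q = reward[:, None] + gamma * (P @ V)  # (S, A)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

def entropy(d):
    """Shannon entropy (in nats) of a distribution, ignoring zero entries."""
    d = d[d > 0]
    return float(-(d * np.log(d)).sum())

def max_entropy_frank_wolfe(P, d0, gamma=0.9, sigma=1e-3, steps=50):
    """Frank-Wolfe on the smoothed entropy -sum_s d(s) log(d(s) + sigma)."""
    n = P.shape[0]
    policies = [np.zeros(n, dtype=int)]  # arbitrary starting policy
    weights = [1.0]
    d_mix = state_distribution(P, policies[0], d0, gamma)
    for t in range(steps):
        # Gradient of the smoothed entropy, used as a reward for the oracle
        grad = -np.log(d_mix + sigma) - d_mix / (d_mix + sigma)
        pi = plan(P, grad, gamma)
        d_pi = state_distribution(P, pi, d0, gamma)
        eta = 2.0 / (t + 2)  # standard Frank-Wolfe step size (an assumption)
        d_mix = (1 - eta) * d_mix + eta * d_pi
        weights = [w * (1 - eta) for w in weights] + [eta]
        policies.append(pi)
    return policies, np.array(weights), d_mix
```

Under these assumptions, calling `max_entropy_frank_wolfe(ring_mdp(4), np.array([1.0, 0.0, 0.0, 0.0]))` returns a list of deterministic policies, their mixture weights, and the state distribution induced by sampling one policy according to those weights and then following it. Rewarding rarely visited states through the gradient is what pushes each new policy toward under-explored parts of the state space.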

Files in This Item:

File	Description	Size	Format
EfficientMaxEntropyExploration.pdf		522.81 kB	Adobe PDF