Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability

Author(s): Ghosh, Dibya; Rahme, Jad; Kumar, Aviral; Adams, Ryan P.; Levine, Sergey; Zhang, Amy

To refer to this page use: http://arks.princeton.edu/ark:/88435/pr11c1tg26
Full metadata record
DC Field | Value | Language
dc.contributor.author | Ghosh, Dibya | -
dc.contributor.author | Rahme, Jad | -
dc.contributor.author | Kumar, Aviral | -
dc.contributor.author | Adams, Ryan P. | -
dc.contributor.author | Levine, Sergey | -
dc.contributor.author | Zhang, Amy | -
dc.date.accessioned | 2024-10-05T20:53:57Z | -
dc.date.available | 2024-10-05T20:53:57Z | -
dc.date.issued | 2021-01-01 | en_US
dc.identifier.citation | Ghosh, D., Rahme, J., Kumar, A., Zhang, A., Adams, R. P., & Levine, S. (2021). Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability. Advances in Neural Information Processing Systems, 34, 25502-25515. | en_US
dc.identifier.issn | 1049-5258 | -
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/pr11c1tg26 | -
dc.description.abstract | Generalization is a central challenge for the deployment of reinforcement learning (RL) systems in the real world. In this paper, we show that the sequential structure of the RL problem necessitates new approaches to generalization beyond the well-studied techniques used in supervised learning. While supervised learning methods can generalize effectively without explicitly accounting for epistemic uncertainty, we show that, perhaps surprisingly, this is not the case in RL. We show that generalization to unseen test conditions from a limited number of training conditions induces implicit partial observability, effectively turning even fully-observed MDPs into POMDPs. Informed by this observation, we recast the problem of generalization in RL as solving the induced partially observed Markov decision process, which we call the epistemic POMDP. We demonstrate the failure modes of algorithms that do not appropriately handle this partial observability, and suggest a simple ensemble-based technique for approximately solving the partially observed problem. Empirically, we demonstrate that our simple algorithm derived from the epistemic POMDP achieves significant gains in generalization over current methods on the Procgen benchmark suite. | en_US
dc.format.extent | 25502 - 25515 | en_US
dc.relation.ispartof | Advances in Neural Information Processing Systems | en_US
dc.rights | Final published version. This is an open access article. | en_US
dc.title | Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability | en_US
dc.type | Journal Article | en_US
pu.type.symplectic | http://www.symplectic.co.uk/publications/atom-terms/1.0/conference-proceeding | en_US
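
The abstract above attributes the paper's generalization gains to a simple ensemble-based technique for approximately solving the epistemic POMDP. As a minimal illustrative sketch only (not the authors' actual algorithm; the class and variable names below are hypothetical), one way such an ensemble can act is by averaging the action distributions of its members, so that behavior marginalizes over epistemic uncertainty about which training context the agent is actually in:

# Illustrative sketch: acting with the uniform mixture of an ensemble of
# policies, in the spirit of the abstract's ensemble-based approximation.
# All names here are hypothetical and do not correspond to the paper's code.
import numpy as np

class EnsemblePolicy:
    """Acts with the uniform mixture of K member policies.

    Each member maps an observation to a probability distribution over
    actions; averaging the members' distributions marginalizes over
    uncertainty about the underlying (unidentified) environment.
    """

    def __init__(self, members):
        self.members = members  # list of callables: obs -> action probabilities

    def action_probs(self, obs):
        probs = np.stack([m(obs) for m in self.members])
        return probs.mean(axis=0)  # uniform mixture over ensemble members

    def act(self, obs, rng):
        p = self.action_probs(obs)
        return rng.choice(len(p), p=p)

# Toy usage: two members trained on different contexts disagree sharply,
# and the mixture hedges between them instead of committing to either.
rng = np.random.default_rng(0)
member_a = lambda obs: np.array([0.8, 0.1, 0.1])
member_b = lambda obs: np.array([0.1, 0.1, 0.8])
policy = EnsemblePolicy([member_a, member_b])
print(policy.action_probs(None))  # [0.45 0.1 0.45]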

Files in This Item:
File | Description | Size | Format
NeurIPS-2021-why-generalization-in-rl-is-difficult-epistemic-pomdps-and-implicit-partial-observability-Paper.pdf | | 1.17 MB | Adobe PDF


Items in OAR@Princeton are protected by copyright, with all rights reserved, unless otherwise indicated.