Learning to Control in Metric Space with Optimal Regret
Author(s): Yang, Lin F.; Ni, Chengzhuo; Wang, Mengdi
To refer to this page use:
http://arks.princeton.edu/ark:/88435/pr1nb6j
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Yang, Lin F. | - |
dc.contributor.author | Ni, Chengzhuo | - |
dc.contributor.author | Wang, Mengdi | - |
dc.date.accessioned | 2020-02-24T22:23:45Z | - |
dc.date.available | 2020-02-24T22:23:45Z | - |
dc.date.issued | 2019-09-01 | en_US |
dc.identifier.citation | Yang, LF, Ni, C, Wang, M. (2019). Learning to Control in Metric Space with Optimal Regret. 2019 57th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2019, 726 - 733. doi:10.1109/ALLERTON.2019.8919864 | en_US |
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/pr1nb6j | - |
dc.description.abstract | We study online reinforcement learning for finite-horizon deterministic control systems with arbitrary state and action spaces. Suppose the transition dynamics and reward function are unknown, but the state-action space is endowed with a metric that characterizes the proximity between different states and actions. We provide a surprisingly simple upper-confidence reinforcement learning algorithm that uses a function approximation oracle to estimate optimistic Q functions from experiences. We show that the regret of the algorithm after K episodes is $O\left((DLK)^{\frac{d}{d+1}}H\right)$, where D is the diameter of the state-action space, L is a smoothness parameter, and d is the doubling dimension of the state-action space with respect to the given metric. We also establish a near-matching regret lower bound. The proposed method can be adapted to more structured transition systems, including the finite-state case and the case where value functions are linear combinations of features, where the method also achieves the optimal regret. © 2019 IEEE. (A code sketch of the optimistic Q construction follows the metadata table.) | en_US |
dc.format.extent | 726 - 733 | en_US |
dc.language.iso | en_US | en_US |
dc.relation.ispartof | 57th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2019 | en_US |
dc.rights | Author's manuscript | en_US |
dc.title | Learning to Control in Metric Space with Optimal Regret | en_US |
dc.type | Journal Article | en_US |
dc.identifier.doi | doi:10.1109/ALLERTON.2019.8919864 | - |
pu.type.symplectic | http://www.symplectic.co.uk/publications/atom-terms/1.0/conference-proceeding | en_US |
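
Below is a minimal, hypothetical sketch of the optimistic Q construction the abstract describes: in a deterministic system, each observed transition caps the Q value of nearby state-action pairs through the metric and the smoothness parameter L, and the upper-confidence estimate is the tightest such Lipschitz upper envelope. All names here (`rho`, `L_SMOOTH`, `HORIZON`, `optimistic_q`) are illustrative assumptions, not the paper's actual interface.

```python
import numpy as np

# Assumed smoothness parameter L and episode length H (illustrative values).
L_SMOOTH = 1.0
HORIZON = 10

def rho(sa1, sa2):
    """Assumed metric on state-action pairs (Euclidean, for concreteness)."""
    return float(np.linalg.norm(np.asarray(sa1) - np.asarray(sa2)))

def optimistic_q(h, s, a, samples, v_next):
    """Upper-confidence estimate of Q_h(s, a).

    samples: list of ((s_i, a_i), r_i, s_next_i) transitions observed at step h.
    v_next:  optimistic value function for step h + 1.
    Each sample caps Q at r_i + V_{h+1}(s_next_i) + L * rho((s, a), (s_i, a_i));
    with no data the estimate falls back to the trivial bound H - h.
    """
    bound = float(HORIZON - h)
    for (s_i, a_i), r_i, s_next_i in samples:
        bound = min(bound,
                    r_i + v_next(s_next_i)
                    + L_SMOOTH * rho((s, a), (s_i, a_i)))
    return bound

# Toy usage: one observed transition at step h = 0 in a 1-D system.
samples_h0 = [(((0.0,), (0.0,)), 1.0, (0.1,))]
v1 = lambda s: float(HORIZON - 1)   # trivial optimistic V_1
print(optimistic_q(0, (0.5,), (0.0,), samples_h0, v1))
```

Acting greedily with respect to this optimistic estimate and appending the resulting transitions to `samples` would give an upper-confidence loop in the spirit of the abstract; the sketch omits the function approximation oracle and everything needed for the regret analysis.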
Files in This Item:
File | Description | Size | Format
---|---|---|---
OA_LearningControlMetricSpaceOptimalRegret.pdf | - | 214.38 kB | Adobe PDF
Items in OAR@Princeton are protected by copyright, with all rights reserved, unless otherwise indicated.