Skip to main content

Geometric Exploration for Online Control

Author(s): Plevrakis, Orestis; Hazan, Elad

To refer to this page use:
Abstract: We study the control of an \emph{unknown} linear dynamical system under general convex costs. The objective is minimizing regret vs the class of strongly-stable linear policies. In this work, we first consider the case of known cost functions, for which we design the first polynomial-time algorithm with n 3 √ T -regret, where n is the dimension of the state plus the dimension of control input. The √ T -horizon dependence is optimal, and improves upon the previous best known bound of T 2 / 3 . The main component of our algorithm is a novel geometric exploration strategy: we adaptively construct a sequence of barycentric spanners in an over-parameterized policy space. Second, we consider the case of bandit feedback, for which we give the first polynomial-time algorithm with p o l y ( n ) √ T -regret, building on Stochastic Bandit Convex Optimization.
Publication Date: 2020
Citation: Plevrakis, Orestis, and Elad Hazan. "Geometric Exploration for Online Control." Advances in Neural Information Processing Systems 33 (2020).
ISSN: 1049-5258
Type of Material: Conference Article
Journal/Proceeding Title: Advances in Neural Information Processing Systems
Version: Final published version. Article is made available in OAR by the publisher's permission or policy.

Items in OAR@Princeton are protected by copyright, with all rights reserved, unless otherwise indicated.