Skip to main content

Finite-time analysis for the knowledge-gradient policy

Author(s): Wang, Y; Powell, William B

To refer to this page use:
Abstract: © 2018 Society for Industrial and Applied Mathematics. We consider sequential decision problems in which we adaptively choose one of finitely many alternatives and observe a stochastic reward. We offer a new perspective on interpreting Bayesian ranking and selection problems as adaptive stochastic multiset maximization problems and derive the first finite-time bound of the knowledge-gradient policy for adaptive submodular objective functions. In addition, we introduce the concept of prior-optimality and provide another insight into the performance of the knowledge-gradient policy based on the submodular assumption on the value of information. We demonstrate submodularity for the two-alternative case and provide other conditions for more general problems, bringing out the issue and importance of submodularity in learning problems. Empirical experiments are conducted to further illustrate the finite-time behavior of the knowledge-gradient policy.
Publication Date: 1-Jan-2018
Citation: Wang, Y, Powell, WB. (2018). Finite-time analysis for the knowledge-gradient policy. SIAM Journal on Control and Optimization, 56 (2), 1105 - 1129. doi:10.1137/16M1073388
DOI: doi:10.1137/16M1073388
ISSN: 0363-0129
Pages: 1105 - 1129
Type of Material: Journal Article
Journal/Proceeding Title: SIAM Journal on Control and Optimization
Version: Author's manuscript

Items in OAR@Princeton are protected by copyright, with all rights reserved, unless otherwise indicated.