Finite-time analysis for the knowledge-gradient policy
Author(s): Wang, Y; Powell, William B
DownloadTo refer to this page use:
http://arks.princeton.edu/ark:/88435/pr10c4j
Abstract: | © 2018 Society for Industrial and Applied Mathematics. We consider sequential decision problems in which we adaptively choose one of finitely many alternatives and observe a stochastic reward. We offer a new perspective on interpreting Bayesian ranking and selection problems as adaptive stochastic multiset maximization problems and derive the first finite-time bound of the knowledge-gradient policy for adaptive submodular objective functions. In addition, we introduce the concept of prior-optimality and provide another insight into the performance of the knowledge-gradient policy based on the submodular assumption on the value of information. We demonstrate submodularity for the two-alternative case and provide other conditions for more general problems, bringing out the issue and importance of submodularity in learning problems. Empirical experiments are conducted to further illustrate the finite-time behavior of the knowledge-gradient policy. |
Publication Date: | 1-Jan-2018 |
Citation: | Wang, Y, Powell, WB. (2018). Finite-time analysis for the knowledge-gradient policy. SIAM Journal on Control and Optimization, 56 (2), 1105 - 1129. doi:10.1137/16M1073388 |
DOI: | doi:10.1137/16M1073388 |
ISSN: | 0363-0129 |
Pages: | 1105 - 1129 |
Type of Material: | Journal Article |
Journal/Proceeding Title: | SIAM Journal on Control and Optimization |
Version: | Author's manuscript |
Items in OAR@Princeton are protected by copyright, with all rights reserved, unless otherwise indicated.