To refer to this page use:
|Abstract:||© 2018 Society for Industrial and Applied Mathematics. We consider sequential decision problems in which we adaptively choose one of finitely many alternatives and observe a stochastic reward. We offer a new perspective on interpreting Bayesian ranking and selection problems as adaptive stochastic multiset maximization problems and derive the first finite-time bound of the knowledge-gradient policy for adaptive submodular objective functions. In addition, we introduce the concept of prior-optimality and provide another insight into the performance of the knowledge-gradient policy based on the submodular assumption on the value of information. We demonstrate submodularity for the two-alternative case and provide other conditions for more general problems, bringing out the issue and importance of submodularity in learning problems. Empirical experiments are conducted to further illustrate the finite-time behavior of the knowledge-gradient policy.|
|Citation:||Wang, Y, Powell, WB. (2018). Finite-time analysis for the knowledge-gradient policy. SIAM Journal on Control and Optimization, 56 (2), 1105 - 1129. doi:10.1137/16M1073388|
|Pages:||1105 - 1129|
|Type of Material:||Journal Article|
|Journal/Proceeding Title:||SIAM Journal on Control and Optimization|
Items in OAR@Princeton are protected by copyright, with all rights reserved, unless otherwise indicated.