Stochastic compositional gradient descent: algorithms for minimizing compositions of expected-value functions

Wang, Mengdi; Fang, Ethan X; Liu, Han

To refer to this page use: http://arks.princeton.edu/ark:/88435/pr1zs1h

Abstract:	Classical stochastic gradient methods are well suited for minimizing expected-value objective functions. However, they do not apply to the minimization of a nonlinear function involving expected values or a composition of two expected-value functions, i.e., the problem $\min_x \mathbf{E}_v[f_v(\mathbf{E}_w[g_w(x)])]$. In order to solve this stochastic composition problem, we propose a class of stochastic compositional gradient descent (SCGD) algorithms that can be viewed as stochastic versions of the quasi-gradient method. SCGD algorithms update the solutions based on noisy sample gradients of $f_v$ and $g_w$, and use an auxiliary variable to track the unknown quantity $\mathbf{E}_w[g_w(x)]$. We prove that the SCGD algorithms converge almost surely to an optimal solution for convex optimization problems, as long as such a solution exists. The convergence involves the interplay of two iterations with different time scales. For nonsmooth convex problems, SCGD achieves a convergence rate of $O(k^{-1/4})$ in the general case and $O(k^{-2/3})$ in the strongly convex case, after taking $k$ samples. For smooth convex problems, SCGD can be accelerated to converge at a rate of $O(k^{-2/7})$ in the general case and $O(k^{-4/5})$ in the strongly convex case. For nonconvex problems, we prove that any limit point generated by SCGD is a stationary point, for which we also provide the convergence rate analysis. Indeed, the stochastic setting where one wants to optimize compositions of expected-value functions is very common in practice. The proposed SCGD methods find wide applications in learning, estimation, dynamic programming, etc.
Publication Date:	2017
Citation:	Wang, Mengdi, Ethan X. Fang, and Han Liu. "Stochastic compositional gradient descent: algorithms for minimizing compositions of expected-value functions." Mathematical Programming 161, no. 1-2 (2017): pp. 419-449. doi:10.1007/s10107-016-1017-3
DOI:	doi:10.1007/s10107-016-1017-3
ISSN:	0025-5610
EISSN:	1436-4646
Pages:	419 - 449
Type of Material:	Journal Article
Journal/Proceeding Title:	Mathematical Programming
Version:	Author's manuscript
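
Illustrative sketch: The abstract describes the method only in words: the iterate is updated from noisy sample gradients of the outer and inner functions, while an auxiliary variable tracks the inner expectation on a different time scale. The Python sketch below shows one way such a two-timescale iteration might look. The exact update rule, the step-size exponents (0.75 and 0.5), the function names, and the toy least-squares problem are all assumptions made for illustration and are not taken from this record; the article itself gives the actual algorithms and step-size conditions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem: minimize 0.5 * ||A x - b||^2, written as
# E_v[f_v(E_w[g_w(x)])] with g_w(x) = (A + W) x and zero-mean noise W.
A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, -1.0])
x_star = np.linalg.solve(A, b)  # exact minimizer, used only for checking


def sample_g(x):
    """Return one noisy sample of g_w(x) and its Jacobian (same draw of w)."""
    jac = A + rng.normal(scale=0.1, size=A.shape)
    return jac @ x, jac


def sample_grad_f(y):
    """Return one noisy sample of the gradient of f_v at y."""
    return (y - b) + rng.normal(scale=0.1, size=y.shape)


def scgd(x0, y0, n_iters, alpha0=0.2):
    """Two-timescale sketch: y tracks E[g_w(x)], x takes gradient-like steps."""
    x = np.array(x0, dtype=float)
    y = np.array(y0, dtype=float)
    for k in range(1, n_iters + 1):
        alpha = alpha0 * k ** -0.75  # slower time scale (assumed exponent)
        beta = k ** -0.5             # faster time scale (assumed exponent)
        g_val, g_jac = sample_g(x)
        # Running average that tracks the unknown inner expectation.
        y = (1.0 - beta) * y + beta * g_val
        # Chain-rule step using the tracked value instead of a single sample.
        x = x - alpha * g_jac.T @ sample_grad_f(y)
    return x, y


x_hat, y_hat = scgd(np.zeros(2), np.zeros(2), n_iters=20000)
print("estimate:", x_hat, "target:", x_star)
```

In this sketch the auxiliary variable y is averaged with the larger step size, so the outer gradient is evaluated near the inner expectation rather than at one noisy sample of it; this is the interplay of time scales that the abstract mentions.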