Stochastic compositional gradient descent: algorithms for minimizing compositions of expected-value functions
Author(s): Wang, Mengdi; Fang, Ethan X; Liu, Han
To refer to this page use: http://arks.princeton.edu/ark:/88435/pr1zs1h
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Wang, Mengdi | - |
dc.contributor.author | Fang, Ethan X | - |
dc.contributor.author | Liu, Han | - |
dc.date.accessioned | 2021-10-11T14:17:05Z | - |
dc.date.available | 2021-10-11T14:17:05Z | - |
dc.date.issued | 2017 | en_US |
dc.identifier.citation | Wang, Mengdi, Ethan X. Fang, and Han Liu. "Stochastic compositional gradient descent: algorithms for minimizing compositions of expected-value functions." Mathematical Programming 161, no. 1-2 (2017): pp. 419-449. doi:10.1007/s10107-016-1017-3 | en_US |
dc.identifier.issn | 0025-5610 | - |
dc.identifier.uri | https://arxiv.org/abs/1411.3803 | - |
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/pr1zs1h | - |
dc.description.abstract | Classical stochastic gradient methods are well suited for minimizing expected-value objective functions. However, they do not apply to the minimization of a nonlinear function involving expected values, i.e., a composition of two expected-value functions: the problem min_x E_v[f_v(E_w[g_w(x)])]. In order to solve this stochastic composition problem, we propose a class of stochastic compositional gradient descent (SCGD) algorithms that can be viewed as stochastic versions of the quasi-gradient method. SCGD updates the solution based on noisy sample gradients of f_v, g_w and uses an auxiliary variable to track the unknown quantity E_w[g_w(x)]. We prove that SCGD converges almost surely to an optimal solution for convex optimization problems, as long as such a solution exists. The convergence involves the interplay of two iterations with different time scales. For nonsmooth convex problems, SCGD achieves a convergence rate of O(k^{-1/4}) in the general case and O(k^{-2/3}) in the strongly convex case, after taking k samples. For smooth convex problems, SCGD can be accelerated to converge at a rate of O(k^{-2/7}) in the general case and O(k^{-4/5}) in the strongly convex case. For nonconvex problems, we prove that any limit point generated by SCGD is a stationary point, for which we also provide the convergence rate analysis. Indeed, the stochastic setting where one wants to optimize compositions of expected-value functions is very common in practice. The proposed SCGD methods find wide applications in learning, estimation, dynamic programming, etc. | en_US |
dc.format.extent | 419 - 449 | en_US |
dc.language.iso | en_US | en_US |
dc.relation.ispartof | Mathematical Programming | en_US |
dc.rights | Author's manuscript | en_US |
dc.title | Stochastic compositional gradient descent: algorithms for minimizing compositions of expected-value functions | en_US |
dc.type | Journal Article | en_US |
dc.identifier.doi | doi:10.1007/s10107-016-1017-3 | - |
dc.identifier.eissn | 1436-4646 | - |
pu.type.symplectic | http://www.symplectic.co.uk/publications/atom-terms/1.0/journal-article | en_US |
Files in This Item:
File | Description | Size | Format
---|---|---|---
StochasticCompositionalGradDescent.pdf | | 959.95 kB | Adobe PDF
Items in OAR@Princeton are protected by copyright, with all rights reserved, unless otherwise indicated.
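
As a supplementary illustration (not part of the original record), the following is a minimal sketch of the basic two-time-scale SCGD update described in the abstract: a fast-moving auxiliary variable tracks the inner expectation E_w[g_w(x)], while a slower gradient step updates x through the sampled chain rule. The synthetic problem data, noise levels, and step-size constants below are illustrative assumptions, not values taken from the paper.

```python
# Sketch of the basic SCGD update from the abstract, on a toy composition.
# Objective: min_x E_v[ f_v( E_w[ g_w(x) ] ) ]
# Here g_w(x) = A x + w  (w: zero-mean noise), so E_w[g_w(x)] = A x,
# and  f_v(y) = ||y - b||^2 + <v, y>  (v: zero-mean noise),
# so the true objective is ||A x - b||^2.  All constants are assumptions.

import numpy as np

rng = np.random.default_rng(0)
d, m = 5, 3                      # dimensions of x and of the inner map g
A = rng.standard_normal((m, d))
b = rng.standard_normal(m)

x = np.zeros(d)                  # decision variable
y = np.zeros(m)                  # auxiliary variable tracking E_w[g_w(x)]

for k in range(1, 20001):
    # Two-time-scale step sizes (one typical choice; constants are assumed).
    alpha = 0.1 * k ** -0.75     # slow step for x
    beta = min(1.0, k ** -0.5)   # fast step for the auxiliary variable y

    # Sample the inner function value and its Jacobian at the current x.
    w = 0.1 * rng.standard_normal(m)
    g_sample = A @ x + w         # noisy sample of g_w(x)
    g_jac = A                    # Jacobian of g_w at x (noise-free in this toy)

    # Update the auxiliary variable: running estimate of E_w[g_w(x)].
    y = (1.0 - beta) * y + beta * g_sample

    # Sample a gradient of the outer function at the estimate y.
    v = 0.1 * rng.standard_normal(m)
    f_grad = 2.0 * (y - b) + v   # noisy gradient of f_v at y

    # Compositional gradient step: chain rule through the sampled Jacobian.
    x = x - alpha * g_jac.T @ f_grad

print("final objective ||Ax - b||^2:", np.linalg.norm(A @ x - b) ** 2)
```

Running this sketch drives the composite objective toward its minimum; the key design choice, per the abstract, is that y is updated on a faster time scale than x so the gradient step sees a reliable estimate of the inner expectation.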