A Sample Complexity Separation between Non-Convex and Convex Meta-Learning
Author(s): Saunshi, Nikunj; Zhang, Yi; Khodak, Mikhail; Arora, Sanjeev
To refer to this page use:
http://arks.princeton.edu/ark:/88435/pr1mr90
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Saunshi, Nikunj | - |
dc.contributor.author | Zhang, Yi | - |
dc.contributor.author | Khodak, Mikhail | - |
dc.contributor.author | Arora, Sanjeev | - |
dc.date.accessioned | 2021-10-08T19:51:06Z | - |
dc.date.available | 2021-10-08T19:51:06Z | - |
dc.date.issued | 2020 | en_US |
dc.identifier.citation | Saunshi, Nikunj, Yi Zhang, Mikhail Khodak, and Sanjeev Arora. "A Sample Complexity Separation between Non-Convex and Convex Meta-Learning." In Proceedings of the 37th International Conference on Machine Learning (2020): pp. 8512-8521. | en_US |
dc.identifier.issn | 2640-3498 | - |
dc.identifier.uri | http://proceedings.mlr.press/v119/saunshi20a/saunshi20a.pdf | - |
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/pr1mr90 | - |
dc.description.abstract | One popular trend in meta-learning is to learn from many training tasks a common initialization that a gradient-based method can use to solve a new task with few samples. The theory of meta-learning is still in its early stages, with several recent learning-theoretic analyses of methods such as Reptile [Nichol et al., 2018] being for *convex models*. This work shows that convex-case analysis might be insufficient to understand the success of meta-learning, and that even for non-convex models it is important to look inside the optimization black-box, specifically at properties of the optimization trajectory. We construct a simple meta-learning instance that captures the problem of one-dimensional subspace learning. For the convex formulation of linear regression on this instance, we show that the new task sample complexity of any *initialization-based meta-learning* algorithm is Ω(d), where d is the input dimension. In contrast, for the non-convex formulation of a two-layer linear network on the same instance, we show that both Reptile and multi-task representation learning can have new task sample complexity of O(1), demonstrating a separation from convex meta-learning. Crucially, analyses of the training dynamics of these methods reveal that they can meta-learn the correct subspace onto which the data should be projected. | en_US |
dc.format.extent | 8512 - 8521 | en_US |
dc.language.iso | en_US | en_US |
dc.relation.ispartof | Proceedings of the 37th International Conference on Machine Learning | en_US |
dc.rights | Final published version. Article is made available in OAR by the publisher's permission or policy. | en_US |
dc.title | A Sample Complexity Separation between Non-Convex and Convex Meta-Learning | en_US |
dc.type | Conference Article | en_US |
pu.type.symplectic | http://www.symplectic.co.uk/publications/atom-terms/1.0/conference-proceeding | en_US |
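The abstract describes initialization-based meta-learning with Reptile on tasks sharing a one-dimensional subspace. As a rough illustration of the Reptile outer update the abstract refers to (θ ← θ + ε(φ_task − θ), where φ_task is the result of a few inner gradient steps on one task), here is a minimal sketch on the convex linear-regression version of such an instance. All concrete values (dimension, step sizes, the subspace direction `u`) are illustrative assumptions, not parameters from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 20                       # input dimension (assumed, for illustration)
u = np.zeros(d); u[0] = 1.0  # shared 1-D subspace direction (assumed)

def sample_task(n=32):
    """One regression task y = <w, x> with w lying on the shared subspace."""
    w = rng.standard_normal() * u
    X = rng.standard_normal((n, d))
    return X, X @ w

def inner_sgd(theta, X, y, steps=10, lr=0.01):
    """A few gradient steps on squared loss, starting from the meta-init."""
    th = theta.copy()
    for _ in range(steps):
        th -= lr * (X.T @ (X @ th - y) / len(y))
    return th

# Reptile outer loop: nudge the initialization toward each task's adapted weights.
theta = rng.standard_normal(d)
theta0 = theta.copy()
for _ in range(500):
    X, y = sample_task()
    adapted = inner_sgd(theta, X, y)
    theta += 0.1 * (adapted - theta)   # theta <- theta + eps * (phi_task - theta)

# The component of theta orthogonal to the shared subspace shrinks over training.
off_subspace_norm = np.linalg.norm(theta[1:])
```

This sketch only shows the update rule; the paper's separation result concerns the non-convex two-layer reparameterization of this problem, not this convex version.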
Files in This Item:
File | Description | Size | Format
---|---|---|---
SampleComplexitySepar.pdf | | 742.41 kB | Adobe PDF
Items in OAR@Princeton are protected by copyright, with all rights reserved, unless otherwise indicated.