
Theoretical Analysis of Auto Rate-Tuning by Batch Normalization

Author(s): Arora, Sanjeev; Li, Zhiyuan; Lyu, Kaifeng

To refer to this page use: http://arks.princeton.edu/ark:/88435/pr1qv7b
Full metadata record
DC Field: Value [Language]
dc.contributor.author: Arora, Sanjeev
dc.contributor.author: Li, Zhiyuan
dc.contributor.author: Lyu, Kaifeng
dc.date.accessioned: 2021-10-08T19:50:40Z
dc.date.available: 2021-10-08T19:50:40Z
dc.date.issued: 2019 [en_US]
dc.identifier.citation: Arora, Sanjeev, Zhiyuan Li, and Kaifeng Lyu. "Theoretical Analysis of Auto Rate-Tuning by Batch Normalization." In International Conference on Learning Representations (2019). [en_US]
dc.identifier.uri: https://openreview.net/pdf?id=rkxQ-nA9FX
dc.identifier.uri: http://arks.princeton.edu/ark:/88435/pr1qv7b
dc.description.abstract: Batch Normalization (BN) has become a cornerstone of deep learning across diverse architectures, appearing to help optimization as well as generalization. While the idea makes intuitive sense, theoretical analysis of its effectiveness has been lacking. Here theoretical support is provided for one of its conjectured properties, namely, the ability to allow gradient descent to succeed with less tuning of learning rates. It is shown that even if we fix the learning rate of scale-invariant parameters (e.g., weights of each layer with BN) to a constant (say, 0.3), gradient descent still approaches a stationary point (i.e., a solution where gradient is zero) at the rate of T^{−1/2} in T iterations, asymptotically matching the best bound for gradient descent with well-tuned learning rates. A similar result with convergence rate T^{−1/4} is also shown for stochastic gradient descent. [en_US]
dc.language.iso: en_US [en_US]
dc.relation.ispartof: International Conference on Learning Representations [en_US]
dc.rights: Final published version. This is an open access article. [en_US]
dc.title: Theoretical Analysis of Auto Rate-Tuning by Batch Normalization [en_US]
dc.type: Conference Article [en_US]
pu.type.symplectic: http://www.symplectic.co.uk/publications/atom-terms/1.0/conference-proceeding [en_US]
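The scale-invariance the abstract refers to can be checked numerically: for a linear layer followed by batch normalization, rescaling the layer's weights leaves the output unchanged, which is why a fixed learning rate can still behave well. The sketch below is a minimal illustration in NumPy, assuming a bare `batchnorm` helper with no learnable affine parameters (both the helper and the shapes are illustrative, not from the paper).

```python
import numpy as np

def batchnorm(z):
    # Normalize each feature over the batch dimension
    # (no learnable scale/shift, epsilon omitted for exactness).
    return (z - z.mean(axis=0)) / z.std(axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))   # batch of 8 inputs, 5 features
W = rng.normal(size=(5, 3))   # weights of a linear layer

out = batchnorm(X @ W)
out_scaled = batchnorm(X @ (2.7 * W))  # rescale the weights by any c > 0

# BN divides out the scale of W, so the layer output is identical:
print(np.allclose(out, out_scaled))  # True
```

Because rescaling `W` by `c` rescales both the pre-activations and their batch statistics by `c`, the normalization cancels it exactly; this is the scale-invariant parameterization for which the paper proves the T^{−1/2} (GD) and T^{−1/4} (SGD) convergence rates with an untuned learning rate.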

Files in This Item:
File: TheoreticalAnalysis.pdf
Size: 503.69 kB
Format: Adobe PDF


Items in OAR@Princeton are protected by copyright, with all rights reserved, unless otherwise indicated.