
On Exact Computation with an Infinitely Wide Neural Net

Author(s): Arora, Sanjeev; Du, Simon S.; Hu, Wei; Li, Zhiyuan; Salakhutdinov, Russ R.; Wang, Ruosong

To refer to this page use: http://arks.princeton.edu/ark:/88435/pr14z81
Full metadata record
DC Field: Value (Language)
dc.contributor.author: Arora, Sanjeev
dc.contributor.author: Du, Simon S.
dc.contributor.author: Hu, Wei
dc.contributor.author: Li, Zhiyuan
dc.contributor.author: Salakhutdinov, Russ R.
dc.contributor.author: Wang, Ruosong
dc.date.accessioned: 2021-10-08T19:50:49Z
dc.date.available: 2021-10-08T19:50:49Z
dc.date.issued: 2019 (en_US)
dc.identifier.citation: Arora, Sanjeev, Simon S. Du, Wei Hu, Zhiyuan Li, Russ R. Salakhutdinov, and Ruosong Wang. "On Exact Computation with an Infinitely Wide Neural Net." Advances in Neural Information Processing Systems 32 (2019). (en_US)
dc.identifier.issn: 1049-5258
dc.identifier.uri: https://papers.nips.cc/paper/2019/file/dbc4d84bfcfe2284ba11beffb853a8c4-Paper.pdf
dc.identifier.uri: http://arks.princeton.edu/ark:/88435/pr14z81
dc.description.abstract: How well does a classic deep net architecture like AlexNet or VGG19 classify on a standard dataset such as CIFAR-10 when its “width”— namely, number of channels in convolutional layers, and number of nodes in fully-connected internal layers — is allowed to increase to infinity? Such questions have come to the forefront in the quest to theoretically understand deep learning and its mysteries about optimization and generalization. They also connect deep learning to notions such as Gaussian processes and kernels. A recent paper [Jacot et al., 2018] introduced the Neural Tangent Kernel (NTK) which captures the behavior of fully-connected deep nets in the infinite width limit trained by gradient descent; this object was implicit in some other recent papers. An attraction of such ideas is that a pure kernel-based method is used to capture the power of a fully-trained deep net of infinite width. The current paper gives the first efficient exact algorithm for computing the extension of NTK to convolutional neural nets, which we call Convolutional NTK (CNTK), as well as an efficient GPU implementation of this algorithm. This results in a significant new benchmark for performance of a pure kernel-based method on CIFAR-10, being 10% higher than the methods reported in [Novak et al., 2019], and only 6% lower than the performance of the corresponding finite deep net architecture (once batch normalization etc. are turned off). Theoretically, we also give the first non-asymptotic proof showing that a fully-trained sufficiently wide net is indeed equivalent to the kernel regression predictor using NTK. (en_US)
dc.language.iso: en_US (en_US)
dc.relation.ispartof: Advances in Neural Information Processing Systems (en_US)
dc.rights: Final published version. Article is made available in OAR by the publisher's permission or policy. (en_US)
dc.title: On Exact Computation with an Infinitely Wide Neural Net (en_US)
dc.type: Conference Article (en_US)
pu.type.symplectic: http://www.symplectic.co.uk/publications/atom-terms/1.0/conference-proceeding (en_US)
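
The abstract above refers to the "kernel regression predictor using NTK." As a point of reference, the following is a minimal sketch in plain NumPy of the standard fully-connected ReLU NTK recursion and the associated kernel regression predictor. It is not the paper's CNTK algorithm or its GPU implementation; the function names (relu_ntk, ntk_regression) and the small ridge term are illustrative assumptions.

import numpy as np

def relu_ntk(X1, X2, depth=3):
    # Sketch of the fully-connected NTK recursion for ReLU activations
    # (standard c_sigma = 2 normalization); NOT the paper's CNTK algorithm.
    sigma = X1 @ X2.T                        # Sigma^{(0)}(x, x') = x . x'
    diag1 = np.sum(X1 * X1, axis=1)          # Sigma^{(0)}(x, x)
    diag2 = np.sum(X2 * X2, axis=1)          # Sigma^{(0)}(x', x')
    theta = sigma.copy()                     # Theta^{(0)} = Sigma^{(0)}
    for _ in range(depth):
        norm = np.sqrt(np.outer(diag1, diag2))
        cos = np.clip(sigma / np.maximum(norm, 1e-12), -1.0, 1.0)
        angle = np.arccos(cos)
        # Closed-form Gaussian expectations for ReLU (arc-cosine kernel).
        sigma_dot = (np.pi - angle) / np.pi                             # c_sigma * E[relu'(u) relu'(v)]
        sigma = norm * (np.sin(angle) + (np.pi - angle) * cos) / np.pi  # c_sigma * E[relu(u) relu(v)]
        # Under this normalization the diagonal entries Sigma^{(h)}(x, x)
        # are preserved, so diag1 and diag2 need no update.
        theta = theta * sigma_dot + sigma    # Theta^{(h)} = Theta^{(h-1)} * Sigma_dot^{(h)} + Sigma^{(h)}
    return theta

def ntk_regression(X_train, y_train, X_test, depth=3, ridge=1e-6):
    # Kernel regression predictor with the NTK; the small ridge term is an
    # assumption added here only for numerical stability.
    K_train = relu_ntk(X_train, X_train, depth)
    K_test = relu_ntk(X_test, X_train, depth)
    alpha = np.linalg.solve(K_train + ridge * np.eye(len(X_train)), y_train)
    return K_test @ alpha

Calling ntk_regression with one-hot training labels and taking an argmax over the outputs yields a classifier in the spirit of the kernel benchmark described in the abstract, though the paper's reported numbers come from the convolutional kernel (CNTK), not this fully-connected sketch.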

Files in This Item:
File: ExactComputation.pdf (685.83 kB, Adobe PDF)


Items in OAR@Princeton are protected by copyright, with all rights reserved, unless otherwise indicated.