
The anatomy of efficient FFT and Winograd convolutions on modern CPUs

Author(s): Zlateski, Aleksandar; Jia, Zhen; Li, Kai; Durand, Fredo

To refer to this page use: http://arks.princeton.edu/ark:/88435/pr13564
Full metadata record
DC Field | Value | Language
dc.contributor.author | Zlateski, Aleksandar | -
dc.contributor.author | Jia, Zhen | -
dc.contributor.author | Li, Kai | -
dc.contributor.author | Durand, Fredo | -
dc.date.accessioned | 2021-10-08T19:50:42Z | -
dc.date.available | 2021-10-08T19:50:42Z | -
dc.date.issued | 2019 | en_US
dc.identifier.citation | Zlateski, Aleksandar, Zhen Jia, Kai Li, and Fredo Durand. "The anatomy of efficient FFT and winograd convolutions on modern CPUs." In Proceedings of the ACM International Conference on Supercomputing (2019): pp. 414-424. doi:10.1145/3330345.3330382 | en_US
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/pr13564 | -
dc.description.abstract | Winograd-based convolution has quickly gained traction as a preferred approach to implementing convolutional neural networks (ConvNets) on various hardware platforms because it can require fewer floating-point operations than FFT-based or direct convolution. In this paper, we analyze the theoretical performance of three methods (regular FFT-, Gauss-FFT-, and Winograd-based convolutions) and compare their highly optimized implementations on modern multi- and many-core CPUs. With all three implementations employing the same optimizations, our experimental results with modern ConvNets show that the FFT-based implementations generally outperform the Winograd-based approach, contrary to popular belief. To understand these results, we use a Roofline performance model to analyze the three implementations in detail, examining each of their computation phases and considering not only the number of floating-point operations but also the memory bandwidth and the cache sizes. The performance analysis explains why, and under what conditions, the FFT-based implementations outperform the Winograd-based one on modern CPUs. | en_US
dc.format.extent | 414 - 424 | en_US
dc.language.iso | en_US | en_US
dc.relation.ispartof | Proceedings of the ACM International Conference on Supercomputing | en_US
dc.rights | Final published version. This is an open access article. | en_US
dc.title | The anatomy of efficient FFT and Winograd convolutions on modern CPUs | en_US
dc.type | Conference Article | en_US
dc.identifier.doi | 10.1145/3330345.3330382 | -
pu.type.symplectic | http://www.symplectic.co.uk/publications/atom-terms/1.0/conference-proceeding | en_US
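The abstract turns on three ideas: Winograd minimal filtering (fewer multiplications per output), Gauss's trick for multiplying complex numbers with three real multiplications instead of four (presumably the idea behind the "Gauss-FFT" variant, as its name suggests), and the Roofline bound, which caps attainable throughput by memory bandwidth as well as peak compute. The Python sketch below illustrates each idea at toy scale. It is not code from the paper; the function names are hypothetical, and the machine figures (1000 GFLOP/s peak, 100 GB/s bandwidth) are made-up placeholders.

```python
# Illustrative sketch only; not the paper's implementation.
# Function names and machine numbers below are hypothetical.


def direct_conv1d_valid(d, g):
    """Direct 'valid' 1-D correlation, the form used in ConvNet layers."""
    r = len(g)
    return [sum(d[i + k] * g[k] for k in range(r)) for i in range(len(d) - r + 1)]


def winograd_f23(d, g):
    """Winograd minimal filtering F(2,3): two outputs of a 3-tap filter
    from a 4-element input tile, using 4 multiplications instead of 6."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter-side terms depend only on g and can be precomputed once per filter.
    u1 = (g0 + g1 + g2) / 2
    u2 = (g0 - g1 + g2) / 2
    # The four element-wise multiplications.
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * u1
    m3 = (d2 - d1) * u2
    m4 = (d1 - d3) * g2
    # Output transform: additions only.
    return [m1 + m2 + m3, m2 - m3 - m4]


def gauss_complex_mul(a, b, c, d):
    """(a + bi) * (c + di) with 3 real multiplications instead of 4.
    Returns (real, imag)."""
    k1 = c * (a + b)
    k2 = a * (d - c)
    k3 = b * (c + d)
    return (k1 - k3, k1 + k2)


def roofline_gflops(peak_gflops, bandwidth_gbs, flops, bytes_moved):
    """Roofline bound: attainable = min(peak, arithmetic intensity * bandwidth)."""
    intensity = flops / bytes_moved  # FLOPs performed per byte of memory traffic
    return min(peak_gflops, intensity * bandwidth_gbs)


if __name__ == "__main__":
    d, g = [1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0]
    print(direct_conv1d_valid(d, g), winograd_f23(d, g))
    print(gauss_complex_mul(1.0, 2.0, 3.0, 4.0), complex(1, 2) * complex(3, 4))
    # Hypothetical machine: 1000 GFLOP/s peak, 100 GB/s memory bandwidth.
    print(roofline_gflops(1000.0, 100.0, flops=2e9, bytes_moved=1e9))   # memory-bound
    print(roofline_gflops(1000.0, 100.0, flops=50e9, bytes_moved=1e9))  # compute-bound
```

The toy case shows why counting floating-point operations alone can mislead: a phase with low arithmetic intensity is limited by bandwidth no matter how few multiplications it performs, which is the kind of trade-off the abstract says the Roofline analysis is used to expose.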

Files in This Item:
File | Description | Size | Format
AnatomyEfficient.pdf | | 229.16 kB | Adobe PDF


Items in OAR@Princeton are protected by copyright, with all rights reserved, unless otherwise indicated.