# ZNNi: Maximizing the Inference Throughput of 3D Convolutional Networks on CPUs and GPUs

## Author(s): Zlateski, A; Lee, K; Seung, H. Sebastian

To refer to this page use: http://arks.princeton.edu/ark:/88435/pr1m24b
Abstract: Sliding window convolutional networks (ConvNets) have become a popular approach to computer vision problems such as image segmentation and object detection and localization. Here we consider the parallelization of inference, i.e., the application of a previously trained ConvNet, with emphasis on 3D images. Our goal is to maximize throughput, defined as the number of output voxels computed per unit time. We propose CPU and GPU primitives for convolutional and pooling layers, which are combined to create CPU, GPU, and CPU-GPU inference algorithms. The primitives include convolution based on highly efficient padded and pruned FFTs. Our theoretical analyses and empirical tests reveal a number of interesting findings. For example, adding host RAM can be a more efficient way of increasing throughput than adding another GPU or more CPUs. Furthermore, our CPU-GPU algorithm can achieve greater throughput than the sum of CPU-only and GPU-only throughputs.

Publication Date: 16-Mar-2017

Electronic Publication Date: 16-Mar-2017

Citation: Zlateski, A, Lee, K, Seung, HS. (2017). ZNNi: Maximizing the Inference Throughput of 3D Convolutional Networks on CPUs and GPUs. 854 - 865. doi:10.1109/SC.2016.72

DOI: 10.1109/SC.2016.72

Pages: 854 - 865

Type of Material: Conference Article

Journal/Proceeding Title: International Conference for High Performance Computing, Networking, Storage and Analysis, SC

Version: Author's manuscript