InstaHide: Instance-hiding Schemes for Private Distributed Learning

Author(s): Huang, Yangsibo; Song, Zhao; Li, Kai; Arora, Sanjeev

Abstract: How can multiple distributed entities train a shared deep net on their private data while protecting data privacy? This paper introduces InstaHide, a simple encryption of training images. Encrypted images can be used in standard deep learning pipelines (PyTorch, Federated Learning, etc.) with no additional setup or infrastructure. The encryption has only a minor effect on test accuracy (unlike differential privacy). Encryption consists of mixing the image with a set of other images (in the sense of the Mixup data augmentation technique (Zhang et al., 2018)), followed by applying a random pixel-wise mask to the mixed image. Other contributions of this paper are: (a) use of a large public dataset of images (e.g., ImageNet) for mixing during encryption, which improves security; (b) experiments demonstrating effectiveness in protecting privacy against known attacks while preserving model accuracy; (c) theoretical analysis showing that successfully attacking privacy requires attackers to solve a difficult computational problem; (d) demonstration that Mixup alone is insecure (contrary to recent proposals), by exhibiting efficient attacks; (e) release of a challenge dataset to allow design of new attacks.
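The encryption step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' released implementation: the function name, the choice of `k`, and the use of Dirichlet-sampled mixing weights are assumptions; the two core operations (a Mixup-style convex combination with public images and a random pixel-wise sign-flip mask) follow the abstract's description.

```python
import numpy as np

def instahide_encrypt(private_img, public_imgs, k=4, rng=None):
    """Sketch of an InstaHide-style encryption (hypothetical helper).

    Mixes one private image with k-1 randomly chosen public images
    via a random convex combination (Mixup-style), then applies a
    random pixel-wise sign-flip mask to the mixed image.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Pick k-1 distinct public images to mix with (improves security
    # per contribution (a) in the abstract).
    idx = rng.choice(len(public_imgs), size=k - 1, replace=False)
    images = [private_img] + [public_imgs[i] for i in idx]

    # Random convex combination coefficients (non-negative, sum to 1).
    lam = rng.dirichlet(np.ones(k))
    mixed = sum(l * img for l, img in zip(lam, images))

    # Random pixel-wise mask: flip the sign of each pixel independently.
    mask = rng.choice([-1.0, 1.0], size=mixed.shape)
    return mask * mixed, lam
```

The returned mixing weights `lam` would be used to mix the labels of the private images analogously before training on the encrypted batch.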
Publication Date: 2020
Citation: Huang, Yangsibo, Zhao Song, Kai Li, and Sanjeev Arora. "InstaHide: Instance-hiding Schemes for Private Distributed Learning." In Proceedings of the 37th International Conference on Machine Learning (2020): pp. 4507-4518.
Pages: 4507 - 4518
Type of Material: Conference Article
Journal/Proceeding Title: Proceedings of the 37th International Conference on Machine Learning
Version: Final published version. Article is made available in OAR by the publisher's permission or policy.

Items in OAR@Princeton are protected by copyright, with all rights reserved, unless otherwise indicated.