Skip to main content

Communication lower bounds for statistical estimation problems via a distributed data processing inequality?

Author(s): Braverman, Mark; Garg, A; Ma, T; Nguyen, Huy L.; Woodruff, DP

Download
To refer to this page use: http://arks.princeton.edu/ark:/88435/pr1wd4w
Full metadata record
DC FieldValueLanguage
dc.contributor.authorBraverman, Mark-
dc.contributor.authorGarg, A-
dc.contributor.authorMa, T-
dc.contributor.authorNguyen, Huy L.-
dc.contributor.authorWoodruff, DP-
dc.date.accessioned2018-07-20T15:06:18Z-
dc.date.available2018-07-20T15:06:18Z-
dc.date.issued2016-06-16en_US
dc.identifier.citationBraverman, M, Garg, A, Ma, T, Nguyen, HL, Woodruff, DP. (2016). Communication lower bounds for statistical estimation problems via a distributed data processing inequality?. 19-21-June-2016 (1011 - 1020. doi:10.1145/2897518.2897582en_US
dc.identifier.urihttp://arks.princeton.edu/ark:/88435/pr1wd4w-
dc.description.abstractWe study the tradeoff between the statistical error and communication cost of distributed statistical estimation problems in high dimensions. In the distributed sparse Gaussian mean estimation problem, each of the m machines receives n data points from a d-dimensional Gaussian distribution with unknown mean θ which is promised to be k-sparse. The machines communicate by message passing and aim to estimate the mean θ. We provide a tight (up to logarithmic factors) tradeoff between the estimation error and the number of bits communicated between the machines. This directly leads to a lower bound for the distributed sparse linear regression problem: to achieve the statistical minimax error, the total communication is at least Ω(min{n,d}m), where n is the number of observations that each machine receives and d is the ambient dimension. These lower results improve upon Shamir (NIPS'14) and Steinhardt-Duchi (COLT'15) by allowing multi-round iterative communication model. We also give the first optimal simultaneous protocol in the dense case for mean estimation. As our main technique, we prove a distributed data processing inequality, as a generalization of usual data processing inequalities, which might be of independent interest and useful for other problems.en_US
dc.format.extent1011 - 1020en_US
dc.language.isoen_USen_US
dc.relation.ispartofSTOC '16 Proceedings of the forty-eighth annual ACM symposium on Theory of Computingen_US
dc.relation.ispartofseriesSTOC '16 Proceedings of the forty-eighth annual ACM symposium on Theory of Computing;-
dc.rightsAuthor's manuscripten_US
dc.titleCommunication lower bounds for statistical estimation problems via a distributed data processing inequality?en_US
dc.typeJournal Articleen_US
dc.identifier.doidoi:10.1145/2897518.2897582-
dc.date.eissued2016-06-19en_US
pu.type.symplectichttp://www.symplectic.co.uk/publications/atom-terms/1.0/conference-proceedingen_US

Files in This Item:
File Description SizeFormat 
Communication lower bounds for statistical estimation problems via a distributed data processing inequality.pdf362.63 kBAdobe PDFView/Download


Items in OAR@Princeton are protected by copyright, with all rights reserved, unless otherwise indicated.