Communication lower bounds for statistical estimation problems via a distributed data processing inequality?
Author(s): Braverman, Mark; Garg, A; Ma, T; Nguyen, Huy L.; Woodruff, DP
DownloadTo refer to this page use:
http://arks.princeton.edu/ark:/88435/pr1wd4w
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Braverman, Mark | - |
dc.contributor.author | Garg, A | - |
dc.contributor.author | Ma, T | - |
dc.contributor.author | Nguyen, Huy L. | - |
dc.contributor.author | Woodruff, DP | - |
dc.date.accessioned | 2018-07-20T15:06:18Z | - |
dc.date.available | 2018-07-20T15:06:18Z | - |
dc.date.issued | 2016-06-16 | en_US |
dc.identifier.citation | Braverman, M, Garg, A, Ma, T, Nguyen, HL, Woodruff, DP. (2016). Communication lower bounds for statistical estimation problems via a distributed data processing inequality?. 19-21-June-2016 (1011 - 1020. doi:10.1145/2897518.2897582 | en_US |
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/pr1wd4w | - |
dc.description.abstract | We study the tradeoff between the statistical error and communication cost of distributed statistical estimation problems in high dimensions. In the distributed sparse Gaussian mean estimation problem, each of the m machines receives n data points from a d-dimensional Gaussian distribution with unknown mean θ which is promised to be k-sparse. The machines communicate by message passing and aim to estimate the mean θ. We provide a tight (up to logarithmic factors) tradeoff between the estimation error and the number of bits communicated between the machines. This directly leads to a lower bound for the distributed sparse linear regression problem: to achieve the statistical minimax error, the total communication is at least Ω(min{n,d}m), where n is the number of observations that each machine receives and d is the ambient dimension. These lower results improve upon Shamir (NIPS'14) and Steinhardt-Duchi (COLT'15) by allowing multi-round iterative communication model. We also give the first optimal simultaneous protocol in the dense case for mean estimation. As our main technique, we prove a distributed data processing inequality, as a generalization of usual data processing inequalities, which might be of independent interest and useful for other problems. | en_US |
dc.format.extent | 1011 - 1020 | en_US |
dc.language.iso | en_US | en_US |
dc.relation.ispartof | STOC '16 Proceedings of the forty-eighth annual ACM symposium on Theory of Computing | en_US |
dc.relation.ispartofseries | STOC '16 Proceedings of the forty-eighth annual ACM symposium on Theory of Computing; | - |
dc.rights | Author's manuscript | en_US |
dc.title | Communication lower bounds for statistical estimation problems via a distributed data processing inequality? | en_US |
dc.type | Journal Article | en_US |
dc.identifier.doi | doi:10.1145/2897518.2897582 | - |
dc.date.eissued | 2016-06-19 | en_US |
pu.type.symplectic | http://www.symplectic.co.uk/publications/atom-terms/1.0/conference-proceeding | en_US |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Communication lower bounds for statistical estimation problems via a distributed data processing inequality.pdf | 362.63 kB | Adobe PDF | View/Download |
Items in OAR@Princeton are protected by copyright, with all rights reserved, unless otherwise indicated.