Lipschitz Density-Ratios, Structured Data, and Data-driven Tuning

Kpotufe, S

Author(s): Kpotufe, S

To refer to this page use: http://arks.princeton.edu/ark:/88435/pr15c5p

Abstract:	Density-ratio estimation (i.e., estimating f = f_Q/f_P for two unknown distributions Q and P) has proved useful in many machine learning tasks, e.g., risk calibration in transfer learning and two-sample tests, as well as in common techniques such as importance sampling and bias correction. While there are many important analyses of this estimation problem, the present paper derives convergence rates in other practical settings that are less understood, namely, extensions of traditional Lipschitz smoothness conditions, and common high-dimensional settings with structured data (e.g., manifold data, sparse data). Various interesting facts, which hold in earlier settings, are shown to extend to these settings. Namely, (1) optimal rates depend only on the smoothness of the ratio f, and not on the densities f_Q, f_P, supporting the belief that plugging in estimates for f_Q, f_P is suboptimal; (2) optimal rates depend only on the intrinsic dimension of data, i.e., this problem, unlike density estimation, escapes the curse of dimension. We further show that near-optimal rates are attainable by estimators tuned from data alone, i.e., with no prior distributional information. This last fact is of special interest in unsupervised settings such as this one, where only oracle rates seem to be known, i.e., rates which assume critical distributional information usually unavailable in practice.
Publication Date:	2017
Citation:	Kpotufe, Samory. "Lipschitz density-ratios, structured data, and data-driven tuning." In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR 54, pp. 1320-1328. 2017.
ISSN:	2640-3498
Pages:	1320-1328
Type of Material:	Conference Article
Series/Report no.:	Proceedings of Machine Learning Research
Journal/Proceeding Title:	Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR
Version:	Final published version. Article is made available in OAR by the publisher's permission or policy.