A partially linear framework for massive heterogeneous data

Author(s): Zhao, Tianqi; Cheng, Guang; Liu, Han

To refer to this page use: http://arks.princeton.edu/ark:/88435/pr1f79k
Full metadata record
DC Field | Value | Language
dc.contributor.author | Zhao, Tianqi | -
dc.contributor.author | Cheng, Guang | -
dc.contributor.author | Liu, Han | -
dc.date.accessioned | 2020-04-13T22:05:57Z | -
dc.date.available | 2020-04-13T22:05:57Z | -
dc.date.issued | 2016 | en_US
dc.identifier.citation | Zhao, Tianqi, Guang Cheng, and Han Liu. "A partially linear framework for massive heterogeneous data." The Annals of Statistics 44, no. 4 (2016): 1400. doi:10.1214/15-AOS1410 | en_US
dc.identifier.issn | 0090-5364 | -
dc.identifier.uri | https://arxiv.org/abs/1410.8570 | -
dc.identifier.uri | https://projecteuclid.org/euclid.aos/1467894703 | -
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/pr1f79k | -
dc.description.abstract | We consider a partially linear framework for modeling massive heterogeneous data. The major goal is to extract common features across all subpopulations while exploring heterogeneity of each subpopulation. In particular, we propose an aggregation type estimator for the commonality parameter that possesses the (nonasymptotic) minimax optimal bound and asymptotic distribution as if there were no heterogeneity. This oracle result holds when the number of subpopulations does not grow too fast. A plug-in estimator for the heterogeneity parameter is further constructed, and shown to possess the asymptotic distribution as if the commonality information were available. We also test the heterogeneity among a large number of subpopulations. All the above results require to regularize each subestimation as though it had the entire sample. Our general theory applies to the divide-and-conquer approach that is often used to deal with massive homogeneous data. A technical by-product of this paper is statistical inferences for general kernel ridge regression. Thorough numerical results are also provided to back up our theory. | en_US
dc.format.extent | 1400 - 1437 | en_US
dc.language.iso | en_US | en_US
dc.relation.ispartof | The Annals of Statistics | en_US
dc.rights | Author's manuscript | en_US
dc.title | A partially linear framework for massive heterogeneous data | en_US
dc.type | Journal Article | en_US
dc.identifier.doi | doi:10.1214/15-AOS1410 | -
pu.type.symplectic | http://www.symplectic.co.uk/publications/atom-terms/1.0/journal-article | en_US

Files in This Item:
File | Description | Size | Format
LinearFrameworkMassiveData.pdf | | 1.15 MB | Adobe PDF


Items in OAR@Princeton are protected by copyright, with all rights reserved, unless otherwise indicated.