
Mining Massive Amounts of Genomic Data: A Semiparametric Topic Modeling Approach

Author(s): Fang, EX; Li, MD; Jordan, MI; Liu, H

To refer to this page use: http://arks.princeton.edu/ark:/88435/pr1vg49
Full metadata record
DC Field | Value | Language
dc.contributor.author | Fang, EX | -
dc.contributor.author | Li, MD | -
dc.contributor.author | Jordan, MI | -
dc.contributor.author | Liu, H | -
dc.date.accessioned | 2021-10-11T14:16:50Z | -
dc.date.available | 2021-10-11T14:16:50Z | -
dc.date.issued | 2017 | en_US
dc.identifier.citation | Fang, EX, Li, MD, Jordan, MI, Liu, H. Mining Massive Amounts of Genomic Data: A Semiparametric Topic Modeling Approach. Journal of the American Statistical Association, 112 (519) (2017): pp. 921 - 932. doi:10.1080/01621459.2016.1256812 | en_US
dc.identifier.issn | 0162-1459 | -
dc.identifier.uri | http://www.personal.psu.edu/xxf13/ | -
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/pr1vg49 | -
dc.description.abstract | Characterizing the functional relevance of transcription factors (TFs) in different biological contexts is pivotal in systems biology. Given the massive amount of genomic data, computational identification of TFs is emerging as a useful approach to bridge functional genomics with disease risk loci. In this article, we use large-scale gene expression and chromatin immunoprecipitation (ChIP) data corpora to conduct high-throughput TF-biological context association analysis. This work makes two contributions: (i) From a methodological perspective, we propose a unified topic modeling framework for exploring and analyzing large and complex genomic datasets. Under this framework, we develop new statistical optimization algorithms and semiparametric theoretical analysis, which are also applicable to a variety of large-scale data analyses. (ii) From an experimental perspective, our method generates an informative list of tumor-related TFs and the tumor types they may affect. Our data-driven analysis of 38 TFs in 68 tumor biological contexts identifies functional signatures of epigenetic regulators, such as SUZ12 and SET-DB1, and nuclear receptors, in many tumor types. In particular, the TF signature of SUZ12 is present in a broad range of tumor types, many of which have not been reported before. In summary, our work establishes a robust method to identify the association between TFs and biological contexts. Given the limited number of genome-wide binding profiles of TFs and the massive number of expression profiles, our work provides a useful tool to deconvolute the gene regulatory network for tumors and other biological contexts. Supplementary materials for this article are available online. | en_US
dc.format.extent | 921 - 932 | en_US
dc.language.iso | en_US | en_US
dc.relation.ispartof | Journal of the American Statistical Association | en_US
dc.rights | Author's manuscript | en_US
dc.title | Mining Massive Amounts of Genomic Data: A Semiparametric Topic Modeling Approach | en_US
dc.type | Journal Article | en_US
dc.identifier.doi | doi:10.1080/01621459.2016.1256812 | -
dc.identifier.eissn | 1537-274X | -
pu.type.symplectic | http://www.symplectic.co.uk/publications/atom-terms/1.0/journal-article | en_US

Files in This Item:
File | Description | Size | Format
MineGenomTopicModelling.pdf | | 2.83 MB | Adobe PDF


Items in OAR@Princeton are protected by copyright, with all rights reserved, unless otherwise indicated.