
Learning topic models -- provably and efficiently

Author(s): Arora, Sanjeev; Ge, Rong; Halpern, Yoni; Mimno, David; Moitra, Ankur; et al.

To refer to this page use: http://arks.princeton.edu/ark:/88435/pr1rn97
Abstract: Today, we have both the blessing and the curse of being overloaded with information. Never before has text been more important to how we communicate, or more easily available. But massive text streams far outstrip anyone's ability to read. We need automated tools that can help make sense of their thematic structure and find strands of meaning that connect documents, all without human supervision. Such methods can also help us organize and navigate large text corpora. Popular tools for this task range from Latent Semantic Analysis (LSA) [8], which uses standard linear algebra, to deep learning, which relies on non-convex optimization. This paper concerns topic modeling, which posits a simple probabilistic model of how a document is generated. We give a formal description of the generative model at the end of the section, but next we outline its important features.
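The abstract defers the formal generative model to the paper itself; as a rough illustration only, the sketch below samples documents from a standard LDA-style topic model. The Dirichlet priors, parameter names, and sizes here are assumptions for illustration, not the authors' formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, n_topics = 1000, 5

# Each topic is a probability distribution over the vocabulary
# (drawn here from a sparse Dirichlet purely for illustration).
topics = rng.dirichlet(np.full(vocab_size, 0.05), size=n_topics)

def generate_document(topics, alpha=0.1, doc_length=100):
    """Sample one document from an LDA-style generative process."""
    n_topics, vocab_size = topics.shape
    # 1. Draw this document's topic proportions from a Dirichlet prior.
    theta = rng.dirichlet(np.full(n_topics, alpha))
    # 2. For each word slot, pick a topic, then a word from that topic.
    z = rng.choice(n_topics, size=doc_length, p=theta)
    return np.array([rng.choice(vocab_size, p=topics[k]) for k in z])

corpus = [generate_document(topics) for _ in range(10)]
```

Learning is the inverse problem: given only the observed documents in `corpus`, recover the topic distributions; doing so provably and efficiently is the subject of the paper.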
Publication Date: 2018
Citation: Arora, Sanjeev, Rong Ge, Yoni Halpern, David Mimno, Ankur Moitra, David Sontag, Yichen Wu, and Michael Zhu. "Learning topic models--provably and efficiently." Communications of the ACM 61, no. 4 (2018): 85-93. doi:10.1145/3186262
DOI: 10.1145/3186262
ISSN: 0001-0782
Pages: 85 - 93
Type of Material: Journal Article
Journal/Proceeding Title: Communications of the ACM
Version: Final published version. This is an open access article.