Learning topic models -- provably and efficiently
Author(s): Arora, Sanjeev; Ge, Rong; Halpern, Yoni; Mimno, David; Moitra, Ankur; et al.
To refer to this page use:
http://arks.princeton.edu/ark:/88435/pr1rn97
Abstract: | Today, we have both the blessing and the curse of being overloaded with information. Never before has text been more important to how we communicate, or more easily available. But massive text streams far outstrip anyone's ability to read. We need automated tools that can help make sense of their thematic structure, and find strands of meaning that connect documents, all without human supervision. Such methods can also help us organize and navigate large text corpora. Popular tools for this task range from Latent Semantic Analysis (LSA),8 which uses standard linear algebra, to deep learning, which relies on non-convex optimization. This paper concerns topic modeling, which posits a simple probabilistic model of how a document is generated. We give a formal description of the generative model at the end of the section, but first we outline its important features. |
Publication Date: | 2018 |
Citation: | Arora, Sanjeev, Rong Ge, Yoni Halpern, David Mimno, Ankur Moitra, David Sontag, Yichen Wu, and Michael Zhu. "Learning topic models--provably and efficiently." Communications of the ACM 61, no. 4 (2018): 85-93. doi:10.1145/3186262 |
DOI: | 10.1145/3186262 |
ISSN: | 0001-0782 |
Pages: | 85 - 93 |
Type of Material: | Journal Article |
Journal/Proceeding Title: | Communications of the ACM |
Version: | Final published version. This is an open access article. |
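For context, the generative model the abstract refers to is the standard topic model: each document mixes a small number of topics, and each topic is a distribution over the vocabulary. The following is a minimal sketch of that generative process in Python, assuming an LDA-style Dirichlet prior on topic proportions (the paper's exact model and notation may differ); all sizes and hyperparameter values are illustrative.

import numpy as np

rng = np.random.default_rng(seed=0)

# Illustrative sizes: vocabulary, number of topics, words per document.
V, K, DOC_LEN = 5000, 50, 300

# Topic matrix A: each of the K rows is a distribution over the V words.
A = rng.dirichlet(alpha=np.full(V, 0.05), size=K)

def generate_document():
    """Generate one document from the topic model."""
    # Draw this document's topic proportions (Dirichlet prior, as in LDA;
    # more general priors are possible).
    theta = rng.dirichlet(alpha=np.full(K, 0.1))
    # The document's word distribution is the convex mixture theta^T A.
    word_dist = theta @ A
    # Each word is drawn i.i.d. from that mixture.
    return rng.choice(V, size=DOC_LEN, p=word_dist)

doc = generate_document()  # array of DOC_LEN word ids in [0, V)

Learning runs this process in reverse: given only the observed documents, estimate the topic matrix A without supervision, which is the problem the paper addresses.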