Online variational inference for the hierarchical Dirichlet process

Author(s): Wang, C; Paisley, J; Blei, DM

To refer to this page use: http://arks.princeton.edu/ark:/88435/pr19v24
Abstract: The hierarchical Dirichlet process (HDP) is a Bayesian nonparametric model that can be used to model mixed-membership data with a potentially infinite number of components. It has been applied widely in probabilistic topic modeling, where the data are documents and the components are distributions of terms that reflect recurring patterns (or "topics") in the collection. Given a document collection, posterior inference is used to determine the number of topics needed and to characterize their distributions. One limitation of HDP analysis is that existing posterior inference algorithms require multiple passes through all the data; these algorithms are intractable for very large-scale applications. We propose an online variational inference algorithm for the HDP, an algorithm that is easily applicable to massive and streaming data. Our algorithm is significantly faster than traditional inference algorithms for the HDP, and lets us analyze much larger data sets. We illustrate the approach on two large collections of text, showing improved performance over online LDA, the finite counterpart to the HDP topic model. Copyright 2011 by the authors.
Publication Date: 2011
Citation: Wang, C., Paisley, J. & Blei, D. (2011). Online Variational Inference for the Hierarchical Dirichlet Process. Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, in PMLR 15:752-760.
ISSN: 1532-4435
EISSN: 1533-7928
Pages: 752 - 760
Type of Material: Conference Article
Journal/Proceeding Title: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, in PMLR
Version: Final published version. This is an open access article.


