Online variational inference for the hierarchical Dirichlet process
Author(s): Wang, C; Paisley, J; Blei, DM
To refer to this page use:
http://arks.princeton.edu/ark:/88435/pr1fc0c
Abstract: | The hierarchical Dirichlet process (HDP) is a Bayesian nonparametric model that can be used to model mixed-membership data with a potentially infinite number of components. It has been applied widely in probabilistic topic modeling, where the data are documents and the components are distributions of terms that reflect recurring patterns (or "topics") in the collection. Given a document collection, posterior inference is used to determine the number of topics needed and to characterize their distributions. One limitation of HDP analysis is that existing posterior inference algorithms require multiple passes through all the data; these algorithms are intractable for very large-scale applications. We propose an online variational inference algorithm for the HDP, an algorithm that is easily applicable to massive and streaming data. Our algorithm is significantly faster than traditional inference algorithms for the HDP, and lets us analyze much larger data sets. We illustrate the approach on two large collections of text, showing improved performance over online LDA, the finite counterpart to the HDP topic model. Copyright 2011 by the authors. |
Publication Date: | 2011 |
Citation: | Wang, C., Paisley, J. & Blei, D. (2011). Online Variational Inference for the Hierarchical Dirichlet Process. Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, in PMLR 15:752-760 |
ISSN: | 2640-3498 |
Pages: | 752 - 760 |
Type of Material: | Conference Article |
Journal/Proceeding Title: | Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, in PMLR |
Version: | Final published version. This is an open access article. |
Items in OAR@Princeton are protected by copyright, with all rights reserved, unless otherwise indicated.