
The Inverse Regression Topic Model

Author(s): Rabinovich, Maxim; Blei, David M

To refer to this page use: http://arks.princeton.edu/ark:/88435/pr1tn73
Full metadata record
DC Field | Value | Language
dc.contributor.author | Rabinovich, Maxim | -
dc.contributor.author | Blei, David M | -
dc.date.accessioned | 2021-10-08T19:44:14Z | -
dc.date.available | 2021-10-08T19:44:14Z | -
dc.date.issued | 2014 | en_US
dc.identifier.citation | Rabinovich, Maxim, and David M. Blei. "The Inverse Regression Topic Model." Proceedings of the 31st International Conference on Machine Learning 32, no. 1: pp. 199-207. 2014. | en_US
dc.identifier.issn | 2640-3498 | -
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/pr1tn73 | -
dc.description.abstract | Taddy (2013) proposed multinomial inverse regression (MNIR) as a new model of annotated text based on the influence of metadata and response variables on the distribution of words in a document. While effective, MNIR has no way to exploit structure in the corpus to improve its predictions or facilitate exploratory data analysis. On the other hand, traditional probabilistic topic models (like latent Dirichlet allocation) capture natural heterogeneity in a collection but do not account for external variables. In this paper, we introduce the inverse regression topic model (IRTM), a mixed-membership extension of MNIR that combines the strengths of both methodologies. We present two inference algorithms for the IRTM: an efficient batch estimation algorithm and an online variant, which is suitable for large corpora. We apply these methods to a corpus of 73K Congressional press releases and another of 150K Yelp reviews, demonstrating that the IRTM outperforms both MNIR and supervised topic models on the prediction task. Further, we give examples showing that the IRTM enables systematic discovery of in-topic lexical variation, which is not possible with previous supervised topic models. | en_US
dc.format.extent | 199-207 | en_US
dc.language.iso | en_US | en_US
dc.relation.ispartof | Proceedings of the 31st International Conference on Machine Learning | en_US
dc.relation.ispartofseries | Proceedings of Machine Learning Research | -
dc.rights | Author's manuscript | en_US
dc.title | The Inverse Regression Topic Model | en_US
dc.type | Conference Article | en_US
pu.type.symplectic | http://www.symplectic.co.uk/publications/atom-terms/1.0/conference-proceeding | en_US
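The abstract describes MNIR as a model in which metadata and response variables shift the distribution of words in a document. The Python/NumPy sketch below illustrates that idea under stated assumptions: it uses the multinomial-logit link from Taddy (2013), in which a document's word probabilities are softmax(alpha + phi @ v), and the projection z = phi.T @ x / m of word counts onto the loadings. All function names are hypothetical. `mixed_membership_word_probs` is only an assumed illustration of how a mixed-membership extension could mix response-shifted topics; it is not necessarily the IRTM's exact parameterization, and neither of the paper's inference algorithms is implemented here.

```python
import numpy as np


def mnir_word_probs(alpha, phi, v):
    """Word distribution for one document under an MNIR-style log-linear link.

    alpha: (V,) baseline log-intensity for each vocabulary word.
    phi:   (V, p) loadings of each word on the p covariates/responses.
    v:     (p,) covariate/response values for the document.
    Returns q = softmax(alpha + phi @ v), a length-V probability vector.
    """
    eta = alpha + phi @ v
    eta = eta - eta.max()  # subtract the max for a numerically stable softmax
    q = np.exp(eta)
    return q / q.sum()


def sufficient_reduction(phi, x):
    """Project word counts x (V,) onto the loadings: z = phi.T @ x / m, m = total count."""
    return phi.T @ x / x.sum()


def mixed_membership_word_probs(log_beta, phi_topic, theta, y):
    """HYPOTHETICAL mixed-membership variant, for illustration only.

    Assumes each topic k has baseline log word weights log_beta[k] that a
    scalar response y shifts by phi_topic[k] * y; the document's word
    distribution mixes these topics with proportions theta. This is an
    assumed form, not necessarily the IRTM's exact parameterization.
    log_beta, phi_topic: (K, V); theta: (K,) summing to 1; y: float.
    """
    eta = log_beta + phi_topic * y
    eta = eta - eta.max(axis=1, keepdims=True)  # per-topic stable softmax
    topics = np.exp(eta)
    topics /= topics.sum(axis=1, keepdims=True)
    return theta @ topics  # (V,) marginal word distribution


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    V, p = 6, 1
    alpha = rng.normal(size=V)
    phi = rng.normal(size=(V, p))
    # Draw a 200-word document whose response value is v = [1.5].
    q = mnir_word_probs(alpha, phi, np.array([1.5]))
    x = rng.multinomial(200, q)
    print("word counts:", x)
    print("sufficient reduction z:", sufficient_reduction(phi, x))
```

Under this assumed setup, MNIR's forward step would regress the response on the low-dimensional z rather than on the full vector of word counts. How the IRTM itself estimates its parameters, in batch or online, is described in the paper and is not reproduced by this sketch.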

Files in This Item:
File | Description | Size | Format
InverseRegressionTopicModel.pdf | | 453.9 kB | Adobe PDF


Items in OAR@Princeton are protected by copyright, with all rights reserved, unless otherwise indicated.