Calibration, Entropy Rates, and Memory in Language Models

Author(s): Braverman, Mark; Chen, Xinyi; Kakade, Sham; Narasimhan, Karthik; Zhang, Cyril; Zhang, Yi

To refer to this page use: http://arks.princeton.edu/ark:/88435/pr1f859
Full metadata record
DC Field                   Value                                                            Language
dc.contributor.author      Braverman, Mark                                                  -
dc.contributor.author      Chen, Xinyi                                                      -
dc.contributor.author      Kakade, Sham                                                     -
dc.contributor.author      Narasimhan, Karthik                                              -
dc.contributor.author      Zhang, Cyril                                                     -
dc.contributor.author      Zhang, Yi                                                        -
dc.date.accessioned        2021-10-08T19:51:00Z                                             -
dc.date.available          2021-10-08T19:51:00Z                                             -
dc.date.issued             2020                                                             en_US
dc.identifier.citation     Braverman, Mark, Xinyi Chen, Sham Kakade, Karthik Narasimhan, Cyril Zhang, and Yi Zhang. "Calibration, Entropy Rates, and Memory in Language Models." In Proceedings of the 37th International Conference on Machine Learning (2020): pp. 1089-1099.  en_US
dc.identifier.uri          http://proceedings.mlr.press/v119/braverman20a/braverman20a.pdf  -
dc.identifier.uri          http://arks.princeton.edu/ark:/88435/pr1f859                     -
dc.description.abstract    Building accurate language models that capture meaningful long-term dependencies is a core challenge in natural language processing. Towards this end, we present a calibration-based approach to measure long-term discrepancies between a generative sequence model and the true distribution, and use these discrepancies to improve the model. Empirically, we show that state-of-the-art language models, including LSTMs and Transformers, are miscalibrated: the entropy rates of their generations drift dramatically upward over time. We then provide provable methods to mitigate this phenomenon. Furthermore, we show how this calibration-based approach can also be used to measure the amount of memory that language models use for prediction.  en_US
dc.format.extent           1089 - 1099                                                      en_US
dc.language.iso            en_US                                                            en_US
dc.relation.ispartof       Proceedings of the 37th International Conference on Machine Learning  en_US
dc.rights                  Final published version. Article is made available in OAR by the publisher's permission or policy.  en_US
dc.title                   Calibration, Entropy Rates, and Memory in Language Models        en_US
dc.type                    Conference Article                                               en_US
pu.type.symplectic         http://www.symplectic.co.uk/publications/atom-terms/1.0/conference-proceeding  en_US
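
The abstract's central empirical claim is that the entropy rate of a model's own generations drifts upward with position. Below is a minimal illustrative sketch, not the paper's estimator, of how such drift could be checked: it assumes you have already recorded the model's per-token surprisals on sequences it sampled itself (replaced here by synthetic data), averages them at each position, and compares an early window to a late window. The function names, window size, and synthetic data are assumptions made only for this example.

import numpy as np

def entropy_rate_by_position(nll):
    # nll: array of shape (num_samples, seq_len); nll[i, t] is the model's
    # surprisal -log p_model(x_t | x_<t) on the t-th token of its i-th sample.
    # Averaging over samples gives a Monte Carlo estimate of the entropy rate
    # of the generation process at each position.
    return nll.mean(axis=0)

def entropy_drift(nll, window=50):
    # Difference between the mean entropy estimate over the last `window`
    # positions and over the first `window` positions; a large positive value
    # corresponds to the upward drift described in the abstract.
    per_pos = entropy_rate_by_position(nll)
    return float(per_pos[-window:].mean() - per_pos[:window].mean())

if __name__ == "__main__":
    # Synthetic stand-in for real model surprisals, with a slow upward trend.
    rng = np.random.default_rng(0)
    num_samples, seq_len = 64, 500
    trend = 4.0 + 0.002 * np.arange(seq_len)  # mean surprisal in nats per token
    nll = rng.gamma(shape=2.0, scale=trend / 2.0, size=(num_samples, seq_len))
    print("entropy drift (late - early, nats/token):", round(entropy_drift(nll), 3))

With real data, the nll array would be filled from the model's log-probabilities on its own generated tokens rather than from a synthetic distribution; the comparison of early and late windows then directly tests whether generations become more diffuse as they grow longer.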

Files in This Item:
File                         Description    Size       Format
CalibrationEntropyRates.pdf                 674.74 kB  Adobe PDF


Items in OAR@Princeton are protected by copyright, with all rights reserved, unless otherwise indicated.