
Using a probabilistic model to assist merging of large-scale administrative records

Author(s): Enamorado, Ted; Fifield, Benjamin; Imai, Kosuke

To refer to this page use: http://arks.princeton.edu/ark:/88435/pr1r180
Full metadata record
DC Field | Value | Language
dc.contributor.author | Enamorado, Ted | -
dc.contributor.author | Fifield, Benjamin | -
dc.contributor.author | Imai, Kosuke | -
dc.date.accessioned | 2020-02-19T21:21:25Z | -
dc.date.available | 2020-02-19T21:21:25Z | -
dc.date.issued | 2019-05 | en_US
dc.identifier.citation | Enamorado, T, Fifield, B, Imai, K. (2019). Using a probabilistic model to assist merging of large-scale administrative records. American Political Science Review, 113 (2), 353 - 371. doi:10.1017/S0003055418000783 | en_US
dc.identifier.issn | 0003-0554 | -
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/pr1r180 | -
dc.description.abstract | © 2019 American Political Science Association. Since most social science research relies on multiple data sources, merging data sets is an essential part of researchers' workflow. Unfortunately, a unique identifier that unambiguously links records is often unavailable, and data may contain missing and inaccurate information. These problems are severe especially when merging large-scale administrative records. We develop a fast and scalable algorithm to implement a canonical model of probabilistic record linkage that has many advantages over deterministic methods frequently used by social scientists. The proposed methodology efficiently handles millions of observations while accounting for missing data and measurement error, incorporating auxiliary information, and adjusting for uncertainty about merging in post-merge analyses. We conduct comprehensive simulation studies to evaluate the performance of our algorithm in realistic scenarios. We also apply our methodology to merging campaign contribution records, survey data, and nationwide voter files. An open-source software package is available for implementing the proposed methodology. | en_US
dc.format.extent | 1 - 40 | en_US
dc.language.iso | en | en_US
dc.relation.ispartof | American Political Science Review | en_US
dc.rights | Author's manuscript | en_US
dc.title | Using a probabilistic model to assist merging of large-scale administrative records | en_US
dc.type | Journal Article | en_US
dc.identifier.doi | doi:10.1017/S0003055418000783 | -
dc.identifier.eissn | 1537-5943 | -
pu.type.symplectic | http://www.symplectic.co.uk/publications/atom-terms/1.0/journal-article | en_US

Files in This Item:
File | Description | Size | Format
[APSR]linkage.pdf | | 490.44 kB | Adobe PDF


Items in OAR@Princeton are protected by copyright, with all rights reserved, unless otherwise indicated.