An Improved Fellegi-Sunter Framework for Probabilistic Record Linkage Between Large Data Sets

By: Marco Fortini  
Open Access
|Dec 2020

References

  1. Bishop, Y.M., S.E. Fienberg, and P.W. Holland. 1975. Discrete multivariate analysis. Cambridge, Mass.: MIT Press. DOI: https://doi.org/10.1007/978-0-387-72806-3
  2. Baxter, R., P. Christen, and T. Churches. 2003. “A Comparison of Fast Blocking Methods for Record Linkage”. CMIS Technical Report 03/139, six-pages version of the paper published in Proceedings of ACM SIGKDD ’03. Available at: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.10.4563&rep=rep1&type=pdf (accessed April 2020).
  3. Christen, P. 2012. Data matching: concepts and techniques for record linkage, entity resolution, and duplicate detection. Springer Science and Business Media. DOI: https://doi.org/10.1007/978-3-642-31164-2
  4. Cibella, N., M. Fortini, M. Scannapieco, L. Tosco, and T. Tuoto. 2009. “Theory and practice in developing a record linkage software”. Insights on Data Integration Methodologies:37–56. Available at: https://ec.europa.eu/eurostat/documents/3888793/5845197/KS-RA-09-005-EN.PDF/4cef0f2d-45a0-46b7-bfd6-196a55fca801?version=1.0 (accessed April 2020).
  5. Cormen, T.H., C.E. Leiserson, R.L. Rivest, and C. Stein. 2009. Introduction to algorithms. MIT press. Available at: https://mitpress.mit.edu/books/introduction-algorithms-third-edition (accessed April 2020).
  6. Dempster, A.P., N.M. Laird, and D.B. Rubin. 1977. “Maximum Likelihood from Incomplete Data via the EM Algorithm”. Journal of the Royal Statistical Society B 39: 1–38. DOI: https://doi.org/10.1111/j.2517-6161.1977.tb01600.x
  7. Hernandez, M.A., and S.J. Stolfo. 1995. “The merge/purge problem for large databases”. Edited by M.J. Carey and D.A. Schneider in SIGMOD, 127–138. DOI: https://doi.org/10.1145/568271.223807
  8. Herzog, T.N., F.J. Scheuren, and W.E. Winkler. 2007. Data quality and record linkage techniques. Springer Science and Business Media. DOI: https://doi.org/10.1007/0-387-69505-2
  9. Fellegi, I., and A.B. Sunter. 1969. “A Theory for Record Linkage”. Journal of the American Statistical Association, 64(328): 1183–1210. DOI: https://doi.org/10.1080/01621459.1969.10501049
  10. Fortini, M., L. Mancini, L. Marcone, E. Mussino, and E. Paluzzi. 2013. “Who Settles Down in Italy? Transition to Residency of non-EU Migrants”. Rivista Italiana di Economia Demografia e Statistica, no. LXVII, (3/4). Available at: https://www.sieds.it/listing/RePEc/journl/2013LXVII_N34rieds.pdf (accessed April 2020).
  11. Jaro, M.A. 1989. “Advances in Record-Linkage Methodology as Applied to Matching the 1985 Census of Tampa, Florida”. Journal of the American Statistical Association, 84: 414–420. DOI: https://doi.org/10.1080/01621459.1989.10478785
  12. Larsen, M.D., and D.B. Rubin. 2001. “Iterative automated record linkage using mixture models”. Journal of the American Statistical Association, 96(453): 32–41. DOI: https://doi.org/10.1198/016214501750332956
  13. Murray, J. 2015. “Probabilistic Record Linkage and Deduplication after Indexing, Blocking, and Filtering”. Journal of Privacy and Confidentiality, 7(1). DOI: https://doi.org/10.29012/jpc.v7i1.643
  14. Neykov, N., P. Filzmoser, R. Dimova, and P. Neytchev. 2007. “Robust fitting of mixtures using the trimmed likelihood estimator”. Computational Statistics & Data Analysis, 52(1): 299–308. DOI: https://doi.org/10.1016/j.csda.2006.12.024
  15. Newcombe, H.B., J.M. Kennedy, S.J. Axford, and A.P. James. 1959. “Automatic linkage of vital records”. Science, 130(3381): 954–959. DOI: https://doi.org/10.1126/science.130.3381.954
  16. Thibaudeau, Y. 1993. “The discrimination power of dependency structures in record linkage”. Survey Methodology, 19: 31–38. Available at: https://www150.statcan.gc.ca/n1/pub/12-001-x/1993001/article/14477-eng.pdf (accessed April 2020).
  17. Winkler, W.E. 1988. “Using the EM Algorithm for Weight Computation in the Fellegi-Sunter Model of Record Linkage”. Proceedings of the Section on Survey Research Methods: American Statistical Association: 667–671. Available at: https://www.asasrms.org/Proceedings/papers/1988_124.pdf (accessed April 2020).
  18. Winkler, W.E. 1989. “Near Automatic Weight Computation in the Fellegi-Sunter Model of Record Linkage”. Proceedings of the Fifth Census Bureau Annual Research Conference, March 19-22, Arlington, Virginia, U.S.A.: 145–155. Available at: https://www.academia.edu/34177520/Near_Automatic_Weight_Computation_in_-the_Fellegi-Sunter_Model_of_Record_Linkage (accessed April 2020).
  19. Winkler, W.E. 2006. “Overview of record linkage and current research directions”. Bureau of the Census Working Paper No. RRS2006-02. Available at: https://www.census.gov/library/working-papers/2006/adrm/rrs2006-02.html (accessed April 2020).
  20. Yancey, W.E. 2002. “Improving EM Algorithm Estimates for Record Linkage Parameters”. Proceedings of the Section on Survey Research Methods: American Statistical Association. Available at: https://www.asasrms.org/Proceedings/y2002/Files/JSM2002-000581.pdf (accessed April 2020).
Language: English
Page range: 803 - 825
Submitted on: Jul 1, 2018
Accepted on: Jul 1, 2020
Published on: Dec 10, 2020
Published by: Sciendo
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2020 Marco Fortini, published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.