Fast-ER: GPU-Accelerated Record Linkage and Deduplication in Python

Open Access | Apr 2026

Abstract

Fast-ER is a Python package for GPU-accelerated record linkage and deduplication. Both tasks require computing string similarity metrics for all pairs of values within or between datasets. Although each calculation is simple, the number of comparisons grows quadratically with dataset size, which makes record linkage and deduplication prohibitively expensive even for moderately sized datasets. Fast-ER addresses this challenge by using CUDA-enabled graphics processing units (GPUs) to accelerate the calculation of string similarity metrics. The library is written in Python and relies heavily on CuPy, an open-source library for array-based numerical computation on GPUs [1]. Fast-ER is available on GitHub (https://github.com/jacobmorrier/fast-er) and on the Python Package Index (fast-er-link). By enabling fast and efficient linkage and deduplication of datasets that lack consistent identifiers, Fast-ER has wide-ranging applications in fields such as the social and health sciences.
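
To make the quadratic cost concrete, the following is a minimal CPU-only sketch of all-pairs string comparison. It is not the Fast-ER API and does not use CuPy or a GPU. The function names (`levenshtein`, `similarity`, `compare_all_pairs`), the choice of normalized edit distance as the similarity metric, and the sample names are all assumptions made for illustration.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (two-row dynamic program)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def compare_all_pairs(left, right):
    """Score every (left, right) pair: len(left) * len(right) comparisons."""
    return {
        (i, j): similarity(x, y)
        for (i, x), (j, y) in product(enumerate(left), enumerate(right))
    }


# Hypothetical sample records, for illustration only.
left = ["Jon Smith", "Maria Garcia", "Wei Chen"]
right = ["John Smith", "Maria Garcia", "Wei Chan", "Ana Lopez"]

scores = compare_all_pairs(left, right)
print(len(scores))  # one score per pair: 3 * 4 comparisons
```

Each comparison is cheap, but the number of comparisons is the product of the two dataset sizes: linking two datasets of 100,000 records each already requires 10 billion of them. Because the comparisons are independent of one another, they are well suited to the kind of parallel execution that GPUs provide.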

DOI: https://doi.org/10.5334/jors.556 | Journal eISSN: 2049-9647
Language: English
Submitted on: Jan 28, 2025
Accepted on: Apr 7, 2026
Published on: Apr 21, 2026
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 Jacob Morrier, Sulekha Kishore, R. Michael Alvarez, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.