ARletta. Open-Source Handwritten Text Recognition Models for Historic Dutch

Open Access | Jul 2024

Abstract

We release ARletta, a series of open-source models for the automated transcription of historic Dutch-language handwritten sources, a resource that has remained a desideratum in the scholarly community until now. All models presented were trained on publicly available data using the open-source kraken engine. Our endeavor focuses on the digitization of a large-scale collection of local police reports (1876–1945). Additionally, we include a supermodel trained on the union of other Dutch-language datasets (extending back to the 17th century), which we hope will be useful as a foundational model for future projects. Our results demonstrate performance that is competitive with proprietary software solutions.

DOI: https://doi.org/10.5334/johd.225 | Journal eISSN: 2059-481X
Language: English
Submitted on: May 21, 2024
Accepted on: Jun 19, 2024
Published on: Jul 11, 2024
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2024 Lith Lefranc, Ilja Van Damme, Thibault Clérice, Mike Kestemont, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.