Abstract
We release ARletta, a series of open-source models for the automated transcription of historic Dutch-language handwritten sources. Such models have remained a desideratum in the scholarly community until now. All models presented were trained on publicly available data using the open-source kraken engine. Our work focuses on the digitization of a large-scale collection of local police reports (1876–1945). Additionally, we include a supermodel trained on the union of other Dutch-language datasets (extending back to the 17th century), which we hope will be useful as a foundational model for future projects. Our results demonstrate performance that is competitive with proprietary software solutions.
