The Basel Land Records Ground Truth: An Annotated Dataset for Information Extraction on German-Language Administrative Records

Open Access | Dec 2025

Abstract

We present a dataset based on the Historical Land Records of Basel, covering the period 1400–1700. The dataset comprises 829 source excerpts in premodern German containing more than 50,000 tokens and 30,000 annotations. The annotations capture nested entities, events, and relations, reflecting complex interactions between actors and properties. Over two-thirds of entity references are nested within others, providing rich material for training and evaluating models in nested sequence tagging, low-resource named entity recognition, and noise-tolerant NLP. The dataset may also support the development of generalized models for premodern German. It is stored on Zenodo and provided in TEI and XML formats.

DOI: https://doi.org/10.5334/johd.387 | Journal eISSN: 2059-481X
Language: English
Submitted on: Sep 1, 2025 | Accepted on: Nov 3, 2025 | Published on: Dec 29, 2025
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2025 Ismail Prada Ziegler, Benjamin Hitz, Katrin Fuchs, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.