Abstract
We present a dataset based on the Historical Land Records of Basel, covering the period 1400–1700. It comprises 829 source excerpts in premodern German, containing more than 50,000 tokens and 30,000 annotations. The annotations capture nested entities, events, and relations, reflecting complex interactions between actors and properties. Over two-thirds of entity references are nested within others, providing rich material for training and evaluating models for nested sequence tagging, low-resource named entity recognition, and noise-tolerant NLP. The dataset may also support the development of generalized models for premodern German. It is hosted on Zenodo and provided in TEI and XML formats.
