
CREMMA Medii Aevi: Literary Manuscript Text Recognition in Latin

Open Access | Apr 2023

Abstract

This paper presents a novel segmentation and handwritten text recognition (HTR) dataset for Medieval Latin manuscripts from the 11th to the 16th century. Because it follows common transcription guidelines, it connects with Medieval French datasets as well as earlier Latin datasets. It contributes 263,000 new characters, bringing the combined total for medieval manuscripts in both languages to over a million characters. We provide our own additions to Ariane Pinche’s Old French guidelines to handle Latin-specific cases. We also describe how we compiled the dataset by drawing on pre-existing resources. Trained on data with a higher abbreviation ratio and a better representation of abbreviation marks, our new models outperform the Old French base model on Latin datasets, improving accuracy by 5% on unseen Latin manuscripts.
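The abstract reports an accuracy gain without defining the metric on this page. As a hedged illustration only, assuming "accuracy" means character accuracy (1 - CER, where the character error rate is the Levenshtein edit distance divided by the reference length), which is a common convention in HTR evaluation, the sketch below scores a single transcription line. The function names, the `form` parameter and the NFC default are our own assumptions, not the authors' evaluation code.

```python
import unicodedata


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def char_accuracy(ref: str, hyp: str, form: str = "NFC") -> float:
    """Character accuracy = 1 - CER (hypothetical helper; can go negative with many insertions)."""
    # Normalize both strings so precomposed and combining abbreviation marks compare consistently.
    ref = unicodedata.normalize(form, ref)
    hyp = unicodedata.normalize(form, hyp)
    if not ref:
        return 1.0 if not hyp else 0.0
    return 1.0 - levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    # "dñs" (abbreviated dominus) transcribed without its tilde.
    print(char_accuracy("d\u00f1s", "dns"))              # 1 substitution out of 3 characters
    print(char_accuracy("d\u00f1s", "dns", form="NFD"))  # 1 deletion out of 4 characters
```

Normalization matters for abbreviation-heavy text. In NFC form, "ñ" is one character, so dropping the tilde costs one substitution out of three characters (accuracy 2/3). In NFD form, the tilde becomes a separate combining character, so the same error costs one deletion out of four characters (accuracy 0.75).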

DOI: https://doi.org/10.5334/johd.97 | Journal eISSN: 2059-481X
Language: English
Published on: Apr 12, 2023
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2023 Thibault Clérice, Malamatenia Vlachou-Efstathiou, Alix Chagué, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.