A Named Entity-Annotated Corpus of 19th Century Classical Commentaries

Open Access | Jan 2024

Abstract

We release a multilingual named entity (NE) corpus of 19th-century commentaries on Sophocles’ Ajax. The selected commentaries are written in English, German and French, but are also replete with Latin and Greek quotations. Bibliographic entities were annotated alongside traditional named entities, following our guidelines (Romanello & Najem-Meyer, 2022). The corpus contains about 300 annotated pages, 111,216 tokens and 7,334 entity mentions, and was featured in the HIPE-2022 shared task. Although named entity recognition (NER) showed reassuring results, optical character recognition (OCR) errors and the extensive use of abbreviations mean that entity linking (EL) remains a challenging task. With these characteristics, the corpus offers an excellent way to assess the adaptability of information extraction systems to noisy, domain-specific, multilingual and multiscript environments.

DOI: https://doi.org/10.5334/johd.150 | Journal eISSN: 2059-481X
Language: English
Submitted on: Sep 1, 2023
Accepted on: Oct 26, 2023
Published on: Jan 2, 2024
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2024 Matteo Romanello, Sven Najem-Meyer, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.