A Large-Scale Dataset of Annotated Cuneiform Sign Images for Digital Palaeography

Open Access | Mar 2026

Abstract

This paper presents a large-scale dataset of 158,946 annotated cuneiform sign crops extracted from 9,276 clay tablets and other objects spanning nearly three millennia of Mesopotamian history (ca. 2800 BCE–75 CE). The dataset was created through manual annotation on the Electronic Babylonian Library (eBL) platform combined with semi-automated extraction methods. It pairs high-resolution photographs from major collections with detailed metadata, including sign names, transliteration values, and historical periods. Images are stored as JPEG files and metadata as JSON, and the dataset is publicly accessible under a CC BY-NC 4.0 license. The dataset enables digital palaeographic analysis for dating undated tablets, supports machine learning applications in optical character recognition, and facilitates computational studies of scribal practices and regional variation in cuneiform writing.

DOI: https://doi.org/10.5334/johd.503 | Journal eISSN: 2059-481X
Language: English
Submitted on: Dec 19, 2025
Accepted on: Feb 20, 2026
Published on: Mar 25, 2026
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 Or Lewenstein, Daniel López, Cyrill Dankwardt, Mays Fadhil Alrawi, Louisa Grill, Brian Mak, Albert Setälä, Fiammetta Gori, Aino Hätinen, Felix Rauchhaus, Zsombor Földi, Enrique Jiménez, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.