Abstract
This paper presents a large-scale dataset of 158,946 annotated cuneiform sign crops extracted from 9,276 clay tablets and other objects spanning nearly three millennia of Mesopotamian history (ca. 2800 BCE–75 CE). The dataset was created by combining manual annotation on the Electronic Babylonian Library (eBL) platform with semi-automated extraction methods; it pairs high-resolution photographs from major collections with detailed metadata, including sign names, transliteration values, and historical periods. Images are stored in JPEG format and metadata in JSON, and the dataset is publicly accessible under a CC BY-NC 4.0 license. The dataset enables digital palaeographic analysis for dating undated tablets, supports machine learning applications in optical character recognition, and facilitates computational studies of scribal practices and regional variation in cuneiform writing.
