CuneiML: A Cuneiform Dataset for Machine Learning

Open Access | Dec 2023

Abstract

The cuneiform writing system holds a vast reservoir of ancient literature, encompassing over 3000 years of history. Originating around the mid-fourth millennium BCE and enduring until the late first millennium BCE, cuneiform writing spans various genres such as administrative, legal, medical, and scientific documents, among others. This article introduces a curated dataset, CuneiML, featuring 38,947 high-resolution 2D photos of Sumerian and Akkadian cuneiform tablets, accompanied by their cuneiform Unicode transcriptions, transliterations, lineart, and metadata. This dataset aims to support the development of machine learning tools for processing and analyzing Sumerian and Akkadian cuneiform artifacts – e.g. for automatically classifying genre, provenance, or period from unannotated tablet images. Thus, CuneiML is designed with consistency of format as a primary concern. Specifically, CuneiML is a result of meticulously preprocessing, segmenting, filtering, and re-transliterating data that is available online in the Cuneiform Digital Library Initiative (CDLI) collection.

DOI: https://doi.org/10.5334/johd.151 | Journal eISSN: 2059-481X
Language: English
Submitted on: Sep 2, 2023
Accepted on: Oct 17, 2023
Published on: Dec 6, 2023
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2023 Danlu Chen, Aditi Agarwal, Taylor Berg-Kirkpatrick, Jacobo Myerston, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.