CuneiML: A Cuneiform Dataset for Machine Learning

Danlu Chen; Aditi Agarwal; Taylor Berg-Kirkpatrick; Jacobo Myerston

doi:10.5334/johd.151

Figures & Tables

Figure 1

An overview of CuneiML, illustrated with the example tablet ID `453248` and its multi-modal data: (1) **Metadata** consisting of time period, provenience, genre, and measurement. (2) High-resolution **2D photograph** of the 6 faces. (3) **Lineart** from paleographers. (4) **Latin transliteration** downloaded directly from CDLI. (5) **Cuneiform Unicode transcription**, automatically converted from the Latin transliteration. (6) **Major face cutouts**, automatically extracted from the 2D photograph.
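Item (5) describes converting a Latin transliteration into Cuneiform Unicode. The following is a minimal sketch of that idea, not the authors' conversion pipeline: the reading-to-sign table `READING_TO_SIGN_NAME` and both function names are hypothetical, the table covers only a handful of signs, and the sketch assumes hyphen-separated readings within a word. Code points are resolved from official Unicode character names via Python's `unicodedata` module rather than hard-coded.

```python
import unicodedata

# Hypothetical, deliberately tiny table: transliterated reading -> Unicode sign name.
# Several readings can map to the same sign (e.g. "an" and "dingir" are both sign AN).
READING_TO_SIGN_NAME = {
    "a": "A",
    "an": "AN",
    "dingir": "AN",
    "e2": "E2",
    "lugal": "LUGAL",
}


def reading_to_unicode(reading):
    """Return the cuneiform character for one reading, or None if it is not in the table."""
    sign_name = READING_TO_SIGN_NAME.get(reading.lower())
    if sign_name is None:
        return None
    return unicodedata.lookup(f"CUNEIFORM SIGN {sign_name}")


def transliteration_to_unicode(line, unknown="?"):
    """Convert a space-separated line of hyphen-joined readings to cuneiform signs."""
    words = []
    for word in line.split():
        signs = [reading_to_unicode(r) or unknown for r in word.split("-")]
        words.append("".join(signs))
    return " ".join(words)


if __name__ == "__main__":
    print(transliteration_to_unicode("e2 lugal"))
```

Readings missing from the table are replaced by a placeholder instead of raising an error, so gaps in coverage stay visible in the output.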

Figure 2

Number of tablets by metadata attribute: time period, genre, and provenience.

Figure 3

A random sample of 20 major face cutouts.

Table 1

Task summary with possible input and output pairs. The numbers refer to the data modalities: (1) Metadata consisting of time period, provenience, genre, and measurement. (2) High-resolution 2D photograph of the 6 faces. (3) Lineart from paleographers. (4) Latin transliteration. (5) Cuneiform Unicode transcription. (6) Major face cutouts.

TASK NAME	INPUT	OUTPUT
Language Modeling	(4)(5)	(4)(5)
Transliteration	(5)	(4)
Lineart generation	(2)(6)	(3)
Attribute prediction	(2)(3)(4)(5)(6)	(1)
Sign identification	(2)(3)(6)	(5)
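The input and output pairs in Table 1 can be written down as a small lookup structure, which makes it easy to ask which tasks a given modality supports. This is an illustrative sketch only: the names `MODALITIES`, `Task`, `TASKS`, `tasks_using`, and `describe` are hypothetical and are not part of any CuneiML release.

```python
from dataclasses import dataclass

# Modality numbering follows the caption of Table 1.
MODALITIES = {
    1: "metadata",
    2: "2D photograph",
    3: "lineart",
    4: "Latin transliteration",
    5: "Cuneiform Unicode transcription",
    6: "major face cutouts",
}


@dataclass(frozen=True)
class Task:
    name: str
    inputs: tuple   # modality numbers usable as input
    outputs: tuple  # modality numbers produced as output


# Rows copied from Table 1.
TASKS = [
    Task("Language Modeling", (4, 5), (4, 5)),
    Task("Transliteration", (5,), (4,)),
    Task("Lineart generation", (2, 6), (3,)),
    Task("Attribute prediction", (2, 3, 4, 5, 6), (1,)),
    Task("Sign identification", (2, 3, 6), (5,)),
]


def tasks_using(modality_id):
    """Return the names of tasks that accept the given modality as input."""
    return [t.name for t in TASKS if modality_id in t.inputs]


def describe(task):
    """Render one task as 'name: inputs -> outputs' using modality names."""
    ins = ", ".join(MODALITIES[i] for i in task.inputs)
    outs = ", ".join(MODALITIES[i] for i in task.outputs)
    return f"{task.name}: {ins} -> {outs}"


if __name__ == "__main__":
    for task in TASKS:
        print(describe(task))
    print(tasks_using(3))
```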

Table 2

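Summary of test accuracy (%) for attribute prediction using different input features: tablet images, Cuneiform Unicode transcription, and Latin transliteration.

ATTRIBUTE	IMAGE	UNICODE	TRANSLITERATION	# OF CLASSES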
Time period	97.66	90.50	87.17	14
Provenience	85.72	61.71	68.60	25
Genre	89.00	81.50	86.21	12
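To read Table 2 at a glance, the short sketch below finds the best-scoring feature type for each attribute and its margin over the runner-up. The numbers are copied from the table; the names `ACCURACY`, `best_feature`, and `margin_over_runner_up` are hypothetical helpers introduced only for this illustration.

```python
# Test accuracies copied from Table 2, keyed by attribute and feature type.
ACCURACY = {
    "Time period": {"image": 97.66, "unicode": 90.50, "transliteration": 87.17},
    "Provenience": {"image": 85.72, "unicode": 61.71, "transliteration": 68.60},
    "Genre": {"image": 89.00, "unicode": 81.50, "transliteration": 86.21},
}


def best_feature(scores):
    """Return the feature type with the highest accuracy."""
    return max(scores, key=scores.get)


def margin_over_runner_up(scores):
    """Return the gap between the best and second-best accuracy, rounded to 2 decimals."""
    ranked = sorted(scores.values(), reverse=True)
    return round(ranked[0] - ranked[1], 2)


if __name__ == "__main__":
    for attribute, scores in ACCURACY.items():
        print(attribute, best_feature(scores), margin_over_runner_up(scores))
```

On these numbers, image features score highest for all three attributes. The margin is smallest for genre, where the runner-up is the transliteration, and largest for provenience.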
