MultiHATHI: A Complete Collection of Multilingual Prose Fiction in the HathiTrust Digital Library

By: Sil Hamilton and Andrew Piper
Open Access | Feb 2023

Figures & Tables

Figure 1

Percentage of books missing a fiction tag for the 20 most frequent languages.

Table 1

List of attributes included in our dataset.

ATTRIBUTE | DESCRIPTION
HTID | The HathiTrust ID by which the work is accessible.
Access Restrictions | Whether the work is made public by HathiTrust.
HathiTrust Bibliography Key | The respective bibliography key for the work. For retrieving MARC records.
Title | The title of the volume in question.
Year Published | The year in which the work was published.
Language | The language in which the work was published.
Author | The author of the work in question.
Fictionality | Whether the work is intended to be fictional (1) or not (0).
Length | The length of the work.
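
As a rough illustration of how the attributes in Table 1 might be used, the following Python sketch models one row as a record and filters for volumes flagged as fiction. The field names, types, and sample rows are hypothetical: Table 1 gives only attribute labels and descriptions, and it does not state the unit of Length or the encoding of Language.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VolumeRecord:
    # Hypothetical field names mirroring the Table 1 attributes.
    htid: str
    access_restrictions: str
    bibliography_key: str
    title: str
    year_published: Optional[int]
    language: str
    author: str
    fictionality: int  # 1 = fiction, 0 = not fiction (per Table 1)
    length: int        # unit is not stated in Table 1


def fiction_in_language(records: List[VolumeRecord], language: str) -> List[VolumeRecord]:
    """Return the records flagged as fiction in the given language."""
    return [r for r in records if r.fictionality == 1 and r.language == language]


# Made-up placeholder rows, for illustration only.
sample = [
    VolumeRecord("example.001", "public", "000000001", "Example Novel",
                 1900, "German", "Doe, Jane", 1, 1000),
    VolumeRecord("example.002", "public", "000000002", "Example Handbook",
                 1910, "German", "Roe, John", 0, 800),
    VolumeRecord("example.003", "restricted", "000000003", "Example Tale",
                 1920, "Polish", "Doe, Jan", 1, 600),
]

german_fiction = fiction_in_language(sample, "German")
print([r.title for r in german_fiction])
```
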
Figure 2

Number of books tagged as fiction for the 18 most frequent languages, before and after classification.

Figure 3

Relative number of non-English books by decade before and after classification.

Table 2

List of evaluated languages and their respective precision, recall, and F1 scores.

LANGUAGE | PRECISION | RECALL | F1
German | 80% | 88% | 84%
Italian | 100% | 90% | 95%
Japanese | 100% | 90% | 95%
Russian | 90% | 90% | 90%
Dutch | 80% | 100% | 88%
Hebrew | 80% | 100% | 88%
Danish | 100% | 76% | 87%
Chinese | 100% | 83% | 91%
Arabic | 50% | 100% | 66%
Polish | 90% | 100% | 94%
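
For readers who want to relate the three columns of Table 2, here is a minimal Python sketch assuming the standard definition of F1 as the harmonic mean of precision and recall. The function name `f1_score` is illustrative and is not taken from the paper. Because the table reports rounded percentages, recomputing F1 from the rounded precision and recall can differ slightly from the printed value for some rows.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall for two rows of Table 2, expressed as fractions.
rows = {"Italian": (1.00, 0.90), "Russian": (0.90, 0.90)}

for language, (p, r) in rows.items():
    print(f"{language}: F1 = {f1_score(p, r):.0%}")
```
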
DOI: https://doi.org/10.5334/johd.95 | Journal eISSN: 2059-481X
Language: English
Published on: Feb 8, 2023
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2023 Sil Hamilton, Andrew Piper, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.