A Dataset of American Poetry by Poets from Historically Underrepresented Groups in the HathiTrust Digital Library

By: Gyuri Kang and Kahyun Choi
Open Access | Mar 2026

Abstract

This dataset provides a collection of American poetry by poets from historically underrepresented groups in the HathiTrust Digital Library. It comprises 9,321 poems from 113 collections by 40 African American, 22 Asian American, 3 Pacific Islander, 17 Latin American, and 31 Native American poets. We identified and recorded the start and end page numbers of each poem and released the annotations as CSV files. The dataset also reveals imbalances in the representation of poets from historically underrepresented groups within the HathiTrust corpus. We expect this dataset to support large-scale poetry analysis, help uncover biases in natural language processing (NLP) models, make it possible to assess their robustness when applied to culturally diverse poetic language, and promote the development of more inclusive models for diverse American poetry communities.
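
As a rough illustration of how start and end page annotations of this kind might be used, the sketch below parses a small in-memory CSV and expands each poem into the pages it occupies. The column names (volume_id, poem_title, start_page, end_page) and the sample rows are assumptions made for this example only. They are not the dataset's actual schema or contents, which should be checked against the released files.

```python
import csv
import io

# Hypothetical annotation rows: the column names and values below are
# illustrative assumptions, not the dataset's actual schema or contents.
SAMPLE_CSV = """volume_id,poem_title,start_page,end_page
vol-a,First Poem,11,12
vol-a,Second Poem,13,13
vol-b,Third Poem,5,8
"""


def load_annotations(text):
    """Parse annotation rows and convert the page columns to integers."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        start, end = int(row["start_page"]), int(row["end_page"])
        if end < start:
            raise ValueError(f"end page precedes start page: {row}")
        rows.append({**row, "start_page": start, "end_page": end})
    return rows


def page_span(row):
    """Return the inclusive list of page numbers a poem occupies."""
    return list(range(row["start_page"], row["end_page"] + 1))


annotations = load_annotations(SAMPLE_CSV)

# Collect, per volume, every page that belongs to an annotated poem.
pages_per_volume = {}
for row in annotations:
    pages_per_volume.setdefault(row["volume_id"], []).extend(page_span(row))
```

With page ranges expanded this way, the text of each poem could then be cut out of a volume's page-level text, assuming such page-level text is available to the reader.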

DOI: https://doi.org/10.5334/johd.508 | Journal eISSN: 2059-481X
Language: English
Submitted on: Jan 7, 2026 | Accepted on: Feb 14, 2026 | Published on: Mar 6, 2026
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 Gyuri Kang, Kahyun Choi, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.