References
- Bagga, S., & Piper, A. (2022). HATHI 1M: Introducing a Million Page Historical Prose Dataset in English from the Hathi Trust. Journal of Open Humanities Data, 8, 7. 10.5334/johd.71
- Choi, K., & Kang, G. (2025). An analysis of poet demographic and thematic diversity in a poetry collection for inclusive AI. Information Research an International Electronic Journal, 30(iConf), 610–617. 10.47989/ir30iConf47263
- Hamilton, S., & Piper, A. (2023). MultiHATHI: A Complete Collection of Multilingual Prose Fiction in the HathiTrust Digital Library. Journal of Open Humanities Data, 9, 3. 10.5334/johd.95
- HathiTrust Research Center. (n.d.). Data capsules. https://analytics.hathitrust.org/staticcapsules
- Jiang, M., Dubnicek, R. C., Worthey, G., Underwood, T., & Downie, J. S. (2022). A prototype Gutenberg-HathiTrust sentence-level parallel corpus for OCR error analysis: Pilot investigations. Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries, 1–5. 10.1145/3529372.3533298
- Jiang, M., Hu, Y., Worthey, G., Capitanu, B., Kudeki, D., & Downie, J. S. (2021). The Gutenberg-HathiTrust Parallel Corpus: A Real-World Dataset for Noise Investigation in Uncorrected OCR Texts. iConference 2021. http://hdl.handle.net/2142/109695
- Lehmann, M., Heumann, A., Kuijpers, M. M., Lauer, G., & Lüdtke, J. (2023). The ChildPoeDE Corpus: 1082 German Children’s Poems for Computational and Experimental Studies on Poetry Reception. Journal of Open Humanities Data, 9, 6. 10.5334/johd.102
- Lucy, L., Griffiths, C., Ying, C., Kim-Ebio, J., Baur, S., Levine, S., Eberhardt, J., Bamman, D., & Demszky, D. (2025). Racial and Ethnic Representation in Literature Taught in US High Schools. Journal of Cultural Analytics, 10(1). 10.22148/001c.131682
- Marco, G., De La Rosa, J., Gonzalo, J., Ros, S., & Gonzalez-Blanco, E. (2021). Automated Metric Analysis of Spanish Poetry: Two Complementary Approaches. IEEE Access, 9, 51734–51746. 10.1109/ACCESS.2021.3069635
- Naaz, K., & Singh, N. K. (2022). Design and Development of Computational Tools for Analyzing Elements of Hindi Poetry. IEEE Access, 10, 97733–97747. 10.1109/ACCESS.2022.3204388
- Parulian, N. N., Dubnicek, R., Evans, D. J., Hu, Y., Layne-Worthey, G., Downie, J. S., Heaton, R., Lu, K., Orr, R. I., Magni, I., & Walsh, J. A. (2023). Tuning Out the Noise: Benchmarking Entity Extraction for Digitized Native American Literature. Proceedings of the Association for Information Science and Technology, 60(1), 681–685. 10.1002/pra2.839
- Saini, J. R., & Kaur, J. (2020). Kāvi: An Annotated Corpus of Punjabi Poetry with Emotion Detection Based on ‘Navrasa.’ Procedia Computer Science, 167, 1220–1229. 10.1016/j.procs.2020.03.436
- Schug, J., Gosin, M., & Alt, N. P. (2025). A historical psychology approach to gendered racial stereotypes: An examination of a multi-million book sample of 20th century texts. Current Research in Ecological and Social Psychology, 9. 10.1016/j.cresp.2025.100248
- Shang, W., & Underwood, T. (2024). Disentangling semantic and prosodic features of English poetry. Digital Scholarship in the Humanities, fqae008. 10.1093/llc/fqae008
- So, R. J. (2020). Redlining culture: A data history of racial inequality and postwar fiction. Columbia University Press. 10.7312/so--19772
- Sprugnoli, R., Mambrini, F., Passarotti, M., & Moretti, G. (2023). The Sentiment of Latin Poetry. Annotation and Automatic Analysis of the Odes of Horace. Italian Journal of Computational Linguistics, 9(1). 10.4000/ijcol.1125
- Timofeeva, M. (2021). Comparative Analysis of Reasoning in Russian Classic Poetry. Applied Sciences, 11(18), 8665. 10.3390/app11188665
- Underwood, T., Kimutis, P., & Witte, J. (2020). NovelTM Datasets for English-Language Fiction, 1700–2009. Journal of Cultural Analytics, 5(2). 10.22148/001c.13147
- Wessler, H. (2020). From marginalisation to rediscovery of identity: Dalit and Adivasi voices in Hindi literature. Studia Neophilologica, 92(2), 159–174. 10.1080/00393274.2020.1751703
