A Classification Benchmark Based on the Literary Theme Ontology

Noa Visser Solissa; Paul Sheridan; Mikael Onsjö; Andreas van Cranenburgh; Federico Pianzola

doi:10.5334/johd.480

Abstract

We introduce two new English-language datasets of TV episode summaries (n = 644) and subtitles (n = 956), each human-annotated with one or more themes. The datasets are derived from the Literary Theme Ontology, which provides explicit definitions for identifying themes in stories. We evaluate bag-of-words classifiers and small open-weight LLMs on the resulting multi-label classification task. The task is hard because themes need not be mentioned explicitly in a summary or in the subtitles, and the themes themselves can be rather abstract. SVM classifiers are the most successful at predicting themes in both summaries and subtitles (F₁ = 0.50 and 0.44, respectively). The results also show that the length of the input text strongly influences the ability of the LLMs to follow the prompt's instructions and to answer in the specified output format.
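
The abstract does not state how the reported F₁ scores are averaged across themes. As a minimal sketch, assuming micro-averaging over all (episode, theme) decisions, an F₁ score for a multi-label task of this kind could be computed as follows. The function name `micro_f1` and the theme labels are hypothetical and serve only as illustration; they are not taken from the datasets.

```python
from typing import Sequence, Set


def micro_f1(gold: Sequence[Set[str]], pred: Sequence[Set[str]]) -> float:
    """Micro-averaged F1 over all (episode, theme) decisions.

    Assumption: micro-averaging; the paper may use a different scheme.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    # Count true positives, false positives and false negatives per episode.
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    denom = 2 * tp + fp + fn
    # Return 0.0 when there are no gold and no predicted labels at all.
    return 2 * tp / denom if denom else 0.0


# Hypothetical theme annotations for three episodes (illustrative only).
gold = [{"friendship", "betrayal"}, {"time travel"}, {"first contact", "war"}]
pred = [{"friendship"}, {"time travel", "war"}, {"war"}]
score = micro_f1(gold, pred)
```

Because each episode can carry several themes, the counts are taken over sets of labels per episode, so a prediction can be partly right: a missed theme lowers recall and a spurious theme lowers precision.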
