A Classification Benchmark Based on the Literary Theme Ontology

Open Access | Feb 2026

Abstract

We introduce two new English-language datasets of TV episode summaries (n = 644) and subtitles (n = 956), in which each episode is human-annotated with one or more themes. The datasets are derived from the Literary Theme Ontology, an ontology that provides precise definitions of themes for identifying them in stories. We benchmark this multi-label classification task with bag-of-words classification models as well as small open-weight LLMs. Because themes in TV series episodes need not be mentioned explicitly in a summary or in the subtitles, and the themes themselves can be rather abstract, theme classification is a hard task. SVM classifiers are the most successful at predicting the themes in TV episode summaries and subtitles (F1 = 0.50 and 0.44, respectively). The results also show that the length of the input text strongly influences the ability of the LLMs to follow the instructions given in the prompt and to answer in the provided output format.
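This section reports F1 scores for a multi-label task but does not say how the scores are averaged across theme labels. As a minimal sketch, assuming each episode's gold and predicted themes are represented as sets of label strings, micro- and macro-averaged F1 could be computed as below. The function name `multilabel_f1` and the theme labels are hypothetical placeholders, not the authors' code or actual Literary Theme Ontology identifiers.

```python
from collections import Counter


def multilabel_f1(gold, pred):
    """Return (micro_f1, macro_f1) for parallel lists of label sets.

    Hypothetical helper for illustration; the averaging used in the
    paper is not stated in this section.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        for label in g & p:  # themes correctly predicted
            tp[label] += 1
        for label in p - g:  # themes predicted but not annotated
            fp[label] += 1
        for label in g - p:  # annotated themes that were missed
            fn[label] += 1
    labels = set(tp) | set(fp) | set(fn)

    def f1(t, f_p, f_n):
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when there is no support
        denom = 2 * t + f_p + f_n
        return 2 * t / denom if denom else 0.0

    # Micro: pool counts over all labels; macro: average per-label F1
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = (sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
             if labels else 0.0)
    return micro, macro


# Hypothetical theme labels for three episodes
gold = [{"betrayal", "loyalty"}, {"revenge"}, {"love"}]
pred = [{"betrayal"}, {"revenge", "love"}, set()]
micro, macro = multilabel_f1(gold, pred)
print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")
```

With these toy labels, micro-F1 pools 2 true positives, 1 false positive, and 2 false negatives (4/7), while macro-F1 averages per-theme scores of 1, 0, 1, and 0 (0.5). The two averages can diverge considerably when some themes are rare, which is why the averaging scheme matters when comparing scores such as F1 = 0.50.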

DOI: https://doi.org/10.5334/johd.480 | Journal eISSN: 2059-481X
Language: English
Submitted on: Nov 14, 2025 | Accepted on: Jan 19, 2026 | Published on: Feb 18, 2026
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 Noa Visser Solissa, Paul Sheridan, Mikael Onsjö, Andreas van Cranenburgh, Federico Pianzola, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.