A Classification Benchmark Based on the Literary Theme Ontology

Open Access | Feb 2026

Figures & Tables

Table 1

The summary dataset: for each TV series, the number of episodes and the shortest, longest, and mean episode summary length in words.

| TV SERIES | # EPISODES | MIN WORD LENGTH | MAX WORD LENGTH | MEAN WORD LENGTH |
|---|---|---|---|---|
| Babylon 5 | 45 | 1,068 | 2,878 | 1,767.6 |
| Black Mirror | 12 | 578 | 3,837 | 1,581.7 |
| Futurama | 81 | 255 | 2,117 | 716.3 |
| Game of Thrones | 2 | 2,518 | 4,860 | 3,689.5 |
| Guillermo del Toro’s Cabinet of Curiosities | 5 | 1,111 | 3,621 | 2,308.6 |
| Red Dwarf | 39 | 571 | 2,220 | 952.1 |
| Sherlock | 2 | 526 | 813 | 669.5 |
| Star Trek: Deep Space Nine | 109 | 1,253 | 7,490 | 2,638.7 |
| Star Trek: Enterprise | 43 | 1,122 | 6,458 | 2,148.6 |
| Star Trek: The Animated Series | 5 | 712 | 1,517 | 941.2 |
| Star Trek: The Next Generation | 66 | 1,160 | 6,910 | 2,440.7 |
| Star Trek: The Original Series | 38 | 841 | 3,765 | 1,853.4 |
| Star Trek: Voyager | 59 | 1,265 | 6,182 | 2,577.3 |
| Tales from the Crypt (1989) | 3 | 195 | 1,031 | 572.1 |
| Tales from the Loop (2020) | 4 | 2,577 | 3,322 | 2,958 |
| The Twilight Zone Franchise | 103 | 154 | 1,093 | 373 |
| Total | 644 | 95 | 7,490 | 1,630.8 |

Table 2

The subtitles dataset: for each TV series, the number of episodes and the shortest, longest, and mean per-episode subtitle length in words.

| TV SERIES | # EPISODES | MIN WORD LENGTH | MAX WORD LENGTH | MEAN WORD LENGTH |
|---|---|---|---|---|
| Alfred Hitchcock Presents | 185 | 969 | 3,584 | 2,527.6 |
| Amazing Stories (1985) | 15 | 1,089 | 2,402 | 1,608.3 |
| Amazing Stories (2020) | 4 | 3,649 | 4,892 | 4,300 |
| Babylon 5 | 43 | 3,197 | 5,514 | 4,400.1 |
| Black Mirror | 13 | 2,563 | 7,185 | 4,680.1 |
| Brideshead Revisited (1981) | 10 | 4,012 | 9,604 | 5,087.4 |
| Futurama | 58 | 1,770 | 9,402 | 2,659.4 |
| Game of Thrones | 2 | 3,460 | 5,416 | 4,438 |
| Guillermo del Toro’s Cabinet of Curiosities | 5 | 1,964 | 4,825 | 3,511 |
| I Claudius | 9 | 4,921 | 5,943 | 5,300.8 |
| Piece of Cake (1988) | 1 | 4,608 | 4,608 | 4,608 |
| Red Dwarf | 7 | 2,680 | 3,539 | 3,222 |
| Sherlock | 10 | 1,460 | 10,252 | 8,400.1 |
| Star Trek: Deep Space Nine | 98 | 3,073 | 4,970 | 4,080.6 |
| Star Trek: Enterprise | 24 | 2,931 | 4,878 | 3,851.8 |
| Star Trek: The Animated Series | 2 | 1,925 | 2,353 | 2,139 |
| Star Trek: The Next Generation | 56 | 2,756 | 9,622 | 4,079.2 |
| Star Trek: The Original Series | 30 | 3,132 | 5,681 | 4,340.1 |
| Star Trek: Voyager | 48 | 3,241 | 5,667 | 4,692.4 |
| Tales from the Crypt (1989) | 74 | 392 | 3,514 | 2,101.1 |
| Tales from the Loop (2020) | 3 | 677 | 1,954 | 1,440.3 |
| Tales of the Unexpected | 77 | 1,499 | 4,023 | 2,441.1 |
| The Alfred Hitchcock Hour | 84 | 2,942 | 7,408 | 4,925.4 |
| The Twilight Zone Franchise | 98 | 486 | 5,421 | 2,397.2 |
| Total | 956 | 392 | 10,252 | 3,374.8 |

Table 3

The 10 most frequent themes in the available summaries and subtitles. The total number of theme occurrences is higher than the total number of episodes in each dataset, as one episode can contain multiple themes.

(A) SUMMARIES

| THEME | # OCCURRENCES |
|---|---|
| father and son | 73 |
| friendship | 101 |
| greed for riches | 62 |
| human vs. captivity | 63 |
| humanoid robot | 75 |
| husband and wife | 113 |
| infatuation | 104 |
| romantic love | 88 |
| the desire for vengeance | 81 |
| time travel | 63 |
| Total | 832 |

(B) SUBTITLES

| THEME | # OCCURRENCES |
|---|---|
| extramarital affair | 96 |
| father and son | 98 |
| friendship | 130 |
| greed for riches | 108 |
| husband and wife | 348 |
| infatuation | 140 |
| murder | 166 |
| romantic love | 132 |
| spouse murder | 127 |
| the desire for vengeance | 155 |
| Total | 1,500 |

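
The Table 3 caption notes that theme occurrences can exceed the episode count because one episode may carry several themes. A minimal sketch of that counting follows, assuming annotations are available as a mapping from episode to theme labels. The episode ids, the toy labels, and the helper name `count_theme_occurrences` are illustrative assumptions, not the authors' data or code.

```python
from collections import Counter

# Hypothetical toy annotations: episode id -> themes assigned to that episode.
episode_themes = {
    "ep1": ["friendship", "time travel"],
    "ep2": ["friendship"],
    "ep3": ["husband and wife", "infatuation", "friendship"],
}


def count_theme_occurrences(annotations):
    """Count in how many episodes each theme occurs."""
    counts = Counter()
    for themes in annotations.values():
        # set() guards against a theme being listed twice for one episode.
        counts.update(set(themes))
    return counts


counts = count_theme_occurrences(episode_themes)
total_occurrences = sum(counts.values())

for theme, n in counts.most_common():
    print(f"{theme}: {n}")
print(f"episodes: {len(episode_themes)}, theme occurrences: {total_occurrences}")
```

In this toy example three episodes yield six theme occurrences, the same multi-label effect that makes the table totals larger than the number of episodes.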
Table 4

The results for the summary and subtitle datasets. All models were tested on the complete test set and evaluated using the macro precision, recall, and F1 scores. For both datasets, the highest F1 score is highlighted in bold.

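| Model | Summary Precision | Summary Recall | Summary F1 | Subtitles Precision | Subtitles Recall | Subtitles F1 |
|---|---|---|---|---|---|---|
| LogReg bigrams 1,000 | 0.33 | 0.74 | 0.44 | 0.31 | 0.78 | 0.42 |
| LogReg bigrams 5,000 | 0.34 | 0.75 | 0.45 | 0.31 | 0.78 | 0.43 |
| SVM bigrams 5,000 | 0.38 | 0.77 | 0.50 | 0.32 | 0.79 | 0.44 |
| SVM bigrams 10,000 | 0.39 | 0.77 | 0.51 | 0.32 | 0.74 | 0.44 |
| FastText LogReg | 0.23 | 0.60 | 0.33 | 0.22 | 0.66 | 0.33 |
| FastText SVM | 0.24 | 0.63 | 0.34 | 0.24 | 0.63 | 0.34 |
| Setfit Undersampling | 0.43 | 0.22 | 0.28 | 0.18 | 0.24 | 0.20 |
| Setfit Unique | 0.50 | 0.28 | 0.34 | 0.24 | 0.23 | 0.21 |
| Setfit Oversampling | 0.44 | 0.30 | 0.34 | 0.17 | 0.17 | 0.16 |
| LLM: zero-shot | | | | | | |
| Mistral 7B instruct | 0.37 | 0.48 | 0.36 | 0.33 | 0.19 | 0.20 |
| Gemma3:12b-it-qat | 0.37 | 0.64 | 0.42 | 0.30 | 0.25 | 0.21 |
| llama3.1:8b-instruct-q8_0 | 0.32 | 0.52 | 0.38 | 0.31 | 0.21 | 0.22 |
| LLM: few-shot | | | | | | |
| Mistral 7B instruct | 0.31 | 0.42 | 0.31 | 0.31 | 0.13 | 0.16 |
| Gemma3:12b-it-qat | 0.38 | 0.51 | 0.40 | 0.33 | 0.14 | 0.15 |
| llama3.1:8b-instruct-q8_0 | 0.37 | 0.45 | 0.37 | 0.33 | 0.16 | 0.15 |
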
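
Table 4 reports macro precision, recall, and F1 for a multi-label task. The sketch below shows one common way to compute these scores from gold and predicted theme sets: score each theme separately, then take the unweighted mean over themes. The toy data, the function name `macro_scores`, and the choice to score undefined ratios as 0.0 are assumptions for illustration; the source does not state which implementation was used.

```python
def macro_scores(gold, predicted, labels):
    """Return macro-averaged (precision, recall, F1) for multi-label predictions.

    gold and predicted are parallel lists of label sets, one set per episode.
    """
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if label in g and label in p)
        fp = sum(1 for g, p in pairs if label not in g and label in p)
        fn = sum(1 for g, p in pairs if label in g and label not in p)
        # Assumption: an undefined ratio (zero denominator) is scored as 0.0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical toy data: four episodes, two themes.
labels = ["friendship", "time travel"]
gold = [{"friendship"}, {"time travel"}, {"friendship", "time travel"}, set()]
predicted = [{"friendship"}, {"friendship", "time travel"}, {"friendship"}, {"friendship"}]

macro_p, macro_r, macro_f1 = macro_scores(gold, predicted, labels)
print(f"precision={macro_p:.2f} recall={macro_r:.2f} f1={macro_f1:.2f}")
```

Because every theme contributes equally to a macro average, rare themes weigh as much as frequent ones. A classifier that predicts "friendship" for every episode, as in the toy data, gets perfect recall on that theme but low precision.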
Table 5

The results for the highest-performing model on the summary dataset (SVM, 10,000 features). Results are shown per theme.

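| THEME | PRECISION | RECALL | F1 | SUPPORT |
|---|---|---|---|---|
| husband and wife | 0.51 | 0.77 | 0.62 | 26 |
| infatuation | 0.38 | 0.67 | 0.48 | 18 |
| friendship | 0.22 | 0.95 | 0.36 | 20 |
| romantic love | 0.18 | 0.64 | 0.28 | 14 |
| the desire for vengeance | 0.29 | 0.63 | 0.40 | 16 |
| humanoid robot | 0.58 | 0.94 | 0.71 | 16 |
| father and son | 0.40 | 0.92 | 0.56 | 13 |
| human vs. captivity | 0.32 | 0.73 | 0.44 | 11 |
| time travel | 0.82 | 0.75 | 0.78 | 12 |
| greed for riches | 0.33 | 0.80 | 0.47 | 15 |
| macro avg | 0.40 | 0.78 | 0.51 | 161 |
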
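
The macro avg row of Table 5 can be checked against the per-theme rows. The sketch below copies the per-theme values from the table and averages each column; the variable and function names are illustrative. Because the inputs are already rounded to two decimals, a recomputed average can differ by 0.01 from one computed on unrounded scores.

```python
# Per-theme (precision, recall, F1, support) copied from Table 5.
per_theme = {
    "husband and wife": (0.51, 0.77, 0.62, 26),
    "infatuation": (0.38, 0.67, 0.48, 18),
    "friendship": (0.22, 0.95, 0.36, 20),
    "romantic love": (0.18, 0.64, 0.28, 14),
    "the desire for vengeance": (0.29, 0.63, 0.40, 16),
    "humanoid robot": (0.58, 0.94, 0.71, 16),
    "father and son": (0.40, 0.92, 0.56, 13),
    "human vs. captivity": (0.32, 0.73, 0.44, 11),
    "time travel": (0.82, 0.75, 0.78, 12),
    "greed for riches": (0.33, 0.80, 0.47, 15),
}


def macro_average(rows):
    """Unweighted mean of precision, recall, and F1 over themes, plus total support."""
    n = len(rows)
    means = [round(sum(r[i] for r in rows.values()) / n, 2) for i in range(3)]
    total_support = sum(r[3] for r in rows.values())
    return means[0], means[1], means[2], total_support


macro_p, macro_r, macro_f1, support = macro_average(per_theme)
print(macro_p, macro_r, macro_f1, support)
```

The recomputed macro F1 is the mean of the per-theme F1 scores. It is not the harmonic mean of macro precision and macro recall, which would come out higher for these values.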
Table 6

The results for the highest-performing model on the subtitle dataset (SVM, 5,000 features). Results are shown per theme.

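| THEME | PRECISION | RECALL | F1 | SUPPORT |
|---|---|---|---|---|
| husband and wife | 0.66 | 0.85 | 0.74 | 65 |
| murder | 0.34 | 0.84 | 0.49 | 32 |
| the desire for vengeance | 0.22 | 0.56 | 0.31 | 32 |
| infatuation | 0.39 | 0.61 | 0.48 | 33 |
| romantic love | 0.26 | 0.89 | 0.40 | 26 |
| friendship | 0.19 | 0.81 | 0.30 | 21 |
| spouse murder | 0.28 | 0.96 | 0.43 | 24 |
| greed for riches | 0.27 | 0.72 | 0.39 | 25 |
| father and son | 0.32 | 0.88 | 0.47 | 17 |
| extramarital affair | 0.28 | 0.78 | 0.41 | 18 |
| macro avg | 0.32 | 0.79 | 0.44 | 293 |
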
Table 7

The results for the highest-performing LLM on the summary dataset (Gemma3:12b-it-qat, zero-shot). Results are shown per theme.

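| THEME | PRECISION | RECALL | F1 | SUPPORT |
|---|---|---|---|---|
| husband and wife | 0.59 | 0.62 | 0.60 | 26 |
| infatuation | 0.18 | 0.78 | 0.29 | 18 |
| friendship | 0.22 | 0.85 | 0.35 | 20 |
| romantic love | 0.19 | 0.71 | 0.30 | 14 |
| the desire for vengeance | 0.20 | 0.50 | 0.29 | 16 |
| humanoid robot | 0.57 | 0.50 | 0.53 | 16 |
| father and son | 0.75 | 0.46 | 0.57 | 13 |
| human vs. captivity | 0.18 | 0.18 | 0.18 | 11 |
| time travel | 0.53 | 0.83 | 0.65 | 12 |
| greed for riches | 0.26 | 0.93 | 0.41 | 15 |
| macro avg | 0.37 | 0.64 | 0.42 | 161 |
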
Table 8

The results for the highest-performing LLM on the subtitle dataset (llama3.1:8b-instruct-q8_0, zero-shot). Results are shown per theme.

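| THEME | PRECISION | RECALL | F1 | SUPPORT |
|---|---|---|---|---|
| husband and wife | 0.67 | 0.19 | 0.29 | 65 |
| murder | 0.44 | 0.72 | 0.55 | 32 |
| the desire for vengeance | 0.25 | 0.03 | 0.06 | 32 |
| infatuation | 0.15 | 0.09 | 0.11 | 33 |
| romantic love | 0.27 | 0.12 | 0.16 | 26 |
| friendship | 0.18 | 0.14 | 0.16 | 21 |
| spouse murder | 0.29 | 0.08 | 0.13 | 24 |
| greed for riches | 0.39 | 0.20 | 0.26 | 25 |
| father and son | 0.33 | 0.24 | 0.28 | 17 |
| extramarital affair | 0.14 | 0.33 | 0.20 | 18 |
| macro avg | 0.31 | 0.21 | 0.22 | 293 |
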
Table 9

The percentage of wrong output per model for both the summary and subtitle datasets. A prediction counts as wrong output when the model's response is in a different format than the requested list of themes. The first column gives the percentage of wrong output over the whole test set, the second gives the percentage for input texts shorter than the average summary or subtitle length in the dataset, and the last gives the percentage for input texts above the average length.

| MODEL | WRONG OUTPUT | WRONG OUTPUT BELOW AVG | WRONG OUTPUT ABOVE AVG |
|---|---|---|---|
| Summary | | | |
| LLM: zero-shot | | | |
| Mistral 7B instruct | 14.0% | 0.0% | 14.0% |
| Gemma3:12b-it-qat | 10.1% | 0.0% | 10.1% |
| llama3.1:8b-instruct-q8_0 | 10.1% | 0.0% | 10.1% |
| LLM: few-shot | | | |
| Mistral 7B instruct | 20.9% | 0.0% | 20.9% |
| Gemma3:12b-it-qat | 16.3% | 0.8% | 15.5% |
| llama3.1:8b-instruct-q8_0 | 17.8% | 0.0% | 17.8% |
| Subtitle | | | |
| LLM: zero-shot | | | |
| Mistral 7B instruct | 48.4% | 0.0% | 48.4% |
| Gemma3:12b-it-qat | 48.4% | 0.0% | 48.4% |
| llama3.1:8b-instruct-q8_0 | 47.9% | 0.0% | 47.9% |
| LLM: few-shot | | | |
| Mistral 7B instruct | 62.5% | 0.0% | 62.5% |
| Gemma3:12b-it-qat | 54.2% | 0.0% | 54.2% |
| llama3.1:8b-instruct-q8_0 | 52.1% | 0.0% | 52.1% |

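
The wrong-output percentages in Table 9 can be sketched as follows. In the table, the below-average and above-average columns add up to the overall column, which suggests all three use the whole test set as denominator; the sketch follows that reading. The requested output format is not specified on this page, so the check here assumes a JSON list of strings. The function names and the toy records are hypothetical.

```python
import json


def is_well_formed(output):
    """Assumed format check: the response must parse as a JSON list of strings."""
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, list) and all(isinstance(x, str) for x in parsed)


def wrong_output_rates(records, mean_length):
    """Return (overall, below-average, above-average) wrong-output percentages.

    records is a list of (input_word_count, raw_model_output) pairs.
    All three percentages use the full number of records as denominator.
    """
    n = len(records)
    wrong_below = sum(
        1 for length, out in records
        if length < mean_length and not is_well_formed(out)
    )
    wrong_above = sum(
        1 for length, out in records
        if length >= mean_length and not is_well_formed(out)
    )

    def pct(k):
        return round(100 * k / n, 1)

    return pct(wrong_below + wrong_above), pct(wrong_below), pct(wrong_above)


# Hypothetical toy records; 1630.8 is the mean summary length from Table 1.
records = [
    (300, '["friendship"]'),
    (500, '["time travel", "friendship"]'),
    (2500, "The themes are friendship and time travel."),
    (4000, '["murder"]'),
]
overall, below, above = wrong_output_rates(records, mean_length=1630.8)
print(f"overall={overall}% below={below}% above={above}%")
```

In the toy data the only malformed response comes from an input longer than the mean, so the whole wrong-output rate falls in the above-average column, mirroring the pattern in most rows of the table.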
Figure 1

A shortened version of the summary input text for the Futurama episode "Bendin’ in the Wind" (season 3, episode 13).

Figure 2

A shortened version of the summary input text for the Futurama episode "The Cyber House Rules" (season 3, episode 9).

DOI: https://doi.org/10.5334/johd.480 | Journal eISSN: 2059-481X
Language: English
Submitted on: Nov 14, 2025 | Accepted on: Jan 19, 2026 | Published on: Feb 18, 2026
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 Noa Visser Solissa, Paul Sheridan, Mikael Onsjö, Andreas van Cranenburgh, Federico Pianzola, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.