Table 1
The summary dataset, showing, per TV series, the number of episodes and the shortest, longest, and average episode summary length in words.
| TV SERIES | # EPISODES | MIN LENGTH (WORDS) | MAX LENGTH (WORDS) | MEAN LENGTH (WORDS) |
|---|---|---|---|---|
| Babylon 5 | 45 | 1,068 | 2,878 | 1,767.6 |
| Black Mirror | 12 | 578 | 3,837 | 1,581.7 |
| Futurama | 81 | 255 | 2,117 | 716.3 |
| Game of Thrones | 2 | 2,518 | 4,860 | 3,689.5 |
| Guillermo del Toro’s Cabinet of Curiosities | 5 | 1,111 | 3,621 | 2,308.6 |
| Red Dwarf | 39 | 571 | 2,220 | 952.1 |
| Sherlock | 2 | 526 | 813 | 669.5 |
| Star Trek: Deep Space Nine | 109 | 1,253 | 7,490 | 2,638.7 |
| Star Trek: Enterprise | 43 | 1,122 | 6,458 | 2,148.6 |
| Star Trek: The Animated Series | 5 | 712 | 1,517 | 941.2 |
| Star Trek: The Next Generation | 66 | 1,160 | 6,910 | 2,440.7 |
| Star Trek: The Original Series | 38 | 841 | 3,765 | 1,853.4 |
| Star Trek: Voyager | 59 | 1,265 | 6,182 | 2,577.3 |
| Tales from the Crypt (1989) | 31 | 95 | 1,031 | 572.1 |
| Tales from the Loop (2020) | 4 | 2,577 | 3,322 | 2,958 |
| The Twilight Zone Franchise | 103 | 154 | 1,093 | 373 |
| Total | 644 | 95 | 7,490 | 1,630.8 |
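
The per-series statistics in a table like this can be reproduced with a few lines of Python. The following is a minimal sketch, assuming that length is measured by whitespace-separated words (the source does not state the tokenization) and using a hypothetical helper name, `length_stats`, with toy data rather than the actual dataset.

```python
from collections import defaultdict
from statistics import mean


def length_stats(episodes):
    """Return {series: (n_episodes, min_words, max_words, mean_words)}.

    `episodes` is an iterable of (series_name, text) pairs. Word counts use
    whitespace splitting, which is an assumption: the source does not say
    how words were counted.
    """
    counts = defaultdict(list)
    for series, text in episodes:
        counts[series].append(len(text.split()))
    return {
        series: (len(c), min(c), max(c), round(mean(c), 1))
        for series, c in counts.items()
    }


# Toy data, not taken from the dataset.
toy = [
    ("Series A", "one two three"),
    ("Series A", "one two three four five"),
    ("Series B", "one two"),
]
stats = length_stats(toy)
print(stats)
```
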
Table 2
The subtitles dataset, showing, per TV series, the number of episodes and the shortest, longest, and average subtitle length per episode in words.
| TV SERIES | # EPISODES | MIN LENGTH (WORDS) | MAX LENGTH (WORDS) | MEAN LENGTH (WORDS) |
|---|---|---|---|---|
| Alfred Hitchcock Presents | 185 | 969 | 3,584 | 2,527.6 |
| Amazing Stories (1985) | 15 | 1,089 | 2,402 | 1,608.3 |
| Amazing Stories (2020) | 4 | 3,649 | 4,892 | 4,300 |
| Babylon 5 | 43 | 3,197 | 5,514 | 4,400.1 |
| Black Mirror | 13 | 2,563 | 7,185 | 4,680.1 |
| Brideshead Revisited (1981) | 10 | 4,012 | 9,604 | 5,087.4 |
| Futurama | 58 | 1,770 | 9,402 | 2,659.4 |
| Game of Thrones | 2 | 3,460 | 5,416 | 4,438 |
| Guillermo del Toro’s Cabinet of Curiosities | 5 | 1,964 | 4,825 | 3,511 |
| I Claudius | 9 | 4,921 | 5,943 | 5,300.8 |
| Piece of Cake (1988) | 1 | 4,608 | 4,608 | 4,608 |
| Red Dwarf | 7 | 2,680 | 3,539 | 3,222 |
| Sherlock | 10 | 1,460 | 10,252 | 8,400.1 |
| Star Trek: Deep Space Nine | 98 | 3,073 | 4,970 | 4,080.6 |
| Star Trek: Enterprise | 24 | 2,931 | 4,878 | 3,851.8 |
| Star Trek: The Animated Series | 2 | 1,925 | 2,353 | 2,139 |
| Star Trek: The Next Generation | 56 | 2,756 | 9,622 | 4,079.2 |
| Star Trek: The Original Series | 30 | 3,132 | 5,681 | 4,340.1 |
| Star Trek: Voyager | 48 | 3,241 | 5,667 | 4,692.4 |
| Tales from the Crypt (1989) | 74 | 392 | 3,514 | 2,101.1 |
| Tales from the Loop (2020) | 3 | 677 | 1,954 | 1,440.3 |
| Tales of the Unexpected | 77 | 1,499 | 4,023 | 2,441.1 |
| The Alfred Hitchcock Hour | 84 | 2,942 | 7,408 | 4,925.4 |
| The Twilight Zone Franchise | 98 | 486 | 5,421 | 2,397.2 |
| Total | 956 | 392 | 10,252 | 3,374.8 |
Table 3
The 10 most frequent themes in the available summaries and subtitles. The total number of theme occurrences is higher than the total number of episodes in each dataset, because one episode can contain multiple themes.
| THEME | # OCCURRENCES |
|---|---|
| (A) SUMMARIES | |
| father and son | 73 |
| friendship | 101 |
| greed for riches | 62 |
| human vs. captivity | 63 |
| humanoid robot | 75 |
| husband and wife | 113 |
| infatuation | 104 |
| romantic love | 88 |
| the desire for vengeance | 81 |
| time travel | 63 |
| Total | 832 |
| (B) SUBTITLES | |
| extramarital affair | 96 |
| father and son | 98 |
| friendship | 130 |
| greed for riches | 108 |
| husband and wife | 348 |
| infatuation | 140 |
| murder | 166 |
| romantic love | 132 |
| spouse murder | 127 |
| the desire for vengeance | 155 |
| Total | 1,500 |
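
Because the task is multi-label, occurrence counts are taken over (episode, theme) pairs rather than over episodes. A small sketch with invented annotations (the episode names and theme assignments below are illustrative only) shows why the total can exceed the number of episodes:

```python
from collections import Counter

# Toy annotations, not taken from the dataset: each episode maps to a set of themes.
episode_themes = {
    "episode 1": {"friendship", "time travel"},
    "episode 2": {"friendship"},
    "episode 3": {"husband and wife", "infatuation", "friendship"},
}

# Count every (episode, theme) pair once.
occurrences = Counter(t for themes in episode_themes.values() for t in themes)
total_occurrences = sum(occurrences.values())

# Six theme occurrences spread over only three episodes.
print(occurrences.most_common(1), total_occurrences, len(episode_themes))
```
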
Table 4
The results for the summary and subtitle datasets. All models were tested on the complete test set and evaluated using the macro precision, recall, and F1 scores. For both datasets, the highest F1 score is highlighted in bold.
| Model | Summary Precision | Summary Recall | Summary F1 | Subtitles Precision | Subtitles Recall | Subtitles F1 |
|---|---|---|---|---|---|---|
| LogReg bigrams 1,000 | 0.33 | 0.74 | 0.44 | 0.31 | 0.78 | 0.42 |
| LogReg bigrams 5,000 | 0.34 | 0.75 | 0.45 | 0.31 | 0.78 | 0.43 |
| SVM bigrams 5,000 | 0.38 | 0.77 | 0.50 | 0.32 | 0.79 | **0.44** |
| SVM bigrams 10,000 | 0.39 | 0.77 | **0.51** | 0.32 | 0.74 | 0.44 |
| FastText LogReg | 0.23 | 0.60 | 0.33 | 0.22 | 0.66 | 0.33 |
| FastText SVM | 0.24 | 0.63 | 0.34 | 0.24 | 0.63 | 0.34 |
| SetFit Undersampling | 0.43 | 0.22 | 0.28 | 0.18 | 0.24 | 0.20 |
| SetFit Unique | 0.50 | 0.28 | 0.34 | 0.24 | 0.23 | 0.21 |
| SetFit Oversampling | 0.44 | 0.30 | 0.34 | 0.17 | 0.17 | 0.16 |
| LLM: zero-shot | | | | | | |
| Mistral 7B instruct | 0.37 | 0.48 | 0.36 | 0.33 | 0.19 | 0.20 |
| Gemma3:12b-it-qat | 0.37 | 0.64 | 0.42 | 0.30 | 0.25 | 0.21 |
| llama3.1:8b-instruct-q8_0 | 0.32 | 0.52 | 0.38 | 0.31 | 0.21 | 0.22 |
| LLM: few-shot | | | | | | |
| Mistral 7B instruct | 0.31 | 0.42 | 0.31 | 0.31 | 0.13 | 0.16 |
| Gemma3:12b-it-qat | 0.38 | 0.51 | 0.40 | 0.33 | 0.14 | 0.15 |
| llama3.1:8b-instruct-q8_0 | 0.37 | 0.45 | 0.37 | 0.33 | 0.16 | 0.15 |
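
For readers unfamiliar with macro averaging in a multi-label setting, the sketch below computes precision, recall, and F1 per theme and then takes the unweighted mean over themes. It is an illustration under stated assumptions, not the evaluation code behind the table: the function name `macro_scores`, the toy labels, and the convention of scoring 0.0 when a denominator is zero are all assumptions. Macro F1 is taken here as the mean of the per-theme F1 scores, which is consistent with the `macro avg` rows of the per-theme tables.

```python
def macro_scores(gold, pred, themes):
    """Macro-averaged precision, recall, and F1 for multi-label predictions.

    `gold` and `pred` are lists of sets of theme labels, one set per episode.
    A score is set to 0.0 when its denominator is zero (an assumed convention).
    """
    per_theme = []
    for theme in themes:
        pairs = list(zip(gold, pred))
        tp = sum(theme in g and theme in p for g, p in pairs)
        fp = sum(theme not in g and theme in p for g, p in pairs)
        fn = sum(theme in g and theme not in p for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_theme.append((precision, recall, f1))
    n = len(per_theme)
    # Unweighted mean over themes: every theme counts equally, whatever its support.
    return tuple(sum(scores[i] for scores in per_theme) / n for i in range(3))


# Toy example, not taken from the dataset.
themes = ["friendship", "murder"]
gold = [{"friendship"}, {"murder"}, {"friendship", "murder"}]
pred = [{"friendship", "murder"}, {"murder"}, {"friendship"}]
macro_p, macro_r, macro_f1 = macro_scores(gold, pred, themes)
print(macro_p, macro_r, macro_f1)  # 0.75 0.75 0.75
```

Macro averaging gives rare themes the same weight as frequent ones, so a model that does well only on the most common theme is not rewarded for it.
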
Table 5
The results for the highest-performing model on the summary dataset (SVM, 10,000 features). Results are shown per theme.
| THEME | PRECISION | RECALL | F1 | SUPPORT |
|---|---|---|---|---|
| husband and wife | 0.51 | 0.77 | 0.62 | 26 |
| infatuation | 0.38 | 0.67 | 0.48 | 18 |
| friendship | 0.22 | 0.95 | 0.36 | 20 |
| romantic love | 0.18 | 0.64 | 0.28 | 14 |
| the desire for vengeance | 0.29 | 0.63 | 0.40 | 16 |
| humanoid robot | 0.58 | 0.94 | 0.71 | 16 |
| father and son | 0.40 | 0.92 | 0.56 | 13 |
| human vs. captivity | 0.32 | 0.73 | 0.44 | 11 |
| time travel | 0.82 | 0.75 | 0.78 | 12 |
| greed for riches | 0.33 | 0.80 | 0.47 | 15 |
| macro avg | 0.40 | 0.78 | 0.51 | 161 |
Table 6
The results for the highest-performing model on the subtitle dataset (SVM, 5,000 features). Results are shown per theme.
| THEME | PRECISION | RECALL | F1 | SUPPORT |
|---|---|---|---|---|
| husband and wife | 0.66 | 0.85 | 0.74 | 65 |
| murder | 0.34 | 0.84 | 0.49 | 32 |
| the desire for vengeance | 0.22 | 0.56 | 0.31 | 32 |
| infatuation | 0.39 | 0.61 | 0.48 | 33 |
| romantic love | 0.26 | 0.89 | 0.40 | 26 |
| friendship | 0.19 | 0.81 | 0.30 | 21 |
| spouse murder | 0.28 | 0.96 | 0.43 | 24 |
| greed for riches | 0.27 | 0.72 | 0.39 | 25 |
| father and son | 0.32 | 0.88 | 0.47 | 17 |
| extramarital affair | 0.28 | 0.78 | 0.41 | 18 |
| macro avg | 0.32 | 0.79 | 0.44 | 293 |
Table 7
The results for the highest-performing LLM on the summary dataset (Gemma3:12b-it-qat, zero-shot). Results are shown per theme.
| THEME | PRECISION | RECALL | F1 | SUPPORT |
|---|---|---|---|---|
| husband and wife | 0.59 | 0.62 | 0.60 | 26 |
| infatuation | 0.18 | 0.78 | 0.29 | 18 |
| friendship | 0.22 | 0.85 | 0.35 | 20 |
| romantic love | 0.19 | 0.71 | 0.30 | 14 |
| the desire for vengeance | 0.20 | 0.50 | 0.29 | 16 |
| humanoid robot | 0.57 | 0.50 | 0.53 | 16 |
| father and son | 0.75 | 0.46 | 0.57 | 13 |
| human vs. captivity | 0.18 | 0.18 | 0.18 | 11 |
| time travel | 0.53 | 0.83 | 0.65 | 12 |
| greed for riches | 0.26 | 0.93 | 0.41 | 15 |
| macro avg | 0.37 | 0.64 | 0.42 | 161 |
Table 8
The results for the highest-performing LLM on the subtitle dataset (llama3.1:8b-instruct-q8_0, zero-shot). Results are shown per theme.
| THEME | PRECISION | RECALL | F1 | SUPPORT |
|---|---|---|---|---|
| husband and wife | 0.67 | 0.19 | 0.29 | 65 |
| murder | 0.44 | 0.72 | 0.55 | 32 |
| the desire for vengeance | 0.25 | 0.03 | 0.06 | 32 |
| infatuation | 0.15 | 0.09 | 0.11 | 33 |
| romantic love | 0.27 | 0.12 | 0.16 | 26 |
| friendship | 0.18 | 0.14 | 0.16 | 21 |
| spouse murder | 0.29 | 0.08 | 0.13 | 24 |
| greed for riches | 0.39 | 0.20 | 0.26 | 25 |
| father and son | 0.33 | 0.24 | 0.28 | 17 |
| extramarital affair | 0.14 | 0.33 | 0.20 | 18 |
| macro avg | 0.31 | 0.21 | 0.22 | 293 |
Table 9
The percentage of wrong output per model for the summary and subtitle datasets. A prediction counts as wrong output when the returned message is in a different format than the requested list of themes. The first numeric column gives the percentage of wrong output over the whole test set. The second column gives the percentage of wrong output for input texts shorter than the average summary or subtitle length in the dataset, and the last column gives the percentage for input texts longer than the average.
| MODEL | WRONG OUTPUT | WRONG OUTPUT BELOW AVG | WRONG OUTPUT ABOVE AVG |
|---|---|---|---|
| Summary | | | |
| LLM: zero-shot | | | |
| Mistral 7B instruct | 14.0% | 0.0% | 14.0% |
| Gemma3:12b-it-qat | 10.1% | 0.0% | 10.1% |
| llama3.1:8b-instruct-q8_0 | 10.1% | 0.0% | 10.1% |
| LLM: few-shot | | | |
| Mistral 7B instruct | 20.9% | 0.0% | 20.9% |
| Gemma3:12b-it-qat | 16.3% | 0.8% | 15.5% |
| llama3.1:8b-instruct-q8_0 | 17.8% | 0.0% | 17.8% |
| Subtitle | | | |
| LLM: zero-shot | | | |
| Mistral 7B instruct | 48.4% | 0.0% | 48.4% |
| Gemma3:12b-it-qat | 48.4% | 0.0% | 48.4% |
| llama3.1:8b-instruct-q8_0 | 47.9% | 0.0% | 47.9% |
| LLM: few-shot | | | |
| Mistral 7B instruct | 62.5% | 0.0% | 62.5% |
| Gemma3:12b-it-qat | 54.2% | 0.0% | 54.2% |
| llama3.1:8b-instruct-q8_0 | 52.1% | 0.0% | 52.1% |
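
The bookkeeping behind such a table can be sketched as follows. This is a hedged illustration, not the procedure used for the reported numbers: it assumes the requested format is a JSON list of strings, that input length is counted in whitespace-separated words, and that an input exactly at the average counts as above average. The names `is_wrong_output` and `wrong_output_rates` are hypothetical. In line with the table, where the below-average and above-average columns add up to the overall column, all three percentages are taken over the whole test set.

```python
import json


def is_wrong_output(message):
    """True if `message` is not a JSON list of strings (assumed target format)."""
    try:
        parsed = json.loads(message)
    except json.JSONDecodeError:
        return True
    return not (isinstance(parsed, list) and all(isinstance(t, str) for t in parsed))


def wrong_output_rates(records, avg_length):
    """Return (overall %, below-average %, above-average %) of wrong output.

    `records` is a list of (input_text, model_message) pairs and `avg_length`
    is the average input length in words. All percentages are relative to
    the full list of records.
    """
    below = above = 0
    for text, message in records:
        if is_wrong_output(message):
            if len(text.split()) < avg_length:
                below += 1
            else:
                above += 1
    n = len(records)
    return tuple(round(100 * k / n, 1) for k in (below + above, below, above))


# Toy records, not taken from the dataset.
records = [
    ("a b", '["friendship"]'),
    ("a b c d e f", "Sure! The themes are friendship and murder."),
    ("a b c", '["murder", "friendship"]'),
    ("a b c d e f g", '["murder"]'),
]
rates = wrong_output_rates(records, avg_length=4)
print(rates)  # (25.0, 0.0, 25.0)
```
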

Figure 1
A shortened version of the summary input text for the Futurama episode "Bendin’ in the Wind" (season 3, episode 13).

Figure 2
A shortened version of the summary input text for the Futurama episode "The Cyber House Rules" (season 3, episode 9).
