
Figure 1
J. S. Bach, Prelude in C, BWV 846: measures 1–11 of the score, with an RN analysis given in the text below the lowest stave.
Table 1
The contents of our meta-corpus, drawing together existing harmonic analysis datasets. The relative size of each corpus is given by the total, combined number of RNs in the analyses, the number of measures in the scores, and also the ‘Quarter length’: a metric for the total length in quarter notes.
| Dataset | Composer/s | Movements or equivalent | Quarter length | Measures | RNs |
| TAVERN | Mozart | 10 theme and variations sets | 7 712 | 2 773 | 8 779 |
| TAVERN | Beethoven | 17 theme and variations sets | 12 840 | 5 128 | 15 959 |
| ABC | Beethoven | 16 string quartets, 70 movements | 48 811 | 15 881 | 29 652 |
| BPS-FH | Beethoven | 32 piano sonata first movements | 30 992 | 9 420 | 11 337 |
| Roman Text | Bach | 24 preludes | 3 168 | 819 | 2 165 |
| Roman Text | Various (19th C.) | 48 romantic songs | 8 326 | 2 791 | 5 283 |
| Totals | | 201 scores | 111 859 | 36 812 | 73 175 |

Figure 2
Measures 22–24 of the Bach prelude shown in Figure 1.

Figure 3
Harmonic ambiguity in ‘Einsamkeit’ from Schubert’s Winterreise (D.911, No.12). The three parallel analyses represent A1, A2, and A3 respectively from top to bottom.
Table 2
Different interpretations of measures 34 and 35 of Schubert’s ‘Einsamkeit’ (see Figure 3). The analyses are written in .rntxt format (Tymoczko et al., 2019), as explained in Section 3.1. The ‘rules’ in the second and third column are set out at the beginning of Section 2.
| Analysis | RN | Rules followed/broken |
| A1 | m34 b: i | rules 1 and 4 |
| | m35 i | rule 3 |
| A2 | m34 b: i b1.5 Ger42 | rules 3 and 2 |
| | m35 Ger42 | rule 1 |
| A3 | m34 b: i | rules 1 and 4 |
| | m35 G: I | rule 3 |
Table 3
The RN and tabular representations of the Bach extract in Figure 1. The first column sets out the RNs in the ‘Roman text’ format of Tymoczko et al. (2019), and the remaining columns unpack that information according to our adaptation of the tabular standard of Chen and Su (2018).
| RNTXT | Start | End | Key | Degree | Quality | Inv. |
| m1 C: I | 0.0 | 4.0 | C | 1 | M | 0 |
| m2 ii42 | 4.0 | 8.0 | C | 2 | m7 | 3 |
| m3 V65 | 8.0 | 12.0 | C | 5 | D7 | 1 |
| m4 I | 12.0 | 16.0 | C | 1 | M | 0 |
| m5 vi6 | 16.0 | 20.0 | C | 6 | m | 1 |
| m6 G: V42 | 20.0 | 24.0 | G | 5 | D7 | 3 |
| m7 I6 | 24.0 | 28.0 | G | 1 | M | 1 |
| m8 IV42 | 28.0 | 32.0 | G | 4 | M7 | 3 |
| m9 ii7 | 32.0 | 36.0 | G | 2 | m7 | 0 |
| m10 V7 | 36.0 | 40.0 | G | 5 | D7 | 0 |
| m11 I | 40.0 | 44.0 | G | 1 | M | 0 |
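The unpacking shown in Table 3 can be sketched in a few lines of Python. This is a minimal illustration rather than the converter used for the corpus: it assumes 4/4 time with exactly one chord per measure (as in this extract), supports only common figured-bass suffixes, and omits the Quality column, which depends on the mode of the key. The names `unpack_rntxt`, `FIGURE_TO_INVERSION`, and `NUMERAL_TO_DEGREE` are hypothetical.

```python
import re

# Figured-bass suffix -> inversion (0 = root position). Common figures only.
FIGURE_TO_INVERSION = {"": 0, "6": 1, "64": 2, "7": 0, "65": 1, "43": 2, "42": 3}
NUMERAL_TO_DEGREE = {"i": 1, "ii": 2, "iii": 3, "iv": 4, "v": 5, "vi": 6, "vii": 7}

# "m6 G: V42" -> measure 6, optional key "G", numeral "V", figure "42".
LINE = re.compile(r"^m(\d+)\s+(?:([A-Ga-g][#b]?):\s+)?([ivIV]+)(\d*)$")


def unpack_rntxt(lines, quarters_per_measure=4.0):
    """Turn one-chord-per-measure RN lines into tabular rows (assumed layout)."""
    rows, key = [], None
    for line in lines:
        match = LINE.match(line.strip())
        if match is None:
            raise ValueError(f"unsupported line: {line!r}")
        measure, new_key, numeral, figure = match.groups()
        key = new_key or key  # a key persists until the next key change
        start = (int(measure) - 1) * quarters_per_measure
        rows.append({
            "start": start,
            "end": start + quarters_per_measure,
            "key": key,
            "degree": NUMERAL_TO_DEGREE[numeral.lower()],
            "inversion": FIGURE_TO_INVERSION[figure],
        })
    return rows


EXTRACT = [
    "m1 C: I", "m2 ii42", "m3 V65", "m4 I", "m5 vi6", "m6 G: V42",
    "m7 I6", "m8 IV42", "m9 ii7", "m10 V7", "m11 I",
]
ROWS = unpack_rntxt(EXTRACT)
```

Each entry of `ROWS` then carries the Start, End, Key, Degree, and Inv. values of the corresponding table row; a fuller converter would also need to handle several chords per measure, beat offsets such as `b1.5`, and chord quality.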
Table 4
Total dimension of input vector for each pitch encoding option (limited to 7 octaves and double sharps/flats).
| Registral information | Chromatic pitch (CP) | Pitch spelling (PS) |
| Full | CPf: 7 × 12 = 84 | PSf: 7 × 35 = 245 |
| Class + bass | CPb: 12 + 12 = 24 | PSb: 35 + 35 = 70 |
| Class | CPc: 12 | PSc: 35 |
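The totals in Table 4 follow from simple arithmetic, sketched below. The sketch assumes that the 35 spelled pitch classes arise as 7 letter names times 5 accidentals (double flat to double sharp), which matches the stated limit; the function and constant names are hypothetical.

```python
OCTAVES = 7
CHROMATIC_CLASSES = 12
# Assumption: 7 letter names x 5 accidentals (bb, b, natural, #, ##) = 35.
SPELLED_CLASSES = 7 * 5


def input_dimension(classes, register):
    """Length of the input vector for one encoding option."""
    if register == "full":   # one slot per pitch class in each octave
        return OCTAVES * classes
    if register == "bass":   # pitch-class vector plus a separate bass vector
        return classes + classes
    if register == "class":  # pitch-class vector only
        return classes
    raise ValueError(f"unknown register option: {register!r}")


DIMENSIONS = {
    prefix + suffix: input_dimension(classes, register)
    for prefix, classes in (("CP", CHROMATIC_CLASSES), ("PS", SPELLED_CLASSES))
    for suffix, register in (("f", "full"), ("b", "bass"), ("c", "class"))
}
```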

Figure 4
The distribution of work transpositions that remain within the set limits of F♭♭–B♯♯ for pitches and C♭–C♯ for keys.

Figure 5
Architecture of the neural network model in the ‘local’ training mode. In the ‘global’ mode, the Quality/Inversion/Root outputs are instead computed after the fully connected layer. The numbers in the boxes refer to the number of categories for each output label in the PSb case (see Table 4).
Table 5
Comparison of the percent accuracy between models. The two rows above the internal division report on our best model – ConvGRU with pitch spelling and bass (PSb) and with global training. The first row reports on training with all available data; the second reduces the available data to the smaller corpus used by Chen and Su (2018). Rows below the internal dividing line provide comparison data for the performance of Chen and Su (2018, 2019), as well as a baseline key detection using pitch profiles by Temperley (1999). ‘Degree’ registers as correct only when the predictions match the corpus entry for both Degrees 1 and 2; ‘RN’ is correct only when all four of the previous columns match in that way.
Table 6
Results obtained by averaging the accuracy of several models on four different axes: architecture, input registral information, input spelling, and global/local training. Column labels are the same as for Table 5, and the first row likewise relates once again to the best performing model. Each sub-table thereafter shows the average performance of several models. For example, the ConvGRU row shows the average of 12 models with the same architecture but using different input representations and registral information. The values in the first row of each sub-table represent the percentage accuracy of the corresponding averaged models as a reference; each line thereafter shows the +/– difference in accuracy from the reference. There are only 6 PoolGRU models, as they can be trained only globally (not locally).
| Model(s) | No. of models | Key | Degree | Quality | Inversion | RN |
| ConvGRU + PSb + global | | 82.9 | 68.3 | 76.6 | 72.0 | 42.8 |
| ConvGRU | 12 | 81.9 | 67.4 | 74.6 | 67.9 | 37.8 |
| ConvDil | 12 | –2.4 | –1.8 | –0.8 | –0.5 | –1.7 |
| PoolGRU | 6 | –2.3 | –3.0 | –1.6 | –1.8 | –4.1 |
| bass | 10 | 80.8 | 66.6 | 74.3 | 70.1 | 39.2 |
| full | 10 | –0.7 | –0.9 | –0.6 | –3.5 | –3.7 |
| class | 10 | –0.1 | –0.7 | –0.1 | –4.7 | –4.7 |
| spelling | 15 | 80.6 | 66.2 | 74.1 | 67.6 | 36.5 |
| chromatic | 15 | –0.3 | –0.3 | –0.2 | –0.5 | –0.4 |
| global | 15 | 80.6 | 66.8 | 75.4 | 66.7 | 36.9 |
| local | 15 | +0.3 | –0.7 | –2.4 | +2.0 | +0.2 |
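To read Table 6 in absolute terms, each +/– row can be added to the reference row of its sub-table. The sketch below does this for the architecture sub-table only, with the values copied from the table; the names `REFERENCE`, `DELTAS`, and `absolute_accuracy` are hypothetical.

```python
COLUMNS = ("Key", "Degree", "Quality", "Inversion", "RN")

# Reference row (ConvGRU average) and the +/- rows of the same sub-table.
REFERENCE = [81.9, 67.4, 74.6, 67.9, 37.8]
DELTAS = {
    "ConvDil": [-2.4, -1.8, -0.8, -0.5, -1.7],
    "PoolGRU": [-2.3, -3.0, -1.6, -1.8, -4.1],
}


def absolute_accuracy(reference, delta):
    """Add a +/- row to its reference row, rounded to one decimal place."""
    return [round(ref + diff, 1) for ref, diff in zip(reference, delta)]


ABSOLUTE = {name: absolute_accuracy(REFERENCE, delta) for name, delta in DELTAS.items()}

for name, values in ABSOLUTE.items():
    print(name, dict(zip(COLUMNS, values)))
```

The same addition applies to the other sub-tables, each time using the first row of that sub-table as the reference.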
Table 7
A comparison between the corpus analysis (left, reproducing Table 3) and our system’s output (right). Discrepancies between the corpus and output analyses are highlighted in italics.
| | Corpus | | | | | | Output | | | | | |
| RN | Start | End | Key | Degree | Quality | Inv. | Start | End | Key | Degree | Quality | Inv. |
| m1 C: I | 0.0 | 4.0 | C | 1 | M | 0 | 0.0 | 4.0 | C | 1 | M | 0 |
| m2 ii42 | 4.0 | 8.0 | C | 2 | m7 | 3 | 4.0 | 4.5 | C | 2 | m7 | 0 |
| | | | | | | | 4.5 | 7.0 | C | 2 | m7 | 1 |
| | | | | | | | 7.0 | 7.5 | C | 2 | D7 | 0 |
| | | | | | | | 7.5 | 8.0 | C | 5 | D7 | 0 |
| m3 V65 | 8.0 | 12.0 | C | 5 | D7 | 1 | 8.0 | 8.5 | C | 5 | D7 | 1 |
| | | | | | | | 8.5 | 9.5 | C | 5 | M | 1 |
| | | | | | | | 9.5 | 10.0 | C | 5 | D7 | 1 |
| | | | | | | | 10.0 | 11.0 | C | 5 | M | 1 |
| | | | | | | | 11.0 | 12.0 | C | 5 | D7 | 1 |
| m4 I | 12.0 | 16.0 | C | 1 | M | 0 | 12.0 | 16.0 | C | 1 | M | 0 |
| m5 vi6 | 16.0 | 20.0 | C | 6 | m | 1 | 16.0 | 16.5 | C | 1 | m | 0 |
| | | | | | | | 16.5 | 17.0 | C | 6 | m | 0 |
| | | | | | | | 17.0 | 20.0 | C | 6 | m | 1 |
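One way to quantify the discrepancies in Table 7 is to measure, in quarter notes, how much of the extract carries matching labels in both analyses. The sketch below is an illustration under that assumption and is not necessarily the accuracy metric reported in Tables 5 and 6; `Segment` and `agreement` are hypothetical names, and the data are copied from the table.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    start: float
    end: float
    key: str
    degree: int
    quality: str
    inversion: int


CORPUS = [
    Segment(0.0, 4.0, "C", 1, "M", 0), Segment(4.0, 8.0, "C", 2, "m7", 3),
    Segment(8.0, 12.0, "C", 5, "D7", 1), Segment(12.0, 16.0, "C", 1, "M", 0),
    Segment(16.0, 20.0, "C", 6, "m", 1),
]
OUTPUT = [
    Segment(0.0, 4.0, "C", 1, "M", 0), Segment(4.0, 4.5, "C", 2, "m7", 0),
    Segment(4.5, 7.0, "C", 2, "m7", 1), Segment(7.0, 7.5, "C", 2, "D7", 0),
    Segment(7.5, 8.0, "C", 5, "D7", 0), Segment(8.0, 8.5, "C", 5, "D7", 1),
    Segment(8.5, 9.5, "C", 5, "M", 1), Segment(9.5, 10.0, "C", 5, "D7", 1),
    Segment(10.0, 11.0, "C", 5, "M", 1), Segment(11.0, 12.0, "C", 5, "D7", 1),
    Segment(12.0, 16.0, "C", 1, "M", 0), Segment(16.0, 16.5, "C", 1, "m", 0),
    Segment(16.5, 17.0, "C", 6, "m", 0), Segment(17.0, 20.0, "C", 6, "m", 1),
]


def agreement(corpus, output, fields=("key", "degree", "quality", "inversion")):
    """Fraction of the corpus duration whose chosen fields match the output."""
    matched = 0.0
    for ref in corpus:
        for out in output:
            # Length of the time span shared by the two segments, if any.
            overlap = min(ref.end, out.end) - max(ref.start, out.start)
            if overlap > 0 and all(getattr(ref, f) == getattr(out, f) for f in fields):
                matched += overlap
    total = sum(ref.end - ref.start for ref in corpus)
    return matched / total


FULL_MATCH = agreement(CORPUS, OUTPUT)
DEGREE_MATCH = agreement(CORPUS, OUTPUT, fields=("degree",))
```

For the five measures shown, all four labels agree for 13 of the 20 quarter notes, and most of the disagreement comes from the Inversion and Quality columns rather than from Key or Degree.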

Figure 6
Beethoven’s Piano Sonata No. 6, mm. 40–43.
