Not All Roads Lead to Rome: Pitch Representation and Model Architecture for Automatic Harmonic Analysis

Open Access | May 2020

Figures & Tables

Figure 1

J.S.Bach, Prelude in C, BWV846: Measures 1–11 of the score with an RN analysis given in the text below the lowest stave.

Table 1

The contents of our meta-corpus, drawing together existing harmonic analysis datasets. The size of each corpus is given by the total number of RNs in the analyses, the number of measures in the scores, and the ‘Quarter length’, i.e. the total length in quarter notes.

| Dataset | Composer/s | Movements or equivalent | Quarter length | Measures | RNs |
|---|---|---|---|---|---|
| TAVERN | Mozart | 10 theme and variations sets | 7 712 | 2 773 | 8 779 |
| | Beethoven | 17 theme and variations sets | 12 840 | 5 128 | 15 959 |
| ABC | Beethoven | 16 string quartets, 70 movements | 48 811 | 15 881 | 29 652 |
| BPS-FH | Beethoven | 32 piano sonata first movements | 30 992 | 9 420 | 11 337 |
| Roman Text | Bach | 24 preludes | 3 168 | 819 | 2 165 |
| | Various (19th C.) | 48 romantic songs | 8 326 | 2 791 | 5 283 |
| Totals | | 201 scores | 111 859 | 36 812 | 73 175 |
Figure 2

Measures 22–24 of the Bach prelude shown in Figure 1.

Figure 3

Harmonic ambiguity in ‘Einsamkeit’ from Schubert’s Winterreise (D.911, No.12). The three parallel analyses represent A1, A2, and A3 respectively from top to bottom.

Table 2

Different interpretations of measures 34 and 35 of Schubert’s ‘Einsamkeit’ (see Figure 3). The analyses are written in .rntxt format (Tymoczko et al., 2019), as explained in Section 3.1. The ‘rules’ in the second and third column are set out at the beginning of Section 2.

| | RN | Rules followed/broken |
|---|---|---|
| A1 | m34 b: i | rules 1 and 4 |
| | m35 i | rule 3 |
| A2 | m34 b: i b1.5 Ger42 | rules 3 and 2 |
| | m35 Ger42 | rule 1 |
| A3 | m34 b: i | rules 1 and 4 |
| | m35 G: I | rule 3 |
Table 3

The RN and tabular representations corresponding to the Bach extract in Figure 1. The first column sets out RNs in Tymoczko et al. (2019)’s ‘Roman text’ format, and the remaining columns unpack that information according to our adaptation of Chen and Su (2018)’s tabular standard.

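| RNTXT | Start | End | Key | Degree | Quality | Inv. |
|---|---|---|---|---|---|---|
| m1 C: I | 0.0 | 4.0 | C | 1 | M | 0 |
| m2 ii42 | 4.0 | 8.0 | C | 2 | m7 | 3 |
| m3 V65 | 8.0 | 12.0 | C | 5 | D7 | 1 |
| m4 I | 12.0 | 16.0 | C | 1 | M | 0 |
| m5 vi6 | 16.0 | 20.0 | C | 6 | m | 1 |
| m6 G: V42 | 20.0 | 24.0 | G | 5 | D7 | 3 |
| m7 I6 | 24.0 | 28.0 | G | 1 | M | 1 |
| m8 IV42 | 28.0 | 32.0 | G | 4 | M7 | 3 |
| m9 ii7 | 32.0 | 36.0 | G | 2 | m7 | 0 |
| m10 V7 | 36.0 | 40.0 | G | 5 | D7 | 0 |
| m11 I | 40.0 | 44.0 | G | 1 | M | 0 |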
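The unpacking in Table 3 can be illustrated with a minimal Python sketch. It is not the authors' converter: the names `parse_measures`, `Row`, `FIGURES`, and `DEGREES` are hypothetical, the parser assumes one chord per measure and 4 quarter notes per measure (as in this extract), and it only flags whether a chord is a seventh chord rather than deriving the full Quality column. The figure-to-inversion mapping is the standard figured-bass convention.

```python
import re
from dataclasses import dataclass

# Standard figured-bass suffixes mapped to (is_seventh_chord, inversion).
FIGURES = {
    "": (False, 0), "6": (False, 1), "64": (False, 2),
    "7": (True, 0), "65": (True, 1), "43": (True, 2), "42": (True, 3),
}
DEGREES = {"i": 1, "ii": 2, "iii": 3, "iv": 4, "v": 5, "vi": 6, "vii": 7}

# Hypothetical pattern: measure number, optional "Key:", numeral, optional figure.
PATTERN = re.compile(r"m(\d+)\s+(?:([A-Ga-g][#b-]?):\s+)?([ivIV]+)(\d*)$")


@dataclass
class Row:
    start: float
    end: float
    key: str
    degree: int
    seventh: bool
    inversion: int


def parse_measures(rntxt_lines, quarters_per_measure=4.0):
    """Convert lines such as 'm6 G: V42' into tabular rows (one chord per measure)."""
    rows, key = [], None
    for line in rntxt_lines:
        match = PATTERN.match(line.strip())
        if match is None:
            raise ValueError(f"unsupported line: {line!r}")
        measure, new_key, numeral, figure = match.groups()
        key = new_key or key  # a key stays in force until the next key change
        seventh, inversion = FIGURES[figure]
        start = (int(measure) - 1) * quarters_per_measure
        rows.append(Row(start, start + quarters_per_measure, key,
                        DEGREES[numeral.lower()], seventh, inversion))
    return rows


rows = parse_measures(["m1 C: I", "m2 ii42", "m3 V65", "m4 I", "m5 vi6", "m6 G: V42"])
for row in rows:
    print(row)
```

Deriving the Quality labels (M, m, D7, and so on) would additionally require the mode of the key and the scale degree, which this sketch leaves out.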
Table 4

Total dimension of input vector for each pitch encoding option (limited to 7 octaves and double sharps/flats).

| Encoding | Input dimension |
|---|---|
| Chromatic pitch, full (CPf) | 7 × 12 = 84 |
| Pitch spelling, full (PSf) | 7 × 35 = 245 |
| CP class + bass (CPb) | 12 + 12 = 24 |
| PS class + bass (PSb) | 35 + 35 = 70 |
| CP class (CPc) | 12 |
| PS class (PSc) | 35 |
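The arithmetic behind Table 4 can be reproduced with a short Python sketch. The function name `input_dimension` and its arguments are hypothetical; the only facts taken from the caption are the limit of 7 octaves and of double sharps/flats, which gives 7 letter names × 5 accidentals = 35 spelled pitch classes.

```python
OCTAVES = 7
CHROMATIC_CLASSES = 12
LETTERS = "CDEFGAB"
ACCIDENTALS = ["bb", "b", "", "#", "##"]  # double flat to double sharp
SPELLED_CLASSES = [letter + acc for letter in LETTERS for acc in ACCIDENTALS]


def input_dimension(spelling: bool, register: str) -> int:
    """Input vector size for one pitch encoding option of Table 4."""
    classes = len(SPELLED_CLASSES) if spelling else CHROMATIC_CLASSES
    if register == "full":   # one slot per pitch class in every octave
        return OCTAVES * classes
    if register == "bass":   # pitch classes plus a second block for the bass
        return classes + classes
    if register == "class":  # pitch classes only
        return classes
    raise ValueError(f"unknown register option: {register!r}")


for name, spelling, register in [
    ("CPf", False, "full"), ("PSf", True, "full"),
    ("CPb", False, "bass"), ("PSb", True, "bass"),
    ("CPc", False, "class"), ("PSc", True, "class"),
]:
    print(name, input_dimension(spelling, register))
```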
Figure 4

The distribution of work transpositions that remain within the set limits of F♭♭–B♯♯ for pitches and C♭–C♯ for keys.
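One way to count such transpositions is sketched below in Python. It assumes, as a simplification not stated in the caption, that a transposition shifts every spelled pitch and every key tonic by the same number of steps along the line of fifths. The helper names `fifths_index` and `valid_transpositions` are hypothetical, and flats and sharps are written with ASCII `b` and `#`.

```python
LINE_OF_FIFTHS = "FCGDAEB"


def fifths_index(name: str) -> int:
    """Position of a spelled pitch class on the line of fifths (F = 0, C = 1, ...)."""
    letter, accidentals = name[0].upper(), name[1:]
    shift = accidentals.count("#") - accidentals.count("b")
    return LINE_OF_FIFTHS.index(letter) + 7 * shift


PITCH_RANGE = (fifths_index("Fbb"), fifths_index("B##"))  # 35 positions
KEY_RANGE = (fifths_index("Cb"), fifths_index("C#"))      # 15 positions


def valid_transpositions(pitches, keys):
    """Shifts (in fifths) that keep all pitches and all keys inside the limits."""
    p = [fifths_index(x) for x in pitches]
    k = [fifths_index(x) for x in keys]
    lowest = max(PITCH_RANGE[0] - min(p), KEY_RANGE[0] - min(k))
    highest = min(PITCH_RANGE[1] - max(p), KEY_RANGE[1] - max(k))
    return list(range(lowest, highest + 1))


# Hypothetical piece: spellings from A-flat to F-sharp, keys C and G.
shifts = valid_transpositions(["Ab", "C", "F#"], ["C", "G"])
print(len(shifts), shifts)
```

In this made-up example the key limits are the binding constraint, which leaves 14 usable shifts including the untransposed original (shift 0).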

Figure 5

Architecture of the neural network model in the ‘local’ training mode. In the ‘global’ mode, the Quality, Inversion, and Root outputs are instead computed after the fully connected layer. The numbers in the boxes give the number of categories for each output label in the PSb case (see Table 4).

Table 5

Comparison of the percent accuracy between models. The two rows above the internal division report on our best model – ConvGRU with pitch spelling and bass (PSb) and with global training. The first row reports on training with all available data; the second reduces the available data to the smaller corpus used by Chen and Su (2018). Rows below the internal dividing line provide comparison data for the performance of Chen and Su (2018, 2019), as well as a baseline key detection using pitch profiles by Temperley (1999). ‘Degree’ registers as correct only when the predictions match the corpus entry for both Degrees 1 and 2; ‘RN’ is correct only when all four of the previous columns match in that way.

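| Model | Key | Degree | Quality | Inversion | RN |
|---|---|---|---|---|---|
| ConvGRU + PSb + global (all data) | 82.9 | 68.3 | 76.6 | 72.0 | 42.8 |
| ConvGRU + PSb + global | 80.6 | 66.5 | 76.3 | 68.1 | 39.1 |
| Chen and Su (2019) | 78.4 | 65.1 | 74.6 | 62.1 | |
| Chen and Su (2018) | 66.7 | 51.8 | 60.6 | 59.1 | 25.7 |
| Local model after Temperley (1999) | 67.0 | | | | |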
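The joint-match rule in the caption of Table 5 ('Degree' requires both Degrees 1 and 2 to match; 'RN' requires Key, Degree, Quality, and Inversion all to match) can be sketched in Python as follows. The per-frame dictionary layout, the field names, and the toy labels are hypothetical, and the sketch counts every frame equally, which may differ from the weighting used in the paper.

```python
# Fields that must all match for each reported column.
TASKS = {
    "key": ("key",),
    "degree": ("degree1", "degree2"),
    "quality": ("quality",),
    "inversion": ("inversion",),
}


def accuracies(predictions, targets):
    """Per-column accuracy plus the joint 'rn' accuracy over aligned frames."""
    counts = {task: 0 for task in TASKS}
    counts["rn"] = 0
    for pred, target in zip(predictions, targets):
        correct = {
            task: all(pred[field] == target[field] for field in fields)
            for task, fields in TASKS.items()
        }
        for task, ok in correct.items():
            counts[task] += ok
        counts["rn"] += all(correct.values())  # all four columns must match
    return {task: count / len(targets) for task, count in counts.items()}


# Toy data: four frames with made-up labels.
truth = [
    {"key": "C", "degree1": 1, "degree2": 1, "quality": "M", "inversion": 0},
    {"key": "C", "degree1": 5, "degree2": 2, "quality": "D7", "inversion": 0},
    {"key": "C", "degree1": 1, "degree2": 5, "quality": "D7", "inversion": 1},
    {"key": "G", "degree1": 1, "degree2": 1, "quality": "M", "inversion": 0},
]
guess = [dict(frame) for frame in truth]
guess[1]["degree2"] = 5    # secondary degree wrong: 'degree' and 'rn' fail
guess[2]["inversion"] = 0  # inversion wrong: 'inversion' and 'rn' fail

scores = accuracies(guess, truth)
print(scores)
```

Because a single wrong column invalidates the whole RN, the RN accuracy can never exceed the lowest of the four column accuracies, which is consistent with the RN column being the smallest in every row of Table 5.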
Table 6

Results obtained by averaging the accuracy of several models along four different axes: architecture, input registral information, input spelling, and global/local training. Column labels are the same as for Table 5, and the first row again reports the best-performing model. Each sub-table thereafter shows the average performance of several models. For example, the ConvGRU row shows the average of 12 models with the same architecture but using different input representations and registral information. The values in the first row of each sub-table give the percentage accuracy of the corresponding averaged models as a reference; each line thereafter shows the +/– difference in accuracy from that reference. There are only 6 PoolGRU models, as they can be trained only globally (not locally).

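| Model group | No. of models | Key | Degree | Quality | Inversion | RN |
|---|---|---|---|---|---|---|
| ConvGRU + PSb + global | | 82.9 | 68.3 | 76.6 | 72.0 | 42.8 |
| ConvGRU | 12 | 81.9 | 67.4 | 74.6 | 67.9 | 37.8 |
| ConvDil | 12 | −2.4 | −1.8 | −0.8 | −0.5 | −1.7 |
| PoolGRU | 6 | −2.3 | −3.0 | −1.6 | −1.8 | −4.1 |
| bass | 10 | 80.8 | 66.6 | 74.3 | 70.1 | 39.2 |
| full | 10 | −0.7 | −0.9 | −0.6 | −3.5 | −3.7 |
| class | 10 | −0.1 | −0.7 | −0.1 | −4.7 | −4.7 |
| spelling | 15 | 80.6 | 66.2 | 74.1 | 67.6 | 36.5 |
| chromatic | 15 | −0.3 | −0.3 | −0.2 | −0.5 | −0.4 |
| global | 15 | 80.6 | 66.8 | 75.4 | 66.7 | 36.9 |
| local | 15 | +0.3 | −0.7 | −2.4 | +2.0 | +0.2 |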
Table 7

A comparison between the corpus analysis (left, reproducing Table 3) and our system’s output (right). Discrepancies between the input and output analyses are highlighted in italics.

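| RN | Corpus Start | Corpus End | Corpus Key | Corpus Degree | Corpus Quality | Corpus Inv. | Output Start | Output End | Output Key | Output Degree | Output Quality | Output Inv. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| m1 C: I | 0.0 | 4.0 | C | 1 | M | 0 | 0.0 | 4.0 | C | 1 | M | 0 |
| m2 ii42 | 4.0 | 8.0 | C | 2 | m7 | 3 | 4.0 | 4.5 | C | 2 | m7 | 0 |
| | | | | | | | 4.5 | 7.0 | C | 2 | m7 | 1 |
| | | | | | | | 7.0 | 7.5 | C | 2 | D7 | 0 |
| | | | | | | | 7.5 | 8.0 | C | 5 | D7 | 0 |
| m3 V65 | 8.0 | 12.0 | C | 5 | D7 | 1 | 8.0 | 8.5 | C | 5 | D7 | 1 |
| | | | | | | | 8.5 | 9.5 | C | 5 | M | 1 |
| | | | | | | | 9.5 | 10.0 | C | 5 | D7 | 1 |
| | | | | | | | 10.0 | 11.0 | C | 5 | M | 1 |
| | | | | | | | 11.0 | 12.0 | C | 5 | D7 | 1 |
| m4 I | 12.0 | 16.0 | C | 1 | M | 0 | 12.0 | 16.0 | C | 1 | M | 0 |
| m5 vi6 | 16.0 | 20.0 | C | 6 | m | 1 | 16.0 | 16.5 | C | 1 | m | 0 |
| | | | | | | | 16.5 | 17.0 | C | 6 | m | 0 |
| | | | | | | | 17.0 | 20.0 | C | 6 | m | 1 |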
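To make the comparison in Table 7 concrete, the following Python sketch measures how much of the extract carries identical labels in both analyses, weighted by duration in quarter notes. This is an illustrative measure, not necessarily the metric used in the paper; the function name `weighted_agreement` is hypothetical, and the segment data are transcribed from the table.

```python
# Each segment: (start, end, key, degree, quality, inversion), as in Table 7.
CORPUS = [
    (0.0, 4.0, "C", 1, "M", 0), (4.0, 8.0, "C", 2, "m7", 3),
    (8.0, 12.0, "C", 5, "D7", 1), (12.0, 16.0, "C", 1, "M", 0),
    (16.0, 20.0, "C", 6, "m", 1),
]
OUTPUT = [
    (0.0, 4.0, "C", 1, "M", 0),
    (4.0, 4.5, "C", 2, "m7", 0), (4.5, 7.0, "C", 2, "m7", 1),
    (7.0, 7.5, "C", 2, "D7", 0), (7.5, 8.0, "C", 5, "D7", 0),
    (8.0, 8.5, "C", 5, "D7", 1), (8.5, 9.5, "C", 5, "M", 1),
    (9.5, 10.0, "C", 5, "D7", 1), (10.0, 11.0, "C", 5, "M", 1),
    (11.0, 12.0, "C", 5, "D7", 1),
    (12.0, 16.0, "C", 1, "M", 0),
    (16.0, 16.5, "C", 1, "m", 0), (16.5, 17.0, "C", 6, "m", 0),
    (17.0, 20.0, "C", 6, "m", 1),
]


def weighted_agreement(corpus, output):
    """Fraction of the annotated time on which both analyses carry identical labels."""
    matched = 0.0
    for c_start, c_end, *c_label in corpus:
        for o_start, o_end, *o_label in output:
            overlap = min(c_end, o_end) - max(c_start, o_start)
            if overlap > 0 and c_label == o_label:
                matched += overlap
    total = sum(end - start for start, end, *_ in corpus)
    return matched / total


agreement = weighted_agreement(CORPUS, OUTPUT)
print(f"{agreement:.2f}")  # 13 of the 20 quarter notes agree on all four labels
```

Measures 1 and 4 agree throughout, measure 2 never matches on inversion, and measures 3 and 5 agree for part of their duration, which is where the 13 matching quarter notes come from.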
Figure 6

Beethoven’s piano sonata no.6, m.40–43.

DOI: https://doi.org/10.5334/tismir.45 | Journal eISSN: 2514-3298
Language: English
Submitted on: Dec 4, 2019
Accepted on: Mar 23, 2020
Published on: May 12, 2020
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2020 Gianluca Micchi, Mark Gotham, Mathieu Giraud, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.