
Automatic Transcription of Organ Tablature Music Notation with Deep Neural Networks

Open Access | Feb 2021

Figures & Tables

Figure 1

Transcription of a tablature row into modern music notation. The transcribed row consists of four tablature staves that are converted into a four-part score in modern notation.

Figure 2

Deviations in transcriptions of Ammerbach’s “Orgel oder Instrument Tabulaturbuch” by (1) Becker (1963) and (2) Müller-Schmidt (2017).

Figure 3

Examples of the different types of organ tablature symbols taken from Ammerbach (1583): (1) Note duration symbols; (2) Note pitch symbols; (3) Pause signs; (4) Special characters.

Figure 4

The layout of printed tablature characters, illustrated with the example of Ammerbach (1583). Each staff (S1, S2) inside the row consists of two lines, a duration/special (d/s) line and a pitch/rest (p/r) line, in which the tablature characters are arranged.

Figure 5

Segmentation of the input image. The input image is split into separate images for each row at the displayed horizontal dividing lines. Afterwards, each row image is split into images for each staff using the estimated voice positions.
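As an illustration of the row-splitting step described in the caption, a minimal sketch based on a horizontal ink-projection profile is shown below. The thresholds, the binarized 0/1 image convention, and the function name are assumptions made for illustration, not the segmentation procedure actually used in the paper.

```python
import numpy as np

def split_into_rows(page: np.ndarray, min_height: int = 40, dark_thresh: float = 0.02):
    """Split a binarized page image (0 = ink, 1 = background) into row images.

    Uses a horizontal projection profile: pixel lines whose ink density stays
    below `dark_thresh` are treated as gaps between tablature rows. The
    thresholds are illustrative guesses, not values from the paper.
    """
    ink_per_line = 1.0 - page.mean(axis=1)      # fraction of dark pixels per pixel line
    is_content = ink_per_line >= dark_thresh

    rows, start = [], None
    for y, content in enumerate(is_content):
        if content and start is None:
            start = y                           # a new tablature row begins
        elif not content and start is not None:
            if y - start >= min_height:         # ignore tiny specks of ink
                rows.append(page[start:y])
            start = None
    if start is not None and len(page) - start >= min_height:
        rows.append(page[start:])
    return rows
```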

Figure 6

Architecture of the CSP tablature recognition network. The different layer types are color-coded and the size is indicated below each layer. This image was generated with PlotNeuralNet (Iqbal, 2018).
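To make the caption concrete, the following PyTorch sketch shows what such a two-path CNN/GRU recognizer could look like: a shared convolutional front end followed by one recurrent path per tablature line (duration/special and pitch/rest), matching the "2 blocks, 2 bidir." configuration listed in Table 3. The layer widths, class counts, and the fixed input height of 64 pixels are placeholder assumptions, not the configuration reported in the paper.

```python
import torch
import torch.nn as nn

class CSPTablatureNet(nn.Module):
    """Rough sketch of a CSP-style recognizer with a shared CNN front end and
    two GRU paths. All sizes below are placeholders, not the paper's values."""

    def __init__(self, n_dur_classes: int = 20, n_pitch_classes: int = 40):
        super().__init__()
        # Two convolutional blocks (cf. the "2 blocks" configuration in Table 3).
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # One path per output line: two bidirectional GRU layers each.
        self.gru_dur = nn.GRU(64 * 16, 128, num_layers=2,
                              bidirectional=True, batch_first=True)
        self.gru_pitch = nn.GRU(64 * 16, 128, num_layers=2,
                                bidirectional=True, batch_first=True)
        self.head_dur = nn.Linear(2 * 128, n_dur_classes)
        self.head_pitch = nn.Linear(2 * 128, n_pitch_classes)

    def forward(self, x):                       # x: (batch, 1, 64, width)
        f = self.cnn(x)                         # (batch, 64, 16, width / 4)
        f = f.permute(0, 3, 1, 2).flatten(2)    # (batch, width / 4, 64 * 16)
        dur, _ = self.gru_dur(f)
        pitch, _ = self.gru_pitch(f)
        # Per-timestep class scores for each of the two tablature lines.
        return self.head_dur(dur), self.head_pitch(pitch)
```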

Figure 7

Comparison of (1) a real tablature row from Ammerbach’s tablature books; (2) an augmented version of the same tablature row; (3,4) two tablature rows artificially created by the data generator.
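A hedged sketch of what a simple augmentation step for such a row image might look like is given below; the specific distortions (brightness jitter, additive noise, horizontal shift) are illustrative guesses and not necessarily the operations used by the authors.

```python
import numpy as np

def augment_row(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply a few simple distortions to a grayscale tablature row (values in [0, 1]).
    Illustrative only; the paper's actual augmentation pipeline may differ."""
    out = img.copy()
    out = np.clip(out * rng.uniform(0.85, 1.15), 0.0, 1.0)           # brightness jitter
    out = np.clip(out + rng.normal(0.0, 0.02, out.shape), 0.0, 1.0)  # sensor-like noise
    shift = int(rng.integers(-5, 6))                                  # small horizontal shift
    return np.roll(out, shift, axis=1)
```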

Table 1

The data set, consisting of training, validation, and test sets. The values indicate the number of staff images taken from each source; the numbers in parentheses indicate the factor by which this number has been enlarged by data augmentation.

| Subset | Staves taken from "Orgel oder Instrument Tabulaturbuch" | Staves taken from "Ein new künstlich Tabulaturbuch" | Generated Staves | Sum |
| --- | --- | --- | --- | --- |
| trainSetL | 500 (*100) | 500 (*100) | 140,000 (*5) | 800,000 |
| trainSetS | 500 (*100) | 500 (*100) | 20,000 (*5) | 200,000 |
| valSet | 200 (*25) | 200 (*25) | 8,000 (*5) | 50,000 |
| testSet | 500 | 500 | 0 | 1,000 |
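For reference, the Sum column follows directly from the per-source counts multiplied by their augmentation factors:

```python
# How the "Sum" column of Table 1 is obtained from the augmentation factors.
train_set_l = 500 * 100 + 500 * 100 + 140_000 * 5   # = 800,000
train_set_s = 500 * 100 + 500 * 100 + 20_000 * 5    # = 200,000
val_set     = 200 * 25  + 200 * 25  + 8_000 * 5     # = 50,000
test_set    = 500 + 500 + 0                         # = 1,000 (no augmentation)
```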
Table 2

Comparison of the CSP network with two partial networks, one for each path. The table shows the metrics evaluated on the test data set, the number of floating point operations (FLOPs), and the training time required for 40 epochs.

| Network | Characters | Top-10 Acc. | Top-5 Acc. | Top-1 Acc. | Bar Acc. | Staff Edit Dist. | Char Edit Dist. | FLOPs | Training Time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CSP | Duration/Special | 0.994 | 0.992 | 0.970 | 0.989 | 0.070 | 0.00162 | 27,925,688 | 47h34m |
| CSP | Pitch/Rest | 0.944 | 0.943 | 0.876 | 0.971 | 0.320 | 0.00515 | | |
| Partial 1 | Duration/Special | 0.991 | 0.990 | 0.963 | 0.988 | 0.086 | 0.00192 | 18,377,884 | 28h12m |
| Partial 2 | Pitch/Rest | 0.963 | 0.960 | 0.896 | 0.972 | 0.250 | 0.00423 | 18,453,660 | 32h33m |
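The character-level edit distance reported in these tables is, in the usual reading, a Levenshtein-style distance between the predicted and the reference symbol sequences; a plain dynamic-programming sketch is shown below. How the paper normalizes the distance at the bar, staff, and character levels is not restated here and remains an assumption.

```python
def edit_distance(predicted: str, reference: str) -> int:
    """Levenshtein distance between a predicted and a reference symbol sequence."""
    m, n = len(predicted), len(reference)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if predicted[i - 1] == reference[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[m][n]
```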
Table 3

Comparison of the CSP network with three simpler variations of the network. The table shows the differences in the network configurations, the metrics evaluated on the test set, as well as the number of FLOPs for each network.

| Network | CNN Blocks | GRU Layers | Characters | Top-1 Acc. | Bar Acc. | Staff Edit Dist. | Char Edit Dist. | FLOPs |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Full CSP | 2 blocks | 2 bidir. | Duration/Special | 0.970 | 0.989 | 0.070 | 0.00162 | 27,925,688 |
| Full CSP | 2 blocks | 2 bidir. | Pitch/Rest | 0.876 | 0.971 | 0.320 | 0.00515 | |
| 1 CNN | 1 block | 2 bidir. | Duration/Special | 0.954 | 0.989 | 0.095 | 0.00218 | 27,483,320 |
| 1 CNN | 1 block | 2 bidir. | Pitch/Rest | 0.840 | 0.958 | 0.501 | 0.00835 | |
| unidir. GRUs | 2 blocks | 2 unidir. | Duration/Special | 0.962 | 0.988 | 0.079 | 0.00190 | 18,415,780 |
| unidir. GRUs | 2 blocks | 2 unidir. | Pitch/Rest | 0.835 | 0.960 | 0.412 | 0.00690 | |
| 1 GRU | 2 blocks | 1 bidir. | Duration/Special | 0.950 | 0.984 | 0.098 | 0.00207 | 21,634,212 |
| 1 GRU | 2 blocks | 1 bidir. | Pitch/Rest | 0.818 | 0.949 | 0.516 | 0.00859 | |
Table 4

Comparison of training the CSP network with different data sets. The table shows the differences in the number of real and generated tablature images (as well as the factor by which this amount was increased by augmentation) for each training set and the metrics evaluated on the test data.

| Train Data | Real Staves | Generated Staves | Sum | Characters | Top-1 Acc. | Bar Acc. | Staff Edit Dist. | Char Edit Dist. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| noAug | 1000 | 0 | 1,000 | Duration/Special | 0.844 | 0.952 | 0.348 | 0.00746 |
| noAug | 1000 | 0 | 1,000 | Pitch/Rest | 0.641 | 0.895 | 1.270 | 0.02258 |
| Aug | 1000 (*100) | 0 | 100,000 | Duration/Special | 0.966 | 0.995 | 0.063 | 0.00134 |
| Aug | 1000 (*100) | 0 | 100,000 | Pitch/Rest | 0.851 | 0.967 | 0.394 | 0.00645 |
| Aug1Gen1 | 1000 (*100) | 20,000 (*5) | 200,000 | Duration/Special | 0.970 | 0.993 | 0.055 | 0.00120 |
| Aug1Gen1 | 1000 (*100) | 20,000 (*5) | 200,000 | Pitch/Rest | 0.875 | 0.972 | 0.286 | 0.00455 |
| Aug1Gen2 | 1000 (*100) | 40,000 (*5) | 300,000 | Duration/Special | 0.967 | 0.990 | 0.068 | 0.00143 |
| Aug1Gen2 | 1000 (*100) | 40,000 (*5) | 300,000 | Pitch/Rest | 0.873 | 0.970 | 0.307 | 0.00510 |
| Aug2Gen2 | 1000 (*200) | 40,000 (*5) | 400,000 | Duration/Special | 0.971 | 0.990 | 0.059 | 0.00128 |
| Aug2Gen2 | 1000 (*200) | 40,000 (*5) | 400,000 | Pitch/Rest | 0.870 | 0.972 | 0.292 | 0.00475 |
| PtAug | 100 (*100) | 0 | 10,000 | Duration/Special | 0.725 | 0.919 | 0.782 | 0.01608 |
| PtAug | 100 (*100) | 0 | 10,000 | Pitch/Rest | 0.527 | 0.849 | 2.005 | 0.03460 |
| PtAugGen | 100 (*100) | 50,000 (*5) | 260,000 | Duration/Special | 0.871 | 0.952 | 0.266 | 0.00583 |
| PtAugGen | 100 (*100) | 50,000 (*5) | 260,000 | Pitch/Rest | 0.709 | 0.906 | 0.864 | 0.01539 |
Table 5

Evaluation of the CSP network trained on the trainSetS data set. The table shows the metrics calculated on the test data set.

| Characters | Top-10 Acc. | Top-5 Acc. | Top-1 Acc. | Bar Acc. | Staff Edit Dist. | Char Edit Dist. |
| --- | --- | --- | --- | --- | --- | --- |
| Duration/Special | 0.996 | 0.996 | 0.970 | 0.993 | 0.055 | 0.00120 |
| Pitch/Rest | 0.951 | 0.947 | 0.875 | 0.972 | 0.286 | 0.00455 |
Table 6

The errors that occurred during the analysis, categorized into groups with the corresponding number of cases.

| Error Category | Duration/Special Count | Pitch/Rest Count |
| --- | --- | --- |
| Missed Symbol | 21 | 47 |
| Added Symbol | 4 | 17 |
| Wrong Symbol | 10 | 74 |
| Wrong Octave | – | 85 |
Figure 8

Examples of areas in the test data that are difficult to recognize. In images 1–6, an analysis error occurred in the marked areas; in images 7–9, the poorly readable characters in the circled areas were nevertheless recognized correctly.

DOI: https://doi.org/10.5334/tismir.77 | Journal eISSN: 2514-3298
Language: English
Submitted on: Sep 23, 2020
Accepted on: Dec 28, 2020
Published on: Feb 24, 2021
Published by: Ubiquity Press

© 2021 Daniel Schneider, Nikolaus Korfhage, Markus Mühling, Peter Lüttig, Bernd Freisleben, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.