Piano Sheet Music Identification Using Dynamic N-gram Fingerprinting

Daniel Yang; T. J. Tsai

doi:10.5334/tismir.70

Figures & Tables

Proposed architecture for large-scale piano sheet music retrieval.

An excerpt of sheet music and its corresponding bootleg score.

An example of constructing a dynamic N-gram. The length of the N-gram is chosen to balance runtime and retrieval accuracy.

Table 1

Comparing system performance on the camera-based piano sheet music identification task. The first three columns show the mean reciprocal rank on three retrieval tasks at varying levels of specificity. The last two columns show the system runtimes. The 1-gram system was proposed by Tsai (2020) and corresponds to the previous state-of-the-art.

	Retrieval (MRR)			Runtime (s)
System	Piece	Version	Page	Avg	Sth
MAC	.037	.023	.026	1.17	.12
SPoC	.003	.002	.002	1.14	.09
GeM	.025	.017	.017	1.18	.11
R-MAC	.036	.023	.024	0.96	.11
1-gram	.709	.560	.620	21.5	12.5
2-gram	.845	.525	.775	2.76	.36
3-gram	.808	.652	.723	1.99	.21
4-gram	.755	.617	.665	1.23	.13
5-gram	.608	.493	.599	1.07	.08
dynamic	.853	.692	.785	0.98	.12

Frequency of N-gram fingerprints in the database, where fingerprints have been sorted from most frequent (left) to least frequent (right). Note that both axes are on a log scale.

Effect of database size on system performance for the sheet music identification task. The largest database size corresponds to the full IMSLP piano database.

Table 2

Comparing performance on the sheet music identification task under two conditions: when the exact same printed edition exists in the database (left column) and when only an alternate edition of the same piece exists (right column).

	Piece Retrieval (MRR)
System	Same Version	Alternate Version
MAC	.037	.043
SPoC	.003	.004
GeM	.025	.029
R-MAC	.036	.039
1-gram	.709	.659
2-gram	.845	.784
3-gram	.808	.767
4-gram	.755	.722
5-gram	.688	.668
dynamic	.853	.812

Table 3

Comparison of system performance on MIDI-sheet image retrieval. Columns indicate the database size (DB), precision (P) and recall (R) and F-measure (F) for passage-level retrieval, mean reciprocal rank (MRR) for file-level retrieval, and average runtime.

System	DB	P	R	F	MRR	T_avg
Random	1	.16	.16	.16	–	0.0s
SharpEye	1	.43	.08	.13	–	–
PhotoScore	1	.64	.62	.63	–	–
RetinaNet	1	.52	.26	.35	–	11.7s
Dorfer 2018c	1	.69	.28	.40	–	17.5s
Faster R-CNN	1	.84	.87	.85	–	49.9s
DWD	1	.91	.87	.89	–	213s
BS-DTW	1	.89	.89	.89	–	0.90s
BS-DTW	200	.90	.87	.88	.92	10.9s
1-gram	200	.58	.59	.59	.69	1.28s
2-gram	200	.70	.73	.71	.84	1.00s
3-gram	200	.60	.72	.66	.79	1.00s
4-gram	200	.41	.57	.48	.63	1.00s
5-gram	200	.36	.49	.42	.53	1.00s
dynamic	200	.74	.77	.75	.86	0.98s
BS-DTW	2k	.89	.83	.86	.88	167s
1-gram	2k	.48	.51	.49	.59	2.28s
2-gram	2k	.63	.69	.66	.78	1.10s
3-gram	2k	.49	.63	.55	.69	1.09s
4-gram	2k	.39	.55	.46	.60	1.04s
5-gram	2k	.35	.47	.40	.51	1.05s
dynamic	2k	.66	.72	.69	.80	.98s
BS-DTW	10k	.83	.77	.80	.83	458s
1-gram	10k	.28	.34	.31	.43	6.78s
2-gram	10k	.43	.58	.49	.68	1.45s
3-gram	10k	.40	.53	.46	.56	1.10s
4-gram	10k	.34	.48	.40	.50	1.04s
5-gram	10k	.28	.42	.34	.46	1.00s
dynamic	10k	.55	.66	.60	.73	.99s

Comparing different strategies for handling repeats and jumps in MIDI-sheet image retrieval. The MIDI query is broken up into fragments of length L, either through deterministic shingling or a fixed number (K) of randomly sampled windows.

Table 4

Summary of verified matches between the Lakh MIDI dataset and IMSLP.

Composer	# Lakh MIDI	# IMSLP Pieces
Bach	356	94
Chopin	291	49
Beethoven	186	50
Mozart	116	30
Liszt	77	32
Debussy	48	9
Brahms	41	10
Other	318	75
Total	1433	349

Piano Sheet Music Identification Using Dynamic N-gram Fingerprinting

Figures & Tables

Figure 1

Figure 2

Figure 3

Table 1

Figure 4

Figure 5

Table 2

Table 3

Figure 6

Table 4

Paradigm

My account