A Case for Reproducibility in MIR: Replication of ‘A Highly Robust Audio Fingerprinting System’

Open Access | Sep 2018

Figures & Tables

Figure 1

A generalized audio fingerprinter scheme. Audio is fed into the system, features are extracted, and fingerprints are constructed. The fingerprints are subsequently compared with a database containing the fingerprints of the reference audio. The original audio is either identified or, if no match is found, labeled as unknown.

Figure 2

(a) Fingerprint block of the original music clip, (b) fingerprint block of a compressed version, (c) the difference between (a) and (b), showing the bit errors in black. The Hamming distance, i.e. the number of bit errors, is indicated in red.
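The bit-error count described in this caption can be sketched as follows. This is a minimal illustration, assuming a fingerprint block is a sequence of sub-fingerprints stored as non-negative 32-bit integers; the function names and the toy data are hypothetical and are not taken from the paper's software.

```python
from typing import Sequence

BITS_PER_SUB_FINGERPRINT = 32  # assumption: each sub-fingerprint holds 32 bits


def bit_errors(a: int, b: int) -> int:
    """Count the bits that differ between two sub-fingerprints."""
    # XOR leaves a 1 wherever the two values disagree.
    return bin(a ^ b).count("1")


def hamming_distance(block_a: Sequence[int], block_b: Sequence[int]) -> int:
    """Total number of bit errors between two equally long fingerprint blocks."""
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same length")
    return sum(bit_errors(a, b) for a, b in zip(block_a, block_b))


def bit_error_rate(block_a: Sequence[int], block_b: Sequence[int]) -> float:
    """Hamming distance divided by the total number of bits in the block."""
    total_bits = len(block_a) * BITS_PER_SUB_FINGERPRINT
    return hamming_distance(block_a, block_b) / total_bits


# Toy data (hypothetical): three sub-fingerprints instead of a full block.
original = [0x00000000, 0xFFFFFFFF, 0x0F0F0F0F]
degraded = [0x00000001, 0xFFFFFFFF, 0x0F0F0F00]

distance = hamming_distance(original, degraded)
rate = bit_error_rate(original, degraded)
```

In this toy example the first pair differs in one bit and the third pair in four bits, so the blocks are five bit errors apart out of 96 bits.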

Figure 3

Bit errors per fingerprint for the 128 kb/s CBR encoded MP3 and the GSM encoded version of the same three seconds of audio. Both are compared to the original uncompressed version. The average and standard deviation are indicated.

Table 1

Tracks bought from the 7digital music store, with 7digital identifier and format information. The ISRC (International Standard Recording Code) and AcoustID fingerprint (https://acoustid.org) are provided as well.

| Identifier | ISRC | AcoustID | Track | Format |
|---|---|---|---|---|
| 56984036 | | 3af00f3a-afc8-4b62-8eff-dacb7d7245c9 | Sinead | 320 kb/s MP3 |
| 52740482 | | b03406c9-1b14-427e-b4fa-16029b8a72cc | ACDC | 16-bit/44.1 kHz FLAC |
| 122965 | GBF089607481 | 92f4e392-a36e-47c8-bfee-b553b0c0e0ad | Texas | 320 kb/s MP3 |
| 5917942 | DEF056730100 | 99eb4952-9a72-4811-9b1e-f8c8ab737e9f | Orff | 16-bit/44.1 kHz FLAC |

Table 2

Replication of bit error rates (BER) for different kinds of signal degradations. The original results and replicated results are reported.

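| Modification | Texas Original | Texas Replication | Sinead Original | Sinead Replication | Orff Original | Orff Replication | AC/DC Original | AC/DC Replication |
|---|---|---|---|---|---|---|---|---|
| MP3@128Kbps | 0.081 | 0.055 | 0.085 | 0.077 | 0.078 | 0.056 | 0.084 | 0.035 |
| MP3@32Kbps | 0.096 | 0.097 | 0.106 | 0.115 | 0.174 | 0.100 | 0.133 | 0.089 |
| Real@20Kbps | 0.159 | / | 0.138 | / | 0.161 | / | 0.210 | / |
| GSM | 0.168 | 0.194 | 0.144 | 0.211 | 0.160 | 0.217 | 0.181 | 0.187 |
| GSM C/I = 4dB | 0.316 | / | 0.247 | / | 0.286 | / | 0.324 | / |
| All-pass filtering | 0.018 | 0.020 | 0.015 | 0.032 | 0.019 | 0.033 | 0.027 | 0.010 |
| Amp. Compr. | 0.113 | 0.010 | 0.070 | 0.027 | 0.052 | 0.033 | 0.073 | 0.014 |
| Equalization | 0.066 | 0.025 | 0.045 | 0.024 | 0.048 | 0.023 | 0.062 | 0.013 |
| Echo Addition | 0.139 | 0.132 | 0.148 | 0.145 | 0.157 | 0.118 | 0.145 | 0.109 |
| Band Pass Filter | 0.024 | 0.031 | 0.025 | 0.034 | 0.028 | 0.030 | 0.038 | 0.017 |
| Time Scale +4% | 0.200 | 0.279 | 0.183 | 0.283 | 0.202 | 0.302 | 0.206 | 0.301 |
| Time Scale –4% | 0.190 | 0.263 | 0.174 | 0.277 | 0.207 | 0.281 | 0.203 | 0.294 |
| Linear Speed +1% | 0.132 | 0.189 | 0.102 | 0.193 | 0.172 | 0.214 | 0.238 | 0.181 |
| Linear Speed –1% | 0.260 | 0.177 | 0.142 | 0.199 | 0.243 | 0.201 | 0.196 | 0.177 |
| Linear Speed +4% | 0.355 | 0.434 | 0.467 | 0.461 | 0.438 | 0.551 | 0.472 | 0.470 |
| Linear Speed –4% | 0.470 | 0.425 | 0.438 | 0.500 | 0.464 | 0.510 | 0.431 | 0.464 |
| Noise Addition | 0.011 | 0.042 | 0.011 | 0.122 | 0.009 | 0.273 | 0.036 | 0.027 |
| Resampling | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.004 | 0.000 | 0.000 |
| D/A A/D | 0.111 | / | 0.061 | / | 0.088 | / | 0.076 | / |

Table 3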

Replication of hits in the database for different kinds of signal degradations. In each cell, the first number is the number of hits when only the 256 sub-fingerprints are used to generate candidate positions. The second number is the number of hits when the 1024 most probable candidates for every sub-fingerprint are also used.

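| Modification | Orff Original | Orff Replication | Sinead Original | Sinead Replication | Texas Original | Texas Replication | AC/DC Original | AC/DC Replication |
|---|---|---|---|---|---|---|---|---|
| MP3@128Kbps | 17, 170 | 150, 226 | 20, 196 | 59, 111 | 23, 182 | 94, 166 | 19, 144 | 144, 207 |
| MP3@32Kbps | 0, 34 | 44, 123 | 10, 153 | 14, 63 | 13, 148 | 20, 56 | 5, 61 | 29, 87 |
| Real@20Kbps | 2, 7 | / | 7, 110 | / | 2, 67 | / | 1, 41 | / |
| GSM | 1, 57 | 2, 6 | 2, 95 | 0, 1 | 1, 60 | 0, 5 | 0, 31 | 4, 16 |
| GSM C/I = 4dB | 0, 3 | / | 0, 12 | / | 0, 1 | / | 0, 3 | / |
| All-pass filtering | 157, 240 | 170, 244 | 158, 256 | 161, 226 | 146, 256 | 166, 251 | 106, 219 | 191, 245 |
| Amp. Compr. | 55, 191 | 145, 222 | 59, 183 | 98, 156 | 16, 73 | 169, 247 | 44, 146 | 183, 241 |
| Equalization | 55, 203 | 161, 236 | 71, 227 | 220, 126 | 34, 172 | 126, 193 | 42, 148 | 171, 227 |
| Echo Addition | 2, 36 | 53, 70 | 12, 69 | 37, 73 | 15, 69 | 68, 112 | 4, 52 | 73, 102 |
| Band Pass Filter | 123, 225 | 169, 237 | 118, 253 | 149, 193 | 117, 255 | 110, 186 | 80, 214 | 159, 241 |
| Time Scale +4% | 6, 55 | 43, 72 | 7, 68 | 53, 54 | 16, 70 | 57, 123 | 6, 36 | 66, 118 |
| Time Scale –4% | 17, 60 | 57, 107 | 22, 77 | 53, 57 | 23, 62 | 54, 118 | 16, 44 | 60, 108 |
| Linear Speed +1% | 3, 29 | 2, 6 | 18, 170 | 2, 16 | 3, 82 | 3, 22 | 1, 16 | 8, 35 |
| Linear Speed –1% | 0, 7 | 0, 8 | 5, 88 | 2, 16 | 0, 7 | 1, 22 | 0, 8 | 4, 16 |
| Linear Speed +4% | 0, 0 | 0, 0 | 0, 0 | 0, 0 | 0, 0 | 0, 0 | 0, 1 | 0, 0 |
| Linear Speed –4% | 0, 0 | 0, 0 | 0, 0 | 0, 0 | 0, 0 | 0, 0 | 0, 0 | 0, 0 |
| Noise Addition | 190, 256 | 30, 73 | 178, 255 | 0, 9 | 179, 256 | 23, 101 | 114, 255 | 99, 167 |
| Resampling | 255, 256 | 253, 256 | 255, 256 | 239, 256 | 254, 256 | 254, 256 | 254, 256 | 253, 256 |
| D/A A/D | 15, 149 | / | 38, 229 | / | 13, 114 | / | 31, 145 | / |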
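
How a lookup table can turn sub-fingerprints into candidate positions, and how the first number in each cell might be counted, is sketched below. This is a toy illustration under stated assumptions: a hit is taken to be a query sub-fingerprint that matches the reference exactly at the true position, sub-fingerprints are plain integers, and `build_index`, `count_hits` and the sample data are hypothetical names. The second number (the 1024 most probable candidates per sub-fingerprint) is not modelled here.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Maps a sub-fingerprint value to every (track, position) where it occurs.
Index = Dict[int, List[Tuple[str, int]]]


def build_index(references: Dict[str, Sequence[int]]) -> Index:
    """Build an exact-match lookup table over all reference sub-fingerprints."""
    index: Index = defaultdict(list)
    for track, sub_fingerprints in references.items():
        for position, value in enumerate(sub_fingerprints):
            index[value].append((track, position))
    return index


def count_hits(index: Index, query: Sequence[int], track: str, offset: int) -> int:
    """Count query sub-fingerprints whose lookup returns the true position."""
    hits = 0
    for i, value in enumerate(query):
        # A hit: the lookup table lists the correct track and position.
        if (track, offset + i) in index.get(value, []):
            hits += 1
    return hits


# Toy data (hypothetical): two very short reference tracks.
references = {
    "texas": [10, 20, 30, 40, 50],
    "orff": [11, 20, 31, 41, 51],
}
index = build_index(references)

# Query cut from "texas" at offset 1, with the middle value corrupted (30 -> 31).
query = [20, 31, 40]
hits = count_hits(index, query, track="texas", offset=1)
```

Here two of the three query sub-fingerprints point back to the correct position. The corrupted value happens to match a position in the other track, which is the kind of false candidate a full system would presumably reject with a bit error rate check over the whole block.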
Table 4

Results on a dataset of 10k songs with 1000 queries per modification. The average Hamming distance between a modified fingerprint of 32 bits and the matching reference is reported ± one standard deviation.

| Modification | TP | TN | FP | FN | Sensitivity | Specificity | Accuracy | Precision | Avg. dist. (bits) |
|---|---|---|---|---|---|---|---|---|---|
| MP3@128Kbps | 90.53% | 9.18% | 0.09% | 0.19% | 99.79% | 98.99% | 99.72% | 99.90% | 4.72 ± 1.64 |
| MP3@32Kbps | 89.97% | 9.18% | 0.19% | 0.66% | 99.28% | 98.00% | 99.16% | 99.79% | 5.73 ± 1.72 |
| All-pass filtering | 90.35% | 9.18% | 0.00% | 0.47% | 99.48% | 100.00% | 99.53% | 100.00% | 5.09 ± 1.72 |
| Amp. Compr. | 90.44% | 9.18% | 0.00% | 0.37% | 99.59% | 100.00% | 99.63% | 100.00% | 5.26 ± 1.79 |
| Band Pass Filter | 90.63% | 9.18% | 0.09% | 0.09% | 99.90% | 98.99% | 99.81% | 99.90% | 5.13 ± 1.75 |
| Echo Addition | 86.14% | 9.27% | 0.19% | 4.40% | 95.14% | 98.02% | 95.41% | 99.78% | 7.12 ± 1.53 |
| Equalization | 90.63% | 9.18% | 0.00% | 0.19% | 99.79% | 100.00% | 99.81% | 100.00% | 5.25 ± 1.78 |
| GSM | 42.92% | 9.28% | 0.19% | 47.61% | 47.41% | 98.02% | 52.20% | 99.57% | 9.02 ± 1.17 |
| Resampling | 90.43% | 9.19% | 0.09% | 0.28% | 99.69% | 98.99% | 99.62% | 99.90% | 4.95 ± 1.68 |
| Linear Speed –4% | 0.00% | 9.27% | 0.00% | 90.73% | 0.00% | 100.00% | 9.27% | / | / |
| Linear Speed –1% | 75.66% | 9.27% | 0.19% | 14.89% | 83.56% | 98.02% | 84.93% | 99.75% | 8.41 ± 1.46 |
| Linear Speed +1% | 79.40% | 9.27% | 0.28% | 11.05% | 87.78% | 97.06% | 88.67% | 99.65% | 7.65 ± 1.41 |
| Linear Speed +4% | 0.00% | 9.27% | 0.00% | 90.73% | 0.00% | 100.00% | 9.27% | / | / |
| Time Scale –4% | 76.50% | 9.27% | 0.28% | 13.95% | 84.58% | 97.06% | 85.77% | 99.63% | 10.20 ± 0.88 |
| Time Scale +4% | 88.30% | 9.27% | 0.19% | 2.25% | 97.52% | 98.02% | 97.57% | 99.79% | 9.72 ± 1.00 |
| Noise Addition | 87.83% | 9.27% | 0.19% | 2.72% | 97.00% | 98.02% | 97.10% | 99.79% | 5.60 ± 1.98 |
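
The derived columns of this table follow from the TP, TN, FP and FN columns through the standard definitions, sketched below. The helper names `ratio` and `summarize` are hypothetical, and the inputs are the rounded percentages from the table, so the results should agree with the reported values only to within rounding.

```python
from typing import Dict, Optional


def ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when the ratio is undefined."""
    return numerator / denominator if denominator else None


def summarize(tp: float, tn: float, fp: float, fn: float) -> Dict[str, Optional[float]]:
    """Standard confusion-matrix metrics, returned as fractions in [0, 1]."""
    return {
        "sensitivity": ratio(tp, tp + fn),  # share of matching queries found
        "specificity": ratio(tn, tn + fp),  # share of unknown queries rejected
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),    # share of reported matches that are right
    }


# Rows copied from the table (values in percent of all queries).
mp3_128 = summarize(tp=90.53, tn=9.18, fp=0.09, fn=0.19)
speed_plus_4 = summarize(tp=0.00, tn=9.27, fp=0.00, fn=90.73)
```

For the Linear Speed +4% row there are no positive results at all, so precision is undefined and the sketch returns `None`, which corresponds to the "/" entry in the table.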
DOI: https://doi.org/10.5334/tismir.4 | Journal eISSN: 2514-3298
Language: English
Submitted on: Jul 17, 2017
Accepted on: Apr 5, 2018
Published on: Sep 4, 2018
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2018 Joren Six, Federica Bressan, Marc Leman, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.