
Figure 1
A generalized audio fingerprinter scheme. Audio is fed into the system, features are extracted, and fingerprints are constructed. The fingerprints are subsequently compared with a database containing the fingerprints of the reference audio. The original audio is either identified or, if no match is found, labeled as unknown.

Figure 2
(a) Fingerprint block of the original music clip, (b) fingerprint block of a compressed version, (c) the difference between (a) and (b), showing the bit errors in black. The Hamming distance, or the number of bit errors, is indicated in red.
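
The comparison in panel (c) can be sketched in a few lines. This is a minimal illustration, not the implementation used for the figure: it assumes each fingerprint block is a sequence of 32-bit sub-fingerprints stored as integers, and the names `bit_errors` and `bit_error_rate` are hypothetical. Dividing the Hamming distance by the total number of bits gives a bit error rate (BER) of the kind reported in Table 2.

```python
from typing import Sequence


def bit_errors(block_a: Sequence[int], block_b: Sequence[int]) -> int:
    """Hamming distance between two fingerprint blocks.

    Assumption: every element is one 32-bit sub-fingerprint.
    """
    if len(block_a) != len(block_b):
        raise ValueError("fingerprint blocks must have the same length")
    # XOR leaves a 1 bit wherever the two sub-fingerprints disagree;
    # counting those bits over the whole block gives the bit errors.
    return sum(
        bin((a ^ b) & 0xFFFFFFFF).count("1")
        for a, b in zip(block_a, block_b)
    )


def bit_error_rate(block_a: Sequence[int], block_b: Sequence[int],
                   bits_per_sub_fingerprint: int = 32) -> float:
    """Bit errors divided by the total number of bits in the block."""
    total_bits = len(block_a) * bits_per_sub_fingerprint
    return bit_errors(block_a, block_b) / total_bits
```

For a block of 256 sub-fingerprints of 32 bits each, the denominator is 256 × 32 = 8192 bits.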

Figure 3
Bit errors per fingerprint for the 128 kb/s CBR encoded MP3 and the GSM encoded version of the same three seconds of audio. Both are compared to the original uncompressed version. The average and standard deviation are indicated.

Table 1
Tracks bought from the 7digital music store, with 7digital identifier and format information. The ISRC (International Standard Recording Code) and AcoustID fingerprint (https://acoustid.org) are provided as well.
| Identifier | ISRC | AcoustID | Track | Format |
|---|---|---|---|---|
| 56984036 | | 3af00f3a-afc8-4b62-8eff-dacb7d7245c9 | Sinead | 320 kb/s MP3 |
| 52740482 | | b03406c9-1b14-427e-b4fa-16029b8a72cc | AC/DC | 16-bit/44.1 kHz FLAC |
| 122965 | GBF089607481 | 92f4e392-a36e-47c8-bfee-b553b0c0e0ad | Texas | 320 kb/s MP3 |
| 5917942 | DEF056730100 | 99eb4952-9a72-4811-9b1e-f8c8ab737e9f | Orff | 16-bit/44.1 kHz FLAC |

Table 2
Replication of bit error rates (BER) for different kinds of signal degradations. The original results and replicated results are reported.
| Modification | Texas (Original) | Texas (Replication) | Sinead (Original) | Sinead (Replication) | Orff (Original) | Orff (Replication) | AC/DC (Original) | AC/DC (Replication) |
|---|---|---|---|---|---|---|---|---|
| MP3@128Kbps | 0.081 | 0.055 | 0.085 | 0.077 | 0.078 | 0.056 | 0.084 | 0.035 |
| MP3@32Kbps | 0.096 | 0.097 | 0.106 | 0.115 | 0.174 | 0.100 | 0.133 | 0.089 |
| Real@20Kbps | 0.159 | / | 0.138 | / | 0.161 | / | 0.210 | / |
| GSM | 0.168 | 0.194 | 0.144 | 0.211 | 0.160 | 0.217 | 0.181 | 0.187 |
| GSM C/I = 4dB | 0.316 | / | 0.247 | / | 0.286 | / | 0.324 | / |
| All-pass filtering | 0.018 | 0.020 | 0.015 | 0.032 | 0.019 | 0.033 | 0.027 | 0.010 |
| Amp. Compr. | 0.113 | 0.010 | 0.070 | 0.027 | 0.052 | 0.033 | 0.073 | 0.014 |
| Equalization | 0.066 | 0.025 | 0.045 | 0.024 | 0.048 | 0.023 | 0.062 | 0.013 |
| Echo Addition | 0.139 | 0.132 | 0.148 | 0.145 | 0.157 | 0.118 | 0.145 | 0.109 |
| Band Pass Filter | 0.024 | 0.031 | 0.025 | 0.034 | 0.028 | 0.030 | 0.038 | 0.017 |
| Time Scale +4% | 0.200 | 0.279 | 0.183 | 0.283 | 0.202 | 0.302 | 0.206 | 0.301 |
| Time Scale –4% | 0.190 | 0.263 | 0.174 | 0.277 | 0.207 | 0.281 | 0.203 | 0.294 |
| Linear Speed +1% | 0.132 | 0.189 | 0.102 | 0.193 | 0.172 | 0.214 | 0.238 | 0.181 |
| Linear Speed –1% | 0.260 | 0.177 | 0.142 | 0.199 | 0.243 | 0.201 | 0.196 | 0.177 |
| Linear Speed +4% | 0.355 | 0.434 | 0.467 | 0.461 | 0.438 | 0.551 | 0.472 | 0.470 |
| Linear Speed –4% | 0.470 | 0.425 | 0.438 | 0.500 | 0.464 | 0.510 | 0.431 | 0.464 |
| Noise Addition | 0.011 | 0.042 | 0.011 | 0.122 | 0.009 | 0.273 | 0.036 | 0.027 |
| Resampling | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.004 | 0.000 | 0.000 |
| D/A A/D | 0.111 | / | 0.061 | / | 0.088 | / | 0.076 | / |

Table 3
Replication of hits in the database for different kinds of signal degradations. The first number indicates the hits when only the 256 sub-fingerprints are used to generate candidate positions. The second number indicates the hits when the 1024 most probable candidates for every sub-fingerprint are also used.
| Modification | Orff (Original) | Orff (Replication) | Sinead (Original) | Sinead (Replication) | Texas (Original) | Texas (Replication) | AC/DC (Original) | AC/DC (Replication) |
|---|---|---|---|---|---|---|---|---|
| MP3@128Kbps | 17, 170 | 150, 226 | 20, 196 | 59, 111 | 23, 182 | 94, 166 | 19, 144 | 144, 207 |
| MP3@32Kbps | 0, 34 | 44, 123 | 10, 153 | 14, 63 | 13, 148 | 20, 56 | 5, 61 | 29, 87 |
| Real@20Kbps | 2, 7 | / | 7, 110 | / | 2, 67 | / | 1, 41 | / |
| GSM | 1, 57 | 2, 6 | 2, 95 | 0, 1 | 1, 60 | 0, 5 | 0, 31 | 4, 16 |
| GSM C/I = 4dB | 0, 3 | / | 0, 12 | / | 0, 1 | / | 0, 3 | / |
| All-pass filtering | 157, 240 | 170, 244 | 158, 256 | 161, 226 | 146, 256 | 166, 251 | 106, 219 | 191, 245 |
| Amp. Compr. | 55, 191 | 145, 222 | 59, 183 | 98, 156 | 16, 73 | 169, 247 | 44, 146 | 183, 241 |
| Equalization | 55, 203 | 161, 236 | 71, 227 | 220, 126 | 34, 172 | 126, 193 | 42, 148 | 171, 227 |
| Echo Addition | 2, 36 | 53, 70 | 12, 69 | 37, 73 | 15, 69 | 68, 112 | 4, 52 | 73, 102 |
| Band Pass Filter | 123, 225 | 169, 237 | 118, 253 | 149, 193 | 117, 255 | 110, 186 | 80, 214 | 159, 241 |
| Time Scale +4% | 6, 55 | 43, 72 | 7, 68 | 53, 54 | 16, 70 | 57, 123 | 6, 36 | 66, 118 |
| Time Scale –4% | 17, 60 | 57, 107 | 22, 77 | 53, 57 | 23, 62 | 54, 118 | 16, 44 | 60, 108 |
| Linear Speed +1% | 3, 29 | 2, 6 | 18, 170 | 2, 16 | 3, 82 | 3, 22 | 1, 16 | 8, 35 |
| Linear Speed –1% | 0, 7 | 0, 8 | 5, 88 | 2, 16 | 0, 7 | 1, 22 | 0, 8 | 4, 16 |
| Linear Speed +4% | 0, 0 | 0, 0 | 0, 0 | 0, 0 | 0, 0 | 0, 0 | 0, 1 | 0, 0 |
| Linear Speed –4% | 0, 0 | 0, 0 | 0, 0 | 0, 0 | 0, 0 | 0, 0 | 0, 0 | 0, 0 |
| Noise Addition | 190, 256 | 30, 73 | 178, 255 | 0, 9 | 179, 256 | 23, 101 | 114, 255 | 99, 167 |
| Resampling | 255, 256 | 253, 256 | 255, 256 | 239, 256 | 254, 256 | 254, 256 | 254, 256 | 253, 256 |
| D/A A/D | 15, 149 | / | 38, 229 | / | 13, 114 | / | 31, 145 | / |
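
The two numbers per cell can be illustrated with a small sketch. It rests on assumptions that the caption does not spell out: a hit is taken to mean that a query sub-fingerprint equals the reference sub-fingerprint at the same position, and the 1024 candidates are assumed to come from toggling every combination of the 10 least reliable bits (2^10 = 1024). The per-bit reliability values and the function names are hypothetical.

```python
from typing import Sequence, Set


def exact_hits(query: Sequence[int], reference: Sequence[int]) -> int:
    """Count positions where the sub-fingerprints are bit-for-bit equal."""
    return sum(1 for q, r in zip(query, reference) if q == r)


def candidates(sub_fp: int, reliability: Sequence[float],
               n_flip: int = 10) -> Set[int]:
    """Variants of sub_fp with any subset of the weakest bits toggled.

    Assumption: reliability[i] scores bit position i (0 = least
    significant bit); lower values mean the bit is more likely wrong.
    """
    weakest = sorted(range(len(reliability)),
                     key=lambda i: reliability[i])[:n_flip]
    variants = set()
    for choice in range(1 << len(weakest)):
        mask = 0
        for j, pos in enumerate(weakest):
            if (choice >> j) & 1:
                mask |= 1 << pos
        variants.add(sub_fp ^ mask)
    return variants


def soft_hits(query: Sequence[int],
              reliabilities: Sequence[Sequence[float]],
              reference: Sequence[int], n_flip: int = 10) -> int:
    """Count positions where the reference is among the candidates."""
    return sum(
        1 for q, rel, r in zip(query, reliabilities, reference)
        if r in candidates(q, rel, n_flip)
    )
```

Because the unmodified sub-fingerprint is always one of its own candidates, `soft_hits` under these assumptions can never be lower than `exact_hits` for the same block.
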

Table 4
Results on a dataset of 10k songs with 1000 queries per modification. The average Hamming distance between a modified fingerprint of 32 bits and the matching reference is reported ± one standard deviation.
| Modification | TP | TN | FP | FN | Sensitivity | Specificity | Accuracy | Precision | Avg. dist. (bits) |
|---|---|---|---|---|---|---|---|---|---|
| MP3@128Kbps | 90.53% | 9.18% | 0.09% | 0.19% | 99.79% | 98.99% | 99.72% | 99.90% | 4.72 ± 1.64 |
| MP3@32Kbps | 89.97% | 9.18% | 0.19% | 0.66% | 99.28% | 98.00% | 99.16% | 99.79% | 5.73 ± 1.72 |
| All-pass filtering | 90.35% | 9.18% | 0.00% | 0.47% | 99.48% | 100.00% | 99.53% | 100.00% | 5.09 ± 1.72 |
| Amp. Compr. | 90.44% | 9.18% | 0.00% | 0.37% | 99.59% | 100.00% | 99.63% | 100.00% | 5.26 ± 1.79 |
| Band Pass Filter | 90.63% | 9.18% | 0.09% | 0.09% | 99.90% | 98.99% | 99.81% | 99.90% | 5.13 ± 1.75 |
| Echo Addition | 86.14% | 9.27% | 0.19% | 4.40% | 95.14% | 98.02% | 95.41% | 99.78% | 7.12 ± 1.53 |
| Equalization | 90.63% | 9.18% | 0.00% | 0.19% | 99.79% | 100.00% | 99.81% | 100.00% | 5.25 ± 1.78 |
| GSM | 42.92% | 9.28% | 0.19% | 47.61% | 47.41% | 98.02% | 52.20% | 99.57% | 9.02 ± 1.17 |
| Resampling | 90.43% | 9.19% | 0.09% | 0.28% | 99.69% | 98.99% | 99.62% | 99.90% | 4.95 ± 1.68 |
| Linear Speed –4% | 0.00% | 9.27% | 0.00% | 90.73% | 0.00% | 100.00% | 9.27% | / | / |
| Linear Speed –1% | 75.66% | 9.27% | 0.19% | 14.89% | 83.56% | 98.02% | 84.93% | 99.75% | 8.41 ± 1.46 |
| Linear Speed +1% | 79.40% | 9.27% | 0.28% | 11.05% | 87.78% | 97.06% | 88.67% | 99.65% | 7.65 ± 1.41 |
| Linear Speed +4% | 0.00% | 9.27% | 0.00% | 90.73% | 0.00% | 100.00% | 9.27% | / | / |
| Time Scale –4% | 76.50% | 9.27% | 0.28% | 13.95% | 84.58% | 97.06% | 85.77% | 99.63% | 10.20 ± 0.88 |
| Time Scale +4% | 88.30% | 9.27% | 0.19% | 2.25% | 97.52% | 98.02% | 97.57% | 99.79% | 9.72 ± 1.00 |
| Noise Addition | 87.83% | 9.27% | 0.19% | 2.72% | 97.00% | 98.02% | 97.10% | 99.79% | 5.60 ± 1.98 |
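
The derived columns in Table 4 follow from the four outcome rates through the standard definitions. The sketch below is illustrative only and uses a hypothetical function name, `detection_metrics`. Since the table lists rounded percentages, values recomputed from them can differ from the printed ones in the last digit.

```python
from typing import Dict, Optional


def detection_metrics(tp: float, tn: float, fp: float,
                      fn: float) -> Dict[str, Optional[float]]:
    """Standard detection metrics from outcome counts or rates.

    The inputs may be raw counts or percentages, as long as all four
    use the same unit. Results are fractions between 0 and 1.
    """
    total = tp + tn + fp + fn
    return {
        # Share of true matches that were found.
        "sensitivity": tp / (tp + fn),
        # Share of non-matches that were correctly rejected.
        "specificity": tn / (tn + fp),
        # Share of all queries that were answered correctly.
        "accuracy": (tp + tn) / total,
        # Share of reported matches that were correct; undefined when
        # nothing was reported, as in the Linear Speed ±4% rows ("/").
        "precision": tp / (tp + fp) if (tp + fp) > 0 else None,
    }
```

For example, the MP3@128Kbps row gives `detection_metrics(90.53, 9.18, 0.09, 0.19)`, whose sensitivity, accuracy, and precision round to the tabulated 99.79%, 99.72%, and 99.90%.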
