Table 1
Comparison between publicly available datasets.
| Name/Author | Mixed music | Classes per instance | Loudness | # instances | Duration (h) |
|---|---|---|---|---|---|
| Scheirer | Yes*, annotated | Single-class | No | 245 | 1 |
| Seyerlehner | Yes, not annotated | Multi-class | No | 13 | 9 |
| GTZAN | No | Single-class | No | 128 | 1.1 |
| MUSAN | No | Single-class | No | 2016 | 108.9 |
| OpenBMAT | Yes, annotated | Multi-class | Yes | 1647 | 27.4 |
* Only with speech.

Figure 1
Distribution of audio files by program type and country. The program types are: children (C), documentary (D), entertainment (E), music (M), news (N), series & films (S&F), sports (S) and talk (T).

Figure 2
Screenshot of BAT, the tool used to annotate OpenBMAT.

Figure 3
(Left) MD mapping, used to compute the agreement for the music detection task. (Right) RMLE mapping, which also includes information about the relative loudness of music.
Table 2
Percentages of full, partial and pair-wise (PW) agreement (Agr) for the whole dataset. These values have been computed for the complete taxonomy and both mappings.
| Agreement level | No mapping Agr (%) | MD mapping Agr (%) | RMLE mapping Agr (%) |
|---|---|---|---|
| %FA | 68.18 | 94.78 | 89.1 |
| %PA | 96.75 | 100 | 99.79 |
| %PW (annotators 1 & 2) | 77.46 | 96.22 | 91.7 |
| %PW (annotators 2 & 3) | 76.97 | 96.78 | 92.78 |
| %PW (annotators 1 & 3) | 78.66 | 96.55 | 93.52 |
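
To make the agreement levels in Table 2 concrete, the following minimal sketch shows one way they could be computed. It assumes that each annotator's labels have been sampled onto a common frame grid, so that counting frames approximates percentages of content. Full agreement (%FA) is read as all three annotators choosing the same class, partial agreement (%PA) as at least two of them doing so, and pair-wise agreement (%PW) as a given pair doing so. The function name `agreement_percentages`, the frame-level representation and the toy labels are assumptions made for illustration; they are not part of the dataset.

```python
from collections import Counter
from itertools import combinations


def agreement_percentages(annotations):
    """Return (%FA, %PA, {pair: %PW}) for equal-length label sequences.

    `annotations` holds one sequence of class labels per annotator
    (hypothetical frame-level representation, not the dataset's file format).
    """
    n_frames = len(annotations[0])
    if n_frames == 0 or any(len(a) != n_frames for a in annotations):
        raise ValueError("all annotators need the same non-zero number of frames")

    pairs = list(combinations(range(len(annotations)), 2))
    full = 0
    partial = 0
    pairwise = dict.fromkeys(pairs, 0)

    for labels in zip(*annotations):
        # Size of the largest group of annotators that chose the same class.
        largest_group = max(Counter(labels).values())
        if largest_group == len(labels):
            full += 1      # everyone agrees
        if largest_group >= 2:
            partial += 1   # at least two agree (includes full agreement)
        for i, j in pairs:
            if labels[i] == labels[j]:
                pairwise[(i, j)] += 1

    def to_pct(count):
        return 100.0 * count / n_frames

    return to_pct(full), to_pct(partial), {p: to_pct(c) for p, c in pairwise.items()}


# Toy example: three annotators, four frames, made-up labels.
a1 = ["fg", "fg", "no", "bg"]
a2 = ["fg", "no", "no", "fg"]
a3 = ["fg", "no", "bg", "no"]
fa, pa, pw = agreement_percentages([a1, a2, a3])
print(fa, pa, pw)
```

Under this reading, a two-class mapping such as MD can never produce three different labels for the same frame, which is consistent with the 100% partial agreement reported for the MD mapping.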

Figure 4
Percentage of the content of OpenBMAT by class and agreement level.

Figure 5
Cumulative percentage of audio files whose %FAaf value exceeds a given threshold, using the RMLE mapping.

Figure 6
(Rows) Class assigned by two of the annotators. (Columns) Class assigned by the third annotator. (Values) Percentage of the content with full or partial agreement for each class, broken down by the class chosen by the third annotator.
Table 3
Columns 2 to 4: percentage of all the audio annotated by each annotator as each of the classes of the RMLE mapping. Columns 5 and 6: percentage of all the audio annotated by each annotator as Music or No Music (isolated) or as any of the other 4 classes (mixed).
| Annotator | Fg. Music (%) | Bg. Music (%) | No Music (%) | Isolated (%) | Mixed (%) |
|---|---|---|---|---|---|
| Annotator 1 | 16.6 | 34.45 | 48.94 | 60.09 | 39.91 |
| Annotator 2 | 12.7 | 37.28 | 50.02 | 57.84 | 42.16 |
| Annotator 3 | 15 | 34.66 | 50.34 | 59.28 | 40.72 |
Table 4
Performance of MMG on the OpenBMAT dataset using the MD and RMLE mappings. We report overall accuracy (Acc.), and Precision (P) and Recall (R) for each mapped class; all values are percentages. In this table, the Music columns refer to Music under the MD mapping and to Foreground Music under the RMLE mapping.
| Mapping | Acc. | Music P | Music R | Bg. Music P | Bg. Music R | No Music P | No Music R |
|---|---|---|---|---|---|---|---|
| MD | 88.95 | 91.99 | 85.45 | – | – | 86.29 | 92.48 |
| RMLE | 82.71 | 77.64 | 69.96 | 78.51 | 76.09 | 86.8 | 91.33 |
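
For reference, the metrics in Table 4 follow the usual definitions of accuracy, precision and recall. The sketch below computes them from two frame-level label lists. It is only an illustration under stated assumptions: the function name `evaluate`, the frame-level representation and the toy labels are hypothetical, and the sketch does not claim to reproduce how the reported figures were aggregated over files or annotators.

```python
def evaluate(reference, predicted):
    """Return overall accuracy and per-class (precision, recall), all in %.

    `reference` and `predicted` are equal-length lists of class labels
    (hypothetical frame-level representation).
    """
    if not reference or len(reference) != len(predicted):
        raise ValueError("need two non-empty label lists of equal length")

    n_correct = sum(r == p for r, p in zip(reference, predicted))
    accuracy = 100.0 * n_correct / len(reference)

    scores = {}
    for cls in sorted(set(reference) | set(predicted)):
        true_pos = sum(r == cls and p == cls for r, p in zip(reference, predicted))
        n_pred = predicted.count(cls)   # frames the system labelled as cls
        n_ref = reference.count(cls)    # frames the annotator labelled as cls
        # Precision or recall is undefined (NaN) when its denominator is zero.
        precision = 100.0 * true_pos / n_pred if n_pred else float("nan")
        recall = 100.0 * true_pos / n_ref if n_ref else float("nan")
        scores[cls] = (precision, recall)
    return accuracy, scores


# Toy example with made-up labels for the three RMLE classes.
reference = ["fg", "fg", "bg", "no", "no", "no"]
predicted = ["fg", "bg", "bg", "no", "no", "fg"]
accuracy, scores = evaluate(reference, predicted)
print(round(accuracy, 2), scores)
```

Precision penalises frames wrongly assigned to a class, whereas recall penalises frames of that class that were missed, which is why the two can differ noticeably for the same class.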

Figure 7
Audio file distribution by full agreement using the RMLE mapping and the accuracy achieved by MMG when evaluated against the annotations of one of the annotators.
