CCMusic: An Open and Diverse Database for Chinese Music Information Retrieval Research

Open Access | Mar 2025

Figures & Tables

Table 1

Publicly available datasets designed for topics related to Chinese music in music technology to date.

| Dataset | Content | Task | Object |
| --- | --- | --- | --- |
| Corpus of Jingju (Caro Repetto and Serra, 2014) | Audio, editorial metadata, lyrics, and scores of Jingju | Melodic analysis | Jingju |
| SingingDatabase (Black et al., 2014) | Vocal recordings by professional, semi‑professional, and amateur singers in predominantly Chinese opera singing style | Singing voice analysis | Chinese opera |
| Unnamed (Hu and Yang, 2017) | Traditional Chinese music pieces in the spectrogram representation with information on performers and instruments | Latent space analysis | Chinese traditional music |
| GQ39 (Huang et al., 2020) | Audio recordings of prevalent Guqin solo compositions and corresponding event‑by‑event annotations | Playing technique detection, mode detection | Guqin |
| Unnamed (Nahar et al., 2020) | Musical features of Chinese, Malay, and Indian song fragments | Music classification | Chinese songs |
| JinYue Database (Shen et al., 2020) | Huqin music metadata, audio features, and annotations of emotion, scene, and imagery | Emotion, scene, and imagery recognition | Huqin |
| ChMusic (Gong et al., 2021) | Traditional Chinese music excerpts performed by 11 traditional Chinese musical instruments | Instrument recognition | Chinese instruments |
| Traditional Chinese Opera (Zhang et al., 2021) | Songs from the 14 most popular types of Chinese opera with annotations of music, song, and speech | Genre recognition | Chinese opera |
| CBFdataset (Wang et al., 2022a) | Monophonic recordings of classic Chinese bamboo flute pieces and isolated playing techniques with annotations | Playing technique detection | Zhudi |
| CCOM‑HuQin (Zhang et al., 2023) | Audiovisual recordings of 11,992 individual playing technique clips and 57 annotated musical compositions featuring classical excerpts | Playing technique detection | Huqin |

Table 2

All the datasets included in the CCMusic database (references following the name of each published dataset).

| Dataset | Main Task | Main Contents |
| --- | --- | --- |
| Chinese Traditional Instrument Sound (Liang et al., 2019) | Instrument recognition | Audio of Chinese instruments |
| GZ IsoTech (Li et al., 2022) | Playing technique classification | Audio with playing technique annotation |
| Guzheng Tech99 (Li et al., 2023) | Playing technique detection | Audio with playing technique annotation |
| Erhu Playing Technique (Wang et al., 2019) | Playing technique classification | Audio with playing technique annotation |
| Chinese National Pentatonic Modes (Wang et al., 2022b) | Mode classification | Audio with Chinese pentatonic mode annotation |
| Bel Canto & Chinese Folk Singing (authors’ own creation) | Singing style classification | Audio with singing style annotation |

Figure 1

Number of audio clips across various durations in the CTIS dataset, segmented at 27.5 seconds.

Figure 2

Number of instrument categories across various durations in the CTIS dataset, segmented at 437 seconds.

Figure 3

Clip number and proportion of each category in the GZ IsoTech dataset.

Figure 4

Audio duration of each category in the GZ IsoTech dataset.

Figure 5

Number of audio clips across various durations in the GZ IsoTech dataset, segmented at 1.5 seconds.

Figure 6

Clip number and proportion of each category in the Guzheng Tech99 dataset.

Figure 7

Audio duration of each category in the Guzheng Tech99 dataset.

Figure 8

Number of audio clips across various durations in the Guzheng Tech99 dataset, segmented at 0.485 seconds.

Figure 9

Clip number and proportion of each category in the ErhuPT dataset.

Figure 10

Audio duration of each category in the ErhuPT dataset.

Figure 11

Number of audio clips across various durations in the ErhuPT dataset, segmented at 550 milliseconds.

Figure 12

Clip number and proportion of each category in the CNPM dataset.

Figure 13

Audio duration of each category in the CNPM dataset.

Figure 14

Number of audio clips across various durations in the CNPM dataset, segmented at 85 seconds.

Figure 15

Clip number and proportion of each category in the Bel Canto & Chinese Folk Singing dataset.

Figure 16

Audio duration of each category in the Bel Canto & Chinese Folk Singing dataset.

Figure 17

Number of audio clips across various durations in the Bel Canto & Chinese Folk Singing dataset, segmented at 33 seconds.

Figure 18

The evaluation framework that supports classification and detection tasks.

Table 3

F1‑scores of seven models on the CTIS dataset. The best results of the transformer group and the CNN group are indicated in bold.

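| Backbone | Mel | CQT | Chroma |
| --- | --- | --- | --- |
| ViT‑L‑32 | 0.936 | 0.921 | 0.845 |
| Swin‑T | 0.956 | 0.940 | 0.759 |
| RegNet‑Y‑32GF | 0.973 | 0.980 | 0.848 |
| VGG19‑BN | 0.966 | 0.965 | 0.852 |
| AlexNet | 0.936 | 0.921 | 0.661 |
| ResNet101 | 0.953 | 0.949 | 0.782 |
| Inception‑V3 | 0.860 | 0.855 | 0.664 |
| Average | 0.940 | 0.933 | 0.773 |
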
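
The Average row of Table 3 can be checked against the per-backbone scores. The Python sketch below copies the values from the table and recomputes the column means. The names `TABLE_3`, `REPORTED_AVERAGE`, and `column_means` are illustrative, and treating the Average row as an unweighted mean over the seven backbones is an assumption, not something the table states.

```python
# F1-scores copied from Table 3 (CTIS dataset): backbone -> (Mel, CQT, Chroma).
TABLE_3 = {
    "ViT-L-32": (0.936, 0.921, 0.845),
    "Swin-T": (0.956, 0.940, 0.759),
    "RegNet-Y-32GF": (0.973, 0.980, 0.848),
    "VGG19-BN": (0.966, 0.965, 0.852),
    "AlexNet": (0.936, 0.921, 0.661),
    "ResNet101": (0.953, 0.949, 0.782),
    "Inception-V3": (0.860, 0.855, 0.664),
}

# The Average row as printed in Table 3.
REPORTED_AVERAGE = (0.940, 0.933, 0.773)


def column_means(rows):
    """Return the unweighted mean of each column, rounded to three decimals."""
    columns = list(zip(*rows))
    return tuple(round(sum(col) / len(col), 3) for col in columns)


computed = column_means(TABLE_3.values())

# The printed scores are already rounded, so allow a small tolerance
# instead of requiring exact equality with the reported row.
matches = all(abs(c - r) <= 1e-3 for c, r in zip(computed, REPORTED_AVERAGE))
print(computed, matches)
```

Because the table entries are rounded to three decimals, a recomputed mean may differ from the published one in the last digit; the tolerance accounts for that.
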
Table 4

F1‑scores of seven models on the GZ IsoTech dataset. The best scores of the transformer group and the CNN group are indicated in bold.

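| Backbone | Mel | CQT | Chroma |
| --- | --- | --- | --- |
| ViT‑L‑16 | 0.855 | 0.824 | 0.770 |
| MaxVit‑T | 0.763 | 0.776 | 0.642 |
| ResNeXt101‑64X4D | 0.713 | 0.765 | 0.639 |
| ResNet101 | 0.731 | 0.798 | 0.719 |
| RegNet‑Y‑8GF | 0.804 | 0.807 | 0.716 |
| ShuffleNet‑V2‑X2.0 | 0.702 | 0.799 | 0.665 |
| MobileNet‑V3‑Large | 0.806 | 0.798 | 0.657 |
| Average | 0.768 | 0.795 | 0.687 |
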
Table 5

F1‑scores of seven models on the Guzheng Tech99 dataset. The best scores of the transformer group and the CNN group are indicated in bold.

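| Backbone | Mel | CQT | Chroma |
| --- | --- | --- | --- |
| ViT‑B‑16 | 0.705 | 0.518 | 0.508 |
| Swin‑T | 0.849 | 0.783 | 0.766 |
| VGG19 | 0.862 | 0.799 | 0.665 |
| EfficientNet‑V2‑L | 0.783 | 0.812 | 0.697 |
| ConvNeXt‑B | 0.849 | 0.849 | 0.805 |
| ResNet101 | 0.638 | 0.830 | 0.707 |
| SqueezeNet1.1 | 0.831 | 0.814 | 0.780 |
| Average | 0.788 | 0.772 | 0.704 |
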
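
The caption of Table 5 refers to the best score within the transformer group and within the CNN group. As a minimal sketch, the Python below finds the top backbone per group and per input feature from the table values. The group assignment (`TRANSFORMERS` containing ViT‑B‑16 and Swin‑T, everything else treated as a CNN) and the helper name `best_per_group` are assumptions made for illustration; the sketch does not claim to reproduce which cells were bold in the published table.

```python
# F1-scores copied from Table 5 (Guzheng Tech99): backbone -> (Mel, CQT, Chroma).
TABLE_5 = {
    "ViT-B-16": (0.705, 0.518, 0.508),
    "Swin-T": (0.849, 0.783, 0.766),
    "VGG19": (0.862, 0.799, 0.665),
    "EfficientNet-V2-L": (0.783, 0.812, 0.697),
    "ConvNeXt-B": (0.849, 0.849, 0.805),
    "ResNet101": (0.638, 0.830, 0.707),
    "SqueezeNet1.1": (0.831, 0.814, 0.780),
}

FEATURES = ("Mel", "CQT", "Chroma")

# Assumed grouping: these backbones are treated as transformers,
# all remaining backbones as CNNs.
TRANSFORMERS = {"ViT-B-16", "Swin-T"}


def best_per_group(table, transformer_names):
    """Map (group, feature) to the (backbone, score) pair with the highest score."""
    best = {}
    for name, scores in table.items():
        group = "transformer" if name in transformer_names else "CNN"
        for feature, score in zip(FEATURES, scores):
            key = (group, feature)
            # Strict comparison keeps the first backbone listed in case of a tie.
            if key not in best or score > best[key][1]:
                best[key] = (name, score)
    return best


best = best_per_group(TABLE_5, TRANSFORMERS)
for (group, feature), (name, score) in sorted(best.items()):
    print(f"{group:11s} {feature:6s} {name} ({score:.3f})")
```

The same helper can be applied to the other result tables by swapping in their rows and adjusting the assumed transformer set.
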
Table 6

F1‑scores of seven models on the ErhuPT dataset. The best scores of the transformer group and the CNN group are indicated in bold.

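| Backbone | Mel | CQT | Chroma |
| --- | --- | --- | --- |
| Swin‑S | 0.978 | 0.940 | 0.903 |
| Swin‑T | 0.994 | 0.958 | 0.957 |
| AlexNet | 0.960 | 0.970 | 0.933 |
| ConvNeXt‑T | 0.994 | 0.993 | 0.954 |
| ShuffleNet‑V2‑X2.0 | 0.990 | 0.923 | 0.887 |
| GoogleNet | 0.986 | 0.981 | 0.908 |
| SqueezeNet1.1 | 0.932 | 0.939 | 0.875 |
| Average | 0.976 | 0.958 | 0.917 |
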
Table 7

F1‑scores of seven models on the CNPM dataset. The best scores of the transformer group and the CNN group are indicated in bold.

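| Backbone | Mel | CQT | Chroma |
| --- | --- | --- | --- |
| ViT‑L‑32 | 0.680 | 0.769 | 0.399 |
| ViT‑L‑16 | 0.823 | 0.859 | 0.549 |
| VGG11‑BN | 0.807 | 0.843 | 0.609 |
| RegNet‑Y‑16GF | 0.590 | 0.832 | 0.535 |
| Wide‑ResNet50‑2 | 0.694 | 0.757 | 0.531 |
| AlexNet | 0.742 | 0.744 | 0.542 |
| ShuffleNet‑V2‑X2.0 | 0.473 | 0.720 | 0.266 |
| Average | 0.687 | 0.789 | 0.490 |
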
Table 8

F1‑scores of seven models on the Bel Canto & Chinese Folk Singing dataset. The best scores of the transformer group and the CNN group are indicated in bold.

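| Backbone | Mel | CQT | Chroma |
| --- | --- | --- | --- |
| Swin‑S | 0.928 | 0.936 | 0.787 |
| Swin‑T | 0.906 | 0.863 | 0.731 |
| AlexNet | 0.919 | 0.920 | 0.746 |
| ConvNeXt‑T | 0.895 | 0.925 | 0.714 |
| GoogleNet | 0.948 | 0.921 | 0.739 |
| MNASNet1.3 | 0.931 | 0.931 | 0.765 |
| SqueezeNet1.1 | 0.923 | 0.914 | 0.685 |
| Average | 0.921 | 0.916 | 0.738 |
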
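
The Average rows of Tables 3 to 8 allow a quick comparison of the three input features across datasets. The Python sketch below collects those rows and reports which feature has the highest and lowest average F1-score per dataset. The names `AVERAGES`, `rank_features`, and `summary` are illustrative, and the comparison is only a reading aid over the published averages, not an analysis taken from the paper.

```python
# Average rows copied from Tables 3 to 8: dataset -> (Mel, CQT, Chroma).
AVERAGES = {
    "CTIS": (0.940, 0.933, 0.773),
    "GZ IsoTech": (0.768, 0.795, 0.687),
    "Guzheng Tech99": (0.788, 0.772, 0.704),
    "ErhuPT": (0.976, 0.958, 0.917),
    "CNPM": (0.687, 0.789, 0.490),
    "Bel Canto & Chinese Folk Singing": (0.921, 0.916, 0.738),
}

FEATURES = ("Mel", "CQT", "Chroma")


def rank_features(scores):
    """Return (best_feature, worst_feature) for one row of average scores."""
    paired = list(zip(FEATURES, scores))
    best = max(paired, key=lambda item: item[1])[0]
    worst = min(paired, key=lambda item: item[1])[0]
    return best, worst


summary = {name: rank_features(scores) for name, scores in AVERAGES.items()}
for name, (best, worst) in summary.items():
    print(f"{name}: best={best}, worst={worst}")
```

On these averages, Chroma is the lowest-scoring feature for every dataset, while the top feature alternates between Mel and CQT.
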
DOI: https://doi.org/10.5334/tismir.194 | Journal eISSN: 2514-3298
Language: English
Submitted on: Mar 27, 2024
Accepted on: Feb 21, 2025
Published on: Mar 24, 2025
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2025 Monan Zhou, Shenyang Xu, Zhaorui Liu, Zhaowen Wang, Feng Yu, Wei Li, Baoqiang Han, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.