
Performance evaluation of seven multi-label classification methods on real-world patent and publication datasets

By: Shuo Xu, Yuefu Zhang, Xin An and Sainan Pi
Open Access | May 2024

Figures & Tables

Figure 1.

An example of the mechanism of the LP algorithm.
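
To make the mechanism named in the Figure 1 caption concrete, the following is a minimal sketch of the label powerset (LP) transformation. It assumes the standard formulation, in which every distinct label set seen in training becomes one class of a single multi-class problem. The function name `label_powerset_transform` and the toy labels are illustrative and are not taken from the article.

```python
def label_powerset_transform(label_sets):
    """Map each distinct label set to one integer class id (standard LP idea)."""
    class_of = {}  # frozenset of labels -> class id
    y = []         # transformed single-label targets, one per document
    for labels in label_sets:
        key = frozenset(labels)
        if key not in class_of:
            class_of[key] = len(class_of)  # next unused class id
        y.append(class_of[key])
    # Inverse map, used to turn a predicted class back into a label set.
    labels_of = {cid: set(key) for key, cid in class_of.items()}
    return y, labels_of


# Hypothetical toy data: four documents and their label sets.
docs = [{"A", "B"}, {"B"}, {"A", "B"}, {"C"}]
y, labels_of = label_powerset_transform(docs)
print(y)          # class id per document
print(labels_of)  # class id -> original label set
```

Any multi-class classifier can then be trained on `y`. A known consequence of this design is that the number of classes grows with the number of distinct label combinations, which matters for datasets with thousands of labels.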

Figure 2.

The graph model representation of the TextCNN Model.

Figure 3.

Power-law distribution of the three datasets.

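Performance of the dependency-LDA model with different parameter settings on three real-world datasets.

| Dataset | # of topics | β_C | Macro F1 | Micro F1 | Hamming loss |
|---|---|---|---|---|---|
| Biological-Sciences | 20 | 0.18 | 0.0443 | 0.0101 | 0.2537 |
| Biological-Sciences | 50 | 0.07 | 0.0438 | 0.0102 | 0.2540 |
| Biological-Sciences | 100 | 0.04 | 0.0436 | 0.0096 | 0.2544 |
| Health-Sciences | 20 | 0.47 | 0.0412 | 0.0100 | 0.2542 |
| Health-Sciences | 50 | 0.19 | 0.0410 | 0.0102 | 0.2590 |
| Health-Sciences | 100 | 0.09 | 0.0386 | 0.0102 | 0.2603 |
| USPTO | 50 | 0.40 | 0.0023 | 0.0010 | 0.1274 |
| USPTO | 100 | 0.20 | 0.0022 | 0.0012 | 0.1316 |
| USPTO | 200 | 0.10 | 0.0022 | 0.0012 | 0.1343 |
| USPTO | 400 | 0.05 | 0.0026 | 0.0010 | 0.0895 |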
4000.050.00260.00100.0895

The number of instances in the training and test sets of our datasets.

| Datasets | Training Set | Test Set |
|---|---|---|
| Health-Sciences | 16,932 | 4,236 |
| Biological-Sciences | 9,034 | 2,258 |
| USPTO | 283,899 | 71,622 |

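Performance of the TextCNN, TextRNN and TextRCNN with different parameter settings on three real-world datasets.

| Datasets | Top_k | Model | Macro F1 | Micro F1 | Hamming loss |
|---|---|---|---|---|---|
| Health-Sciences | 7 | TextCNN | 0.0883 | 0.2489 | 0.2304 |
| Health-Sciences | 7 | TextRNN | 0.0788 | 0.2341 | 0.1256 |
| Health-Sciences | 7 | TextRCNN | 0.0836 | 0.2294 | 0.1359 |
| Biological-Sciences | 4 | TextCNN | 0.3070 | 0.4693 | 0.5055 |
| Biological-Sciences | 4 | TextRNN | 0.2026 | 0.4202 | 0.2548 |
| Biological-Sciences | 4 | TextRCNN | 0.3114 | 0.5094 | 0.4714 |
| USPTO | 65 | TextCNN | 0.0341 | 0.2018 | 0.0127 |
| USPTO | 65 | TextRNN | 0.0301 | 0.2437 | 0.0107 |
| USPTO | 65 | TextRCNN | 0.0401 | 0.2408 | 0.0089 |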

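Characteristics of our datasets and benchmark datasets.

| Dataset | # of instances | # of labels | # of hierarchies | Label cardinality | doc/label (Avg.) | doc/label (Max.) | doc/label (Min.) |
|---|---|---|---|---|---|---|---|
| Health-Sciences | 21,168 | 507 | 5 | 2.25 | 94.10 | 1,571 | 1 |
| Biological-Sciences | 11,292 | 484 | 6 | 1.56 | 36.29 | 606 | 1 |
| USPTO | 355,058 | 8,867 | 4 | 4.08 | 152.38 | 20,988 | 1 |
| Emotions | 593 | 6 | 1 | 1.869 | 184.67 | 264 | 148 |
| Scene | 2,407 | 6 | 1 | 1.074 | 430.83 | 533 | 364 |
| Bibtex | 7,395 | 159 | 1 | 2.401 | 112 | 1,042 | 51 |
| Medical | 978 | 45 | 1 | 1.25 | 27 | 266 | 1 |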
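
The statistics in this table can be derived from the raw label assignments. The sketch below assumes the usual definition of label cardinality (average number of labels per instance) and reads "doc/label" as the number of documents carrying each label, which is an interpretation of the column heading rather than something stated on this page. The function name `label_statistics` and the toy data are hypothetical.

```python
def label_statistics(label_sets):
    """Summarise a multi-label dataset given one set of labels per document."""
    n_docs = len(label_sets)
    docs_per_label = {}  # label -> number of documents carrying it
    for labels in label_sets:
        for label in labels:
            docs_per_label[label] = docs_per_label.get(label, 0) + 1
    total_assignments = sum(docs_per_label.values())
    return {
        "instances": n_docs,
        "labels": len(docs_per_label),
        # Average number of labels per document.
        "label_cardinality": total_assignments / n_docs,
        # Average, maximum and minimum number of documents per label.
        "doc_per_label_avg": total_assignments / len(docs_per_label),
        "doc_per_label_max": max(docs_per_label.values()),
        "doc_per_label_min": min(docs_per_label.values()),
    }


# Hypothetical toy data: four documents, three labels.
stats = label_statistics([{"A", "B"}, {"B"}, {"A", "B"}, {"C"}])
print(stats)
```

A minimum of 1 document per label, as reported for the three real-world datasets, means at least one label appears only once, which is a hard case for any classifier.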

Several open-source toolkits for solving multi-label classification problems.

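Performance of seven multi-label classification methods on three real-world datasets.

| Dataset | Method | Macro F1 | Micro F1 | Hamming Loss |
|---|---|---|---|---|
| Biological-Sciences | dependency-LDA | 0.0443 | 0.0102 | 0.2537 |
| Biological-Sciences | MLkNN | 0.0836 | 0.1535 | 0.3389 |
| Biological-Sciences | RAkEL | 0.0280 | 0.0794 | 0.2859 |
| Biological-Sciences | LabelPowerset | 0.0044 | 0.0219 | 0.3845 |
| Biological-Sciences | TextCNN | 0.3070 | 0.4693 | 0.5055 |
| Biological-Sciences | TextRNN | 0.2026 | 0.4202 | 0.2548 |
| Biological-Sciences | TextRCNN | 0.3114 | 0.5094 | 0.4714 |
| Health-Sciences | dependency-LDA | 0.0412 | 0.0102 | 0.2542 |
| Health-Sciences | MLkNN | 0.0806 | 0.1364 | 0.2727 |
| Health-Sciences | RAkEL | 0.0294 | 0.0928 | 0.1761 |
| Health-Sciences | LabelPowerset | 0.0066 | 0.0620 | 0.3113 |
| Health-Sciences | TextCNN | 0.0883 | 0.2489 | 0.2304 |
| Health-Sciences | TextRNN | 0.0788 | 0.2341 | 0.1256 |
| Health-Sciences | TextRCNN | 0.0836 | 0.2294 | 0.1359 |
| USPTO | dependency-LDA | 0.0026 | 0.0012 | 0.0895 |
| USPTO | MLkNN | 0.1152 | 0.2692 | 0.0618 |
| USPTO | RAkEL | 0.0423 | 0.1102 | 0.0587 |
| USPTO | LabelPowerset | 0.0274 | 0.1161 | 0.0594 |
| USPTO | TextCNN | 0.0341 | 0.2018 | 0.0127 |
| USPTO | TextRNN | 0.0301 | 0.2437 | 0.0107 |
| USPTO | TextRCNN | 0.0401 | 0.2408 | 0.0089 |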
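
For readers unfamiliar with the three measures reported here, the following sketch computes them on binary indicator matrices under their standard definitions. It is not the article's evaluation code: the exact protocol (for instance, how labels with no positive examples are scored) is not given on this page, and this sketch assumes such labels contribute an F1 of 0. The names `f1` and `evaluate` and the toy matrices are illustrative.

```python
def f1(tp, fp, fn):
    """F1 from counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def evaluate(y_true, y_pred):
    """Return (macro F1, micro F1, Hamming loss) for 0/1 label matrices."""
    n_docs, n_labels = len(y_true), len(y_true[0])
    tp, fp, fn = [0] * n_labels, [0] * n_labels, [0] * n_labels
    mismatches = 0
    for t_row, p_row in zip(y_true, y_pred):
        for j, (t, p) in enumerate(zip(t_row, p_row)):
            if t and p:
                tp[j] += 1
            elif p:
                fp[j] += 1
            elif t:
                fn[j] += 1
            mismatches += t != p
    # Macro F1: average the per-label F1 scores, so rare labels weigh equally.
    macro = sum(f1(tp[j], fp[j], fn[j]) for j in range(n_labels)) / n_labels
    # Micro F1: pool the counts over all labels, so frequent labels dominate.
    micro = f1(sum(tp), sum(fp), sum(fn))
    # Hamming loss: fraction of document-label cells predicted wrongly.
    hamming = mismatches / (n_docs * n_labels)
    return macro, micro, hamming


# Hypothetical toy data: three documents, three labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1], [1, 0, 0]]
macro, micro, hamming = evaluate(y_true, y_pred)
print(macro, micro, hamming)
```

Higher is better for both F1 scores and lower is better for Hamming loss, so a method can rank well on one measure and poorly on another.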

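Performance of the MLkNN, RAkEL, and LabelPowerset methods with different parameter settings on three real-world datasets.

| Dataset | Method | Max-features | Macro F1 | Micro F1 | Hamming Loss |
|---|---|---|---|---|---|
| Biological-Sciences | MLkNN | 800 | 0.0831 | 0.1535 | 0.3389 |
| Biological-Sciences | MLkNN | 1,000 | 0.0836 | 0.1532 | 0.3404 |
| Biological-Sciences | RAkEL | 800 | 0.0280 | 0.0794 | 0.2859 |
| Biological-Sciences | RAkEL | 1,000 | 0.0229 | 0.0637 | 0.2923 |
| Biological-Sciences | LabelPowerset | 800 | 0.0044 | 0.0219 | 0.3845 |
| Biological-Sciences | LabelPowerset | 1,000 | 0.0042 | 0.0195 | 0.3850 |
| Health-Sciences | MLkNN | 800 | 0.0738 | 0.1364 | 0.2727 |
| Health-Sciences | MLkNN | 1,000 | 0.0806 | 0.1242 | 0.2759 |
| Health-Sciences | RAkEL | 800 | 0.0284 | 0.0858 | 0.1761 |
| Health-Sciences | RAkEL | 1,000 | 0.0294 | 0.0928 | 0.1844 |
| Health-Sciences | LabelPowerset | 800 | 0.0062 | 0.0610 | 0.3115 |
| Health-Sciences | LabelPowerset | 1,000 | 0.0066 | 0.0620 | 0.3113 |
| USPTO | MLkNN | 12,000 | 0.1142 | 0.2692 | 0.0643 |
| USPTO | MLkNN | 18,000 | 0.1152 | 0.2673 | 0.0618 |
| USPTO | RAkEL | 12,000 | 0.0421 | 0.1038 | 0.0588 |
| USPTO | RAkEL | 18,000 | 0.0423 | 0.1102 | 0.0587 |
| USPTO | LabelPowerset | 12,000 | 0.0273 | 0.1161 | 0.0594 |
| USPTO | LabelPowerset | 18,000 | 0.0274 | 0.1151 | 0.0624 |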

The number of word tokens and unique words in our datasets.

| Datasets | # of word tokens | # of unique words |
|---|---|---|
| Health-Sciences | 1,556,854 | 64,113 |
| Biological-Sciences | 1,486,840 | 45,610 |
| USPTO | 2,540,118 | 81,268 |

Spearman correlation coefficients among Macro F1, Micro F1, and Hamming Loss.

| | Macro F1 | Micro F1 | Hamming loss |
|---|---|---|---|
| Macro F1 | 1.000 | 0.762 | 0.277 |
| Micro F1 | 0.762 | 1.000 | -0.008 |
| Hamming loss | 0.277 | -0.008 | 1.000 |

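
As a reminder of what these coefficients measure, here is a minimal sketch of Spearman's rank correlation, computed as the Pearson correlation of average ranks (the standard tie-aware definition). Which result rows the article correlated is not stated on this page, so the input below is toy data and the function names `average_ranks` and `spearman` are illustrative.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy scores for five systems under two measures.
rho = spearman([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(rho)
```

A coefficient near 1 means two measures rank the methods almost identically, while a value near 0, such as the -0.008 between Micro F1 and Hamming loss, means the two rankings are essentially unrelated.
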
DOI: https://doi.org/10.2478/jdis-2024-0014 | Journal eISSN: 2543-683X | Journal ISSN: 2096-157X
Language: English
Page range: 81 - 103
Submitted on: Nov 5, 2023
Accepted on: Feb 26, 2024
Published on: May 27, 2024
Published by: Chinese Academy of Sciences, National Science Library
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2024 Shuo Xu, Yuefu Zhang, Xin An, Sainan Pi, published by Chinese Academy of Sciences, National Science Library
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.