Automatic Keyphrase Extraction from Scientific Chinese Medical Abstracts Based on Character-Level Sequence Labeling

Open Access | Mar 2021

Figures & Tables

Figure 1

An example of character-level sequence labeling.
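Since the figure itself is not reproduced here, the following is only a minimal sketch of how a character-level label sequence could be turned back into keyphrases. The plain `B`/`I`/`O` tag names, the function name `decode_keyphrases`, and the sample sentence are assumptions made for illustration, not details taken from the paper.

```python
def decode_keyphrases(chars, tags):
    """Collect the character spans labeled B, I, I, ... into keyphrase strings."""
    phrases, current = [], []
    for ch, tag in zip(chars, tags):
        if tag == "B":
            # A new keyphrase starts; close any phrase that is still open.
            if current:
                phrases.append("".join(current))
            current = [ch]
        elif tag == "I" and current:
            # Continue the keyphrase that is currently open.
            current.append(ch)
        else:
            # "O", or an "I" with no preceding "B": close any open phrase.
            if current:
                phrases.append("".join(current))
            current = []
    if current:
        phrases.append("".join(current))
    return phrases


# Hypothetical sentence and tags, one tag per character.
sentence = "糖尿病患者的血糖控制"
tags = list("BIIOOOBIOO")
print(decode_keyphrases(sentence, tags))
```

In this sketch an `I` tag that does not follow a `B` is ignored rather than treated as the start of a phrase; a different convention would be equally defensible.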

Figure 2

An example of word-level sequence labeling.

Figure 3

An example of character-level IOB format generation.
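As a rough illustration of what character-level IOB generation involves, the sketch below tags each character of an abstract by exact string matching against its author-assigned keyphrases. The matching strategy (longest keyphrase first, no overlapping spans), the function name `char_level_iob`, and the sample text are assumptions for illustration; the procedure shown in the figure may differ.

```python
def char_level_iob(text, keyphrases):
    """Tag every character of `text` with B, I, or O by exact keyphrase matching."""
    tags = ["O"] * len(text)
    # Match longer keyphrases first so a short phrase cannot split a longer one.
    for phrase in sorted(set(keyphrases), key=len, reverse=True):
        if not phrase:
            continue
        start = text.find(phrase)
        while start != -1:
            end = start + len(phrase)
            # Only label spans that are still completely untagged.
            if all(t == "O" for t in tags[start:end]):
                tags[start] = "B"
                for i in range(start + 1, end):
                    tags[i] = "I"
            start = text.find(phrase, start + 1)
    return tags


# Hypothetical abstract fragment and keyphrases.
abstract = "糖尿病患者的血糖控制"
labels = char_level_iob(abstract, ["糖尿病", "血糖"])
for ch, tag in zip(abstract, labels):
    print(ch, tag)
```

Because the tags are assigned per character, no word segmentation step is needed before labeling, which is the main practical difference from a word-level scheme.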

Figure 4

Character-level sequence labeling keyphrase extraction model architecture.

Figure 5

Input representations of character-level sequence labeling keyphrase extraction model.

Lexicon scale comparative experiments of unsupervised approaches

| Method | P | R | F |
|---|---|---|---|
| Term Frequency (whole lexicon) | 47.66% | 33.36% | 39.24% |
| Term Frequency (training set lexicon) | 37.31% | 26.11% | 30.72% |
| TF*IDF (whole lexicon) | 54.14% | 37.90% | 44.59% |
| TF*IDF (training set lexicon) | 42.18% | 29.53% | 34.74% |
| TextRank (whole lexicon) | 43.13% | 30.19% | 35.52% |
| TextRank (training set lexicon) | 34.37% | 24.06% | 28.30% |

Character-level IOB generation results on data sets

| Data Set | P | R | F | Number of Correctly Recognized Keyphrases | Number of Recognized Keyphrases | Number of Ground-truth Keyphrases |
|---|---|---|---|---|---|---|
| Training Set | 99.18% | 99.42% | 99.30% | 416,013 | 409,371 | 408,373 |
| Development Set | 99.13% | 99.54% | 99.34% | 25,942 | 26,169 | 26,061 |
| Test Set | 99.15% | 99.56% | 99.36% | 13,344 | 13,458 | 13,403 |

Word-level IOB generation results on data sets

| Data Set | P | R | F | Number of Correctly Recognized Keyphrases | Number of Recognized Keyphrases | Number of Ground-truth Keyphrases |
|---|---|---|---|---|---|---|
| Training Set | 91.15% | 96.93% | 93.96% | 395,852 | 434,266 | 408,373 |
| Development Set | 91.35% | 97.03% | 94.11% | 25,287 | 27,680 | 26,061 |
| Test Set | 90.99% | 97.11% | 93.95% | 13,016 | 14,305 | 13,403 |
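The P, R, and F columns can be reproduced from the three counts in each row. The sketch below uses the standard definitions of precision, recall, and F-measure, and assumes that in the development-set row 25,287 is the number of correctly recognized keyphrases and 27,680 the number of recognized keyphrases, which is the reading under which the reported percentages come out. The function name `prf` is introduced here for illustration.

```python
def prf(correct, recognized, ground_truth):
    """Return precision, recall, and their harmonic mean (F-measure)."""
    precision = correct / recognized      # share of recognized phrases that are correct
    recall = correct / ground_truth       # share of ground-truth phrases that were found
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Development-set counts from the word-level table above.
p, r, f = prf(correct=25287, recognized=27680, ground_truth=26061)
print(f"P={p:.2%} R={r:.2%} F={f:.2%}")  # P=91.35% R=97.03% F=94.11%
```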

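N-value comparative experiments of unsupervised baseline approaches

| Method | Top 3 Candidate Keyphrases: P | Top 3: R | Top 3: F | Top 5 Candidate Keyphrases: P | Top 5: R | Top 5: F |
|---|---|---|---|---|---|---|
| Term Frequency | 47.66% | 33.36% | 39.24% | 37.53% | 43.78% | 40.42% |
| TF*IDF | 54.14% | 37.90% | 44.59% | 40.37% | 47.11% | 43.48% |
| TextRank | 43.13% | 30.19% | 35.52% | 33.29% | 38.84% | 35.85% |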

Word-level and character-level comparative experiments of supervised machine learning baselines

| Method | Word-Level | Character-Level |
|---|---|---|
| CRF | 47.90% | 46.37% |
| BiLSTM | 44.35% | 38.38% |
| BiLSTM-CRF | 49.86% | 50.16% |

Word-level and character-level comparative experiments of BERT-based models

| Metrics | Word-Level | Character-Level |
|---|---|---|
| P | 26.88% | 60.33% |
| R | 54.93% | 59.28% |
| F | 36.10% | 59.80% |

Performance evaluation of keyphrase extraction

| Method | P | R | F |
|---|---|---|---|
| TF*IDF (Baseline) | 54.14% | 37.90% | 44.59% |
| BiLSTM-CRF (Baseline) | 42.55% | 61.09% | 50.16% |
| BERT-based Model (our model) | 60.33% | 59.28% | 59.80% |
| Adjusted Model (our model) | 61.95% | 59.22% | 60.56% |

DOI: https://doi.org/10.2478/jdis-2021-0013 | Journal eISSN: 2543-683X | Journal ISSN: 2096-157X
Language: English
Page range: 35 - 57
Submitted on: Oct 31, 2020
Accepted on: Jan 15, 2021
Published on: Mar 2, 2021
Published by: Chinese Academy of Sciences, National Science Library
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2021 Liangping Ding, Zhixiong Zhang, Huan Liu, Jie Li, Gaihong Yu, published by Chinese Academy of Sciences, National Science Library
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.