Figure 1

Figure 2

Figure 3

Figure 4

Figure 5

Lexicon-scale comparative experiments of unsupervised approaches

| Method | P | R | F |
|---|---|---|---|
| Term Frequency (whole lexicon) | 47.66% | 33.36% | 39.24% |
| Term Frequency (training set lexicon) | 37.31% | 26.11% | 30.72% |
| TF*IDF (whole lexicon) | 54.14% | 37.90% | 44.59% |
| TF*IDF (training set lexicon) | 42.18% | 29.53% | 34.74% |
| TextRank (whole lexicon) | 43.13% | 30.19% | 35.52% |
| TextRank (training set lexicon) | 34.37% | 24.06% | 28.30% |

Character-level IOB generation results on data sets

| Data Set | P | R | F | Number of Correctly Recognized Keyphrases | Number of Recognized Keyphrases | Number of Ground-truth Keyphrases |
|---|---|---|---|---|---|---|
| Training Set | 99.18% | 99.42% | 99.30% | 406,013 | 409,371 | 408,373 |
| Development Set | 99.13% | 99.54% | 99.34% | 25,942 | 26,169 | 26,061 |
| Test Set | 99.15% | 99.56% | 99.36% | 13,344 | 13,458 | 13,403 |
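
The percentages in this table can be reproduced from the three count columns. The following is a minimal sketch, assuming the counts are, in column order, the correctly recognized keyphrases, all recognized keyphrases, and the ground-truth keyphrases (the ordering under which the reported P and R are recovered). The function name `prf` and its parameter names are illustrative and do not come from the original work.

```python
def prf(correct: int, recognized: int, gold: int) -> tuple[float, float, float]:
    """Return precision, recall, and F1 as percentages rounded to two decimals.

    Assumed definitions (standard, not confirmed by the source):
      precision = correct / recognized
      recall    = correct / gold
      F1        = harmonic mean of precision and recall
    """
    precision = correct / recognized
    recall = correct / gold
    f1 = 2 * precision * recall / (precision + recall)
    return tuple(round(100 * x, 2) for x in (precision, recall, f1))


# Development-set counts taken from the character-level table.
print(prf(correct=25_942, recognized=26_169, gold=26_061))
# → (99.13, 99.54, 99.34)
```

The same call with the test-set counts (13,344, 13,458, 13,403) yields the reported 99.15%, 99.56%, and 99.36%.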

Word-level IOB generation results on data sets

| Data Set | P | R | F | Number of Correctly Recognized Keyphrases | Number of Recognized Keyphrases | Number of Ground-truth Keyphrases |
|---|---|---|---|---|---|---|
| Training Set | 91.15% | 96.93% | 93.96% | 395,852 | 434,266 | 408,373 |
| Development Set | 91.35% | 97.03% | 94.11% | 25,287 | 27,680 | 26,061 |
| Test Set | 90.99% | 97.11% | 93.95% | 13,016 | 14,305 | 13,403 |

N-value comparative experiments of unsupervised baseline approaches

| Method | P (Top 3) | R (Top 3) | F (Top 3) | P (Top 5) | R (Top 5) | F (Top 5) |
|---|---|---|---|---|---|---|
| Term Frequency | 47.66% | 33.36% | 39.24% | 37.53% | 43.78% | 40.42% |
| TF*IDF | 54.14% | 37.90% | 44.59% | 40.37% | 47.11% | 43.48% |
| TextRank | 43.13% | 30.19% | 35.52% | 33.29% | 38.84% | 35.85% |

Word-level and character-level comparative experiments of supervised machine learning baselines

| Method | Word-Level F | Character-Level F |
|---|---|---|
| CRF | 47.90% | 46.37% |
| BiLSTM | 44.35% | 38.38% |
| BiLSTM-CRF | 49.86% | 50.16% |

Word-level and character-level comparative experiments of BERT-based models

| Metrics | Word-Level | Character-Level |
|---|---|---|
| P | 26.88% | 60.33% |
| R | 54.93% | 59.28% |
| F | 36.10% | 59.80% |

Performance evaluation of keyphrase extraction

| Method | P | R | F |
|---|---|---|---|
| TF*IDF (Baseline) | 54.14% | 37.90% | 44.59% |
| BiLSTM-CRF (Baseline) | 42.55% | 61.09% | 50.16% |
| BERT-based Model (our model) | 60.33% | 59.28% | 59.80% |
| Adjusted Model (our model) | 61.95% | 59.22% | 60.56% |