Intra-Annotation Evaluation Results. The NlpContributionGraph scheme pilot-stage annotations evaluated against the adjudicated gold-standard annotations made on the trial dataset.
| # | Task | IU P | IU R | IU F1 | Sent. P | Sent. R | Sent. F1 | Phr. P | Phr. R | Phr. F1 | Trip. P | Trip. R | Trip. F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | MT | 66.66 | 73.68 | 70.0 | 66.67 | 54.55 | 60.0 | 37.47 | 30.96 | 33.91 | 19.73 | 17.46 | 18.53 |
| 2 | NER | 79.55 | 81.40 | 80.46 | 60.89 | 69.43 | 64.88 | 44.09 | 42.60 | 43.34 | 22.34 | 21.63 | 21.98 |
| 3 | QA | 93.18 | 93.18 | 93.18 | 67.96 | 79.55 | 73.30 | 54.04 | 45.21 | 49.23 | 37.50 | 32.0 | 34.52 |
| 4 | RC | 70.21 | 73.33 | 71.74 | 64.64 | 60.31 | 62.40 | 35.31 | 29.24 | 32.0 | 12.59 | 11.45 | 11.99 |
| 5 | TC | 86.67 | 84.78 | 85.71 | 75.44 | 78.66 | 77.01 | 54.77 | 45.38 | 49.63 | 27.41 | 22.41 | 24.66 |
| Cum. | micro | 78.83 | 80.65 | 79.73 | 67.25 | 67.63 | 67.44 | 45.36 | 38.83 | 41.84 | 23.76 | 20.97 | 22.28 |
| Cum. | macro | 78.8 | 80.49 | 79.64 | 67.33 | 68.51 | 67.92 | 45.2 | 38.91 | 41.82 | 23.87 | 20.95 | 22.31 |

P/R/F1 = precision, recall, and F1, reported at four granularities: information units (IU), sentences (Sent.), phrases (Phr.), and triples (Trip.).
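The two Cum. rows aggregate the five task-level results in the two standard ways: micro-averaging pools the true-positive, false-positive, and false-negative counts across tasks before scoring, while macro-averaging averages the per-task scores. The sketch below illustrates both under one common convention (macro F1 taken as the mean of per-task F1; some reports instead take the harmonic mean of macro P and macro R, and the paper's own aggregation granularity may differ). All counts are hypothetical placeholders, not the values behind the table.

```python
# Micro- vs. macro-averaged precision, recall, and F1 over a set of tasks.
# The counts below are hypothetical placeholders, not taken from the table.
tasks = {
    "MT":  {"tp": 70, "fp": 35, "fn": 25},
    "NER": {"tp": 80, "fp": 20, "fn": 18},
}

def prf(tp, fp, fn):
    """Precision, recall, and F1 from raw counts, guarding against /0."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Micro: pool the counts across tasks, then score once.
tp = sum(t["tp"] for t in tasks.values())
fp = sum(t["fp"] for t in tasks.values())
fn = sum(t["fn"] for t in tasks.values())
micro_p, micro_r, micro_f1 = prf(tp, fp, fn)

# Macro: score each task separately, then take the unweighted mean.
per_task = [prf(**t) for t in tasks.values()]
macro_p = sum(p for p, _, _ in per_task) / len(per_task)
macro_r = sum(r for _, r, _ in per_task) / len(per_task)
macro_f1 = sum(f for _, _, f in per_task) / len(per_task)

print(f"micro P/R/F1: {micro_p:.4f} {micro_r:.4f} {micro_f1:.4f}")
print(f"macro P/R/F1: {macro_p:.4f} {macro_r:.4f} {macro_f1:.4f}")
```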
Annotated corpus statistics for the 12 Information Units in the NlpContributionGraph scheme.
| Information Unit | No. of triples | No. of papers | Ratio of triples to papers |
|---|---|---|---|
| Experiments | 168 | 3 | 56 |
| Tasks | 277 | 8 | 34.63 |
| ExperimentalSetup | 300 | 16 | 18.75 |
| Model | 561 | 32 | 17.53 |
| Hyperparameters | 254 | 15 | 16.93 |
| Results | 688 | 42 | 16.38 |
| Approach | 283 | 18 | 15.72 |
| Baselines | 148 | 10 | 14.8 |
| AblationAnalysis | 155 | 13 | 11.92 |
| Dataset | 8 | 1 | 8 |
| ResearchProblem | 169 | 50 | 3.38 |
| Code | 9 | 9 | 1 |
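Each value in the last column is simply the unit's triple count divided by its paper count; for example, the Experiments row gives 168 / 3 = 56, and the ResearchProblem row, which spans all 50 papers, gives 169 / 50 = 3.38.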
Two examples illustrating the three different granularities for NlpContributionGraph data instances (viz., a. sentences, b. phrases, and c. triples) modeled for the Result information unit from a scholarly article (Cho et al., 2014).
Annotated corpus characteristics for our trial dataset containing a total of 50 NLP articles using the NlpContributionGraph model. “ann” stands for annotated, and IU for information unit. The 50 articles are uniformly distributed across five NLP subfields, characterized at sentence- and token-level granularity as follows: machine translation (MT) with 2,596 sentences and 9,581 tokens; named entity recognition (NER) with 2,295 sentences and 8,703 tokens; question answering (QA) with 2,511 sentences and 10,305 tokens; relation classification (RC) with 1,937 sentences and 10,020 tokens; and text classification (TC) with 2,071 sentences and 8,345 tokens.
| | MT | NER | QA | RC | TC | Overall |
|---|---|---|---|---|---|---|
| total IUs | 38 | 43 | 44 | 45 | 46 | 216 |
| ann Sentences | 209 | 157 | 176 | 194 | 164 | 900 |
| avg ann Sentences | 0.081 | 0.068 | 0.07 | 0.1 | 0.079 | - |
| ann Phrases | 956 | 770 | 960 | 978 | 1038 | 4,702 |
| avg Toks per Phrase | 2.81 | 2.87 | 2.76 | 2.91 | 2.7 | - |
| avg ann Phrase Toks | 0.28 | 0.25 | 0.26 | 0.28 | 0.34 | - |
| ann Triples | 590 | 504 | 619 | 620 | 647 | 2,980 |
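Given the caption's per-subfield totals, the derived rows are simple ratios: “avg ann Sentences” is annotated sentences over total sentences, and “avg ann Phrase Toks” is annotated phrase tokens (phrase count times average tokens per phrase) over total tokens. These definitions are inferred from the numbers rather than stated explicitly, but they reproduce the printed values; a minimal sketch for the MT column:

```python
# Reproduce the derived MT-column statistics from the raw counts above.
# Row definitions are inferred; the printed values are consistent with them.
total_sentences = 2596      # MT sentences (from the caption)
total_tokens = 9581         # MT tokens (from the caption)
ann_sentences = 209         # annotated MT sentences (from the table)
ann_phrases = 956           # annotated MT phrases (from the table)
avg_toks_per_phrase = 2.81  # from the table

avg_ann_sentences = ann_sentences / total_sentences       # ~0.081
ann_phrase_tokens = ann_phrases * avg_toks_per_phrase     # ~2,686
avg_ann_phrase_toks = ann_phrase_tokens / total_tokens    # ~0.28

print(f"avg ann Sentences:   {avg_ann_sentences:.3f}")    # 0.081
print(f"avg ann Phrase Toks: {avg_ann_phrase_toks:.2f}")  # 0.28
```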