Sentence, Phrase, and Triple Annotations to Build a Knowledge Graph of Natural Language Processing Contributions—A Trial Dataset

Open Access | May 2021

References

  1. A reintroduction to our Knowledge Graph and knowledge panels. (2020). https://blog.google/products/search/about-knowledge-graph-and-knowledge-panels/
  2. Ammar, W., Peters, M.E., Bhagavatula, C., & Power, R. (2017). The AI2 system at SemEval-2017 Task 10 (ScienceIE): Semi-supervised end-to-end entity and relation extraction. SemEval@ACL.
  3. Aryani, A., Poblet, M., Unsworth, K., Wang, J., Evans, B., Devaraju, A., Hausstein, B., Klas, C.-P., Zapilko, B., & Kaplun, S. (2018). A Research Graph dataset for connecting research data repositories using RD-Switchboard. Scientific Data, 5, 180099.
  4. Auer, S. (2018). Towards an Open Research Knowledge Graph (Version 1) [Computer software]. Zenodo. https://doi.org/10.5281/zenodo.1157185
  5. Augenstein, I., Das, M., Riedel, S., Vikraman, L., & McCallum, A. (2017). SemEval 2017 Task 10: ScienceIE—Extracting Keyphrases and Relations from Scientific Publications. SemEval@ACL.
  6. Baas, J., Schotten, M., Plume, A., Côté, G., & Karimi, R. (2020). Scopus as a curated, high-quality bibliometric data source for academic research in quantitative science studies. Quantitative Science Studies, 1(1), 377–386.
  7. Beltagy, I., Lo, K., & Cohan, A. (2019). SciBERT: A pretrained language model for scientific text. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 3606–3611.
  8. Birkle, C., Pendlebury, D.A., Schnell, J., & Adams, J. (2020). Web of Science as a data source for research on scientific and scholarly activity. Quantitative Science Studies, 1(1), 363–376.
  9. Brack, A., D’Souza, J., Hoppe, A., Auer, S., & Ewerth, R. (2020). Domain-independent extraction of scientific concepts from research articles. European Conference on Information Retrieval, 251–266.
  10. Burton, A., Koers, H., Manghi, P., La Bruzzo, S., Aryani, A., Diepenbroek, M., & Schindler, U. (2017). The data-literature interlinking service: Towards a common infrastructure for sharing data-article links. Program: electronic library and information systems, 51(1), 75–100. https://doi.org/10.1108/PROG-06-2016-0048
  11. Buscaldi, D., Dessì, D., Motta, E., Osborne, F., & Reforgiato Recupero, D. (2019). Mining scholarly data for fine-grained knowledge graph construction. CEUR Workshop Proceedings, 2377, 21–30.
  12. Camacho-Collados, J., & Pilehvar, M.T. (2017). On the role of text preprocessing in neural network architectures: An evaluation study on text categorization and sentiment analysis. ArXiv Preprint ArXiv:1707.01780.
  13. Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. ArXiv:1406.1078.
  14. Cimiano, P., Mädche, A., Staab, S., & Völker, J. (2009). Ontology learning. In Handbook on ontologies (pp. 245–267). Springer.
  15. Constantin, A., Peroni, S., Pettifer, S., Shotton, D., & Vitali, F. (2016). The document components ontology (DoCO). Semantic Web, 7(2), 167–181.
  16. Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. ArXiv:1810.04805.
  17. D’Souza, J., & Auer, S. (2020). NLPContributions: An Annotation Scheme for Machine Reading of Scholarly Contributions in Natural Language Processing Literature. In C. Zhang, P. Mayr, W. Lu, & Y. Zhang (Eds.), Proceedings of the 1st Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents co-located with the ACM/IEEE Joint Conference on Digital Libraries in 2020, EEKE@JCDL 2020, Virtual Event, China, August 1st, 2020 (Vol. 2658, pp. 16–27). CEUR-WS.org. http://ceur-ws.org/Vol-2658/paper2.pdf
  18. D’Souza, J., Hoppe, A., Brack, A., Jaradeh, M.Y., Auer, S., & Ewerth, R. (2020). The STEM-ECR Dataset: Grounding Scientific Entity References in STEM Scholarly Content to Authoritative Encyclopedic and Lexicographic Sources. LREC, 2192–2203.
  19. Esteves, D., Moussallem, D., Neto, C.B., Soru, T., Usbeck, R., Ackermann, M., & Lehmann, J. (2015). MEX vocabulary: A lightweight interchange format for machine learning experiments. Proceedings of the 11th International Conference on Semantic Systems, 169–176.
  20. Fisas, B., Ronzano, F., & Saggion, H. (2016). A Multi-Layered Annotated Corpus of Scientific Papers. LREC.
  21. Fricke, S. (2018). Semantic scholar. Journal of the Medical Library Association: JMLA, 106(1), 145.
  22. Ghaddar, A., & Langlais, P. (2018). Robust lexical features for improved neural network named-entity recognition. ArXiv:1806.03489.
  23. GROBID. (2008). GitHub. https://github.com/kermitt2/grobid
  24. Handschuh, S., & QasemiZadeh, B. (2014). The ACL RD-TEC: a dataset for benchmarking terminology extraction and classification in computational linguistics. COLING 2014: 4th International Workshop on Computational Terminology.
  25. Hendricks, G., Tkaczyk, D., Lin, J., & Feeney, P. (2020). Crossref: The sustainable source of community-owned scholarly metadata. Quantitative Science Studies, 1(1), 414–427.
  26. Huth, E.J. (1987). Structured abstracts for papers reporting clinical trials. American College of Physicians.
  27. Jaradeh, M.Y., Oelen, A., Farfar, K.E., Prinz, M., D’Souza, J., Kismihók, G., Stocker, M., & Auer, S. (2019). Open Research Knowledge Graph: Next Generation Infrastructure for Semantic Scholarly Knowledge. KCAP, 243–246.
  28. Jiang, M., D’Souza, J., Auer, S., & Downie, J.S. (2020). Targeting Precision: A Hybrid Scientific Relation Extraction Pipeline for Improved Scholarly Knowledge Organization. Proceedings of the Association for Information Science and Technology, 57(1).
  29. Jinha, A.E. (2010). Article 50 million: An estimate of the number of scholarly articles in existence. Learned Publishing, 23(3), 258–263.
  30. Johnson, R., Watkinson, A., & Mabe, M. (2018). The STM report. An Overview of Scientific and Scholarly Publishing. 5th Edition October.
  31. Kononova, O., Huo, H., He, T., Rong, Z., Botari, T., Sun, W., Tshitoyan, V., & Ceder, G. (2019). Text-mined dataset of inorganic materials synthesis recipes. Scientific Data, 6(1), 1–11.
  32. Kulkarni, C., Xu, W., Ritter, A., & Machiraju, R. (2018). An Annotated Corpus for Machine Reading of Instructions in Wet Lab Protocols. NAACL: HLT, Volume 2 (Short Papers), 97–106. https://doi.org/10.18653/v1/N18-2016
  33. Kuniyoshi, F., Makino, K., Ozawa, J., & Miwa, M. (2020). Annotating and Extracting Synthesis Process of All-Solid-State Batteries from Scientific Literature. LREC, 1941–1950.
  34. Lample, G., Ballesteros, M., Subramanian, S., Kawakami, K., & Dyer, C. (2016). Neural architectures for named entity recognition. ArXiv Preprint ArXiv:1603.01360.
  35. Landhuis, E. (2016). Scientific literature: Information overload. Nature, 535(7612), 457–458.
  36. Liakata, M., Saha, S., Dobnik, S., Batchelor, C., & Rebholz-Schuhmann, D. (2012). Automatic recognition of conceptualization zones in scientific articles and two life science applications. Bioinformatics, 28(7), 991–1000.
  37. Liakata, M., Teufel, S., Siddharthan, A., & Batchelor, C.R. (2010). Corpora for the Conceptualisation and Zoning of Scientific Papers. LREC.
  38. Lin, D.K., & Pantel, P. (2002). Concept discovery from text. COLING 2002: The 19th International Conference on Computational Linguistics.
  39. Luan, Y., He, L., Ostendorf, M., & Hajishirzi, H. (2018). Multi-Task Identification of Entities, Relations, and Coreference for Scientific Knowledge Graph Construction. EMNLP.
  40. Luan, Y., Ostendorf, M., & Hajishirzi, H. (2017). Scientific information extraction with semi-supervised neural tagging. ArXiv:1708.06075.
  41. Mysore, S., Jensen, Z., Kim, E., Huang, K., Chang, H.-S., Strubell, E., Flanigan, J., McCallum, A., & Olivetti, E. (2019). The Materials Science Procedural Text Corpus: Annotating Materials Synthesis Procedures with Shallow Semantic Structures. Proceedings of the 13th Linguistic Annotation Workshop, 56–64.
  42. Noy, N., Gao, Y., Jain, A., Narayanan, A., Patterson, A., & Taylor, J. (2019). Industry-scale knowledge graphs: Lessons and challenges. Queue, 17(2), 48–75.
  43. Oelen, A., Jaradeh, M.Y., Farfar, K.E., Stocker, M., & Auer, S. (2019). Comparing research contributions in a scholarly knowledge graph. CEUR Workshop Proceedings, 2526, 21–26.
  44. Oelen, A., Jaradeh, M.Y., Stocker, M., & Auer, S. (2020). Generate FAIR Literature Surveys with Scholarly Knowledge Graphs. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, 97–106. https://doi.org/10.1145/3383583.3398520
  45. Pertsas, V., & Constantopoulos, P. (2017). Scholarly Ontology: Modelling scholarly practices. International Journal on Digital Libraries, 18(3), 173–190.
  46. Qi, P., Zhang, Y.H., Zhang, Y.H., Bolton, J., & Manning, C.D. (2020). Stanza: A Python Natural Language Processing Toolkit for Many Human Languages. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations. https://nlp.stanford.edu/pubs/qi2020stanza.pdf
  47. Soldatova, L.N., & King, R.D. (2006). An ontology of scientific experiments. Journal of the Royal Society Interface, 3(11), 795–803.
  48. Sollaci, L.B., & Pereira, M.G. (2004). The introduction, methods, results, and discussion (IMRAD) structure: A fifty-year survey. Journal of the Medical Library Association, 92(3), 364.
  49. Teufel, S., Carletta, J., & Moens, M. (1999). An annotation scheme for discourse-level argumentation in research articles. Proceedings of the Ninth Conference on European Chapter of ACL, 110–117.
  50. Teufel, S., Siddharthan, A., & Batchelor, C. (2009). Towards discipline-independent argumentative zoning: Evidence from chemistry and computational linguistics. EMNLP: Volume 3, 1493–1502.
  51. Vogt, L., D’Souza, J., Stocker, M., & Auer, S. (2020). Toward representing research contributions in scholarly knowledge graphs using knowledge graph cells. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, 107–116.
  52. Vrandečić, D., & Krötzsch, M. (2014). Wikidata: A free collaborative knowledgebase. Communications of the ACM, 57(10), 78–85.
  53. Wang, B.L., Lu, W., Wang, Y., & Jin, H.X. (2018). A neural transition-based model for nested mention recognition. ArXiv:1810.01808.
  54. Wang, K.S., Shen, Z.H., Huang, C.Y., Wu, C.-H., Dong, Y.X., & Kanakia, A. (2020). Microsoft academic graph: When experts are not enough. Quantitative Science Studies, 1(1), 396–413.
  55. Wilkinson, M.D., Dumontier, M., Aalbersberg, I.J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., da Silva Santos, L.B., Bourne, P.E., & others. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3(1), 1–9.
  56. Zhou, J., Cao, Y., Wang, X.G., Li, P., & Xu, W. (2016). Deep recurrent models with fast-forward connections for neural machine translation. Transactions of the Association for Computational Linguistics, 4, 371–383.
DOI: https://doi.org/10.2478/jdis-2021-0023 | Journal eISSN: 2543-683X | Journal ISSN: 2096-157X
Language: English
Page range: 6–34
Submitted on: Oct 28, 2020
Accepted on: Apr 14, 2021
Published on: May 9, 2021
Published by: Chinese Academy of Sciences, National Science Library
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2021 Jennifer D’Souza, Sören Auer, published by Chinese Academy of Sciences, National Science Library
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.