
Evaluation of Word Embedding Models in Latvian NLP Tasks Based on Publicly Available Corpora

Open Access | Dec 2021

References

[1] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” in Workshop Track Proceedings of the 1st International Conference on Learning Representations, Scottsdale, Arizona, USA, May 2013, pp. 1–12.
[2] J. Pennington, R. Socher, and C. D. Manning, “GloVe: Global vectors for word representation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1532–1543.
[3] B. Wang, A. Wang, F. Chen, Y. Wang, and C.-C. J. Kuo, “Evaluating word embedding models: methods and experimental results,” APSIPA Transactions on Signal and Information Processing, vol. 8, Art. no. e19, Jul. 2019. https://doi.org/10.1017/ATSIP.2019.12
[4] A. Znotiņš, “Word embeddings for Latvian natural language processing tools,” Human Language Technologies – The Baltic Perspective, vol. 289, IOS Press, pp. 167–173, 2016. https://doi.org/10.3233/978-1-61499-701-6-167
[5] A. Znotiņš and G. Barzdiņš, “LVBERT: Transformer-based model for Latvian language understanding,” Human Language Technologies – The Baltic Perspective, vol. 328, IOS Press, pp. 111–115, 2020. https://doi.org/10.3233/FAIA200610
[6] R. Vīksna and I. Skadiņa, “Large language models for Latvian named entity recognition,” Human Language Technologies – The Baltic Perspective, vol. 328, IOS Press, pp. 62–69, 2020. https://doi.org/10.3233/FAIA200603
[7] “EuroParl.” [Online]. Available: https://www.statmt.org/europarl/. Accessed on: May 2021.
[8] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proceedings of NAACL-HLT, Minneapolis, Minnesota, 2019, pp. 4171–4186.
[9] M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer, “Deep contextualized word representations,” arXiv, Art. no. 1802.05365, pp. 1–15, 2018.
[10] M. Ulčar, A. Žagar, C. Armendariz, A. Repar, S. Pollak, M. Purver, and M. Robnik-Šikonja, “Evaluation of contextual embeddings on less-resourced languages,” arXiv, Art. no. 2107.10614, pp. 1–45, 2021.
[11] X. Rong, “word2vec parameter learning explained,” arXiv, Art. no. 1411.2738v4, pp. 1–21, 2016.
[12] W. Ling, C. Dyer, A. Black, and I. Trancoso, “Two/Too simple adaptations of Word2Vec for syntax problems,” in Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, Colorado, May–June 2015, pp. 1299–1304. https://doi.org/10.3115/v1/N15-1142
[13] P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, “Enriching word vectors with subword information,” Transactions of the Association for Computational Linguistics, vol. 5, June 2017, pp. 135–146. https://doi.org/10.1162/tacl_a_00051
[14] Z. Zhao, T. Liu, S. Li, and B. Li, “Ngram2vec: Learning improved word representations from Ngram co-occurrence statistics,” in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, Sep. 2017, pp. 244–253. https://doi.org/10.18653/v1/D17-1023
[15] “UDPipe.” [Online]. Available: https://ufal.mff.cuni.cz/udpipe/1. Accessed on: May 2021.
[16] “Gensim library.” [Online]. Available: https://radimrehurek.com/gensim/. Accessed on: May 2021.
[17] “Ngram2vec tool repository.” [Online]. Available: https://github.com/zhezhaoa/ngram2vec. Accessed on: May 2021.
[18] “Structured Skip-Gram tool repository.” [Online]. Available: https://github.com/wlin12/wang2vec. Accessed on: May 2021.
[19] M. Ulčar, K. Vaik, J. Lindstrom, M. Dailidenaite, and M. Robnik-Šikonja, “Multilingual culture-independent word analogy datasets,” in Proceedings of the 12th Language Resources and Evaluation Conference, LREC 2020, Marseille, France, May 2020, pp. 4074–4080.
[20] “Translated analogy dataset repository.” [Online]. Available: https://www.clarin.si/repository/xmlui/handle/11356/1261. Accessed on: May 2021.
[21] “SpaCy tool.” [Online]. Available: https://spacy.io/. Accessed on: May 2021.
[22] “LVTB dataset repository.” [Online]. Available: https://github.com/UniversalDependencies/UD_Latvian-LVTB/tree/master. Accessed on: May 2021.
[23] “LUMII_AiLab NER dataset repository.” [Online]. Available: https://github.com/LUMII-AILab/FullStack/tree/master/NamedEntities. Accessed on: May 2021.
[24] O. Levy and Y. Goldberg, “Linguistic regularities in sparse and explicit word representations,” in Proceedings of the Eighteenth Conference on Computational Natural Language Learning, June 2014, pp. 171–180. https://doi.org/10.3115/v1/W14-1618
[25] “CommonCrawl.” [Online]. Available: https://commoncrawl.org/. Accessed on: May 2021.
[26] P. Paikens, “Deep neural learning approaches for Latvian morphological tagging,” Human Language Technologies – The Baltic Perspective, vol. 289, IOS Press, pp. 160–166, 2016. https://doi.org/10.3233/978-1-61499-701-6-160
DOI: https://doi.org/10.2478/acss-2021-0016 | Journal eISSN: 2255-8691 | Journal ISSN: 2255-8683
Language: English
Page range: 132–138
Published on: Dec 30, 2021
Published by: Riga Technical University
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2021 Rolands Laucis, Gints Jēkabsons, published by Riga Technical University
This work is licensed under the Creative Commons Attribution 4.0 License.