An HMM-Based PoS Tagger for Old Church Slavonic

Olga Lyashevskaya; Ilia Afanasev

doi:10.2478/jazcas-2021-0051

Journal of Linguistics/Jazykovedný casopis

Volume 72 (2021): Issue 2 (December 2021)

Open Access

[1] Behera, P. (2017). An Experiment with the CRF++ Parts of Speech (POS) Tagger for Odia. Language in India, 17(1), pages 18–40.
[2] Loftsson, H. (2008). Tagging Icelandic text: A linguistic rule-based approach. Nordic Journal of Linguistics, 31(1), pages 47–72. doi:10.1017/S0332586508001820
[3] Dandapat, S., Sarkar, S., and Basu, A. (2007). Automatic Part-of-Speech Tagging for Bengali: An Approach for Morphologically Rich Languages in a Poor Resource Scenario. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions, pages 221–224, Association for Computational Linguistics. doi:10.3115/1557769.1557833
[4] Rajendran, S., and Krishnakumar, K. (2019). A Comprehensive Study of Shallow Parsing and Machine Translation in Malaylam. Coimbatore: Amrita Vishwa Vidyapeetham, 295 p.
[5] Uludoğan, G. (2018). HMM POS tagger. Accessible at: https://github.com/gokceuludogan/hmm-pos-tagger.
[6] Jurish, B. (2003). A Hybrid Approach to Part-of-Speech Tagging. Berlin: Berlin-Brandenburgische Akademie der Wissenschaften, 27 p.
[7] (UD) UPOS tag set. Accessible at: https://universaldependencies.org/u/pos/.
[8] Mohamed Elhadj, Y. O. (2009). Statistical Part-of-Speech Tagger for Traditional Arabic Texts. Journal of Computer Science, 5(11), pages 794–800.
[9] Danso, S., and Lamb, W. (2014). Developing an Automatic Part-of-Speech Tagger for Scottish Gaelic. In Proceedings of the First Celtic Language Technology Workshop, pages 1–5, ACL.
[10] Mirzanezhad, Z., and Feizi-Derakhshi, M.-R. (2016). Using morphological analyzer to statistical POS Tagging on Persian Text. IJCSIS, 14(8), pages 1093–1103.
[11] Abumalloh, R. A., Al-Sarhan, H. M., Ibrahim, O. B., and Abu-Ulbeh, W. (2016). Arabic Part-of-Speech Tagging. Journal of Soft Computing and Decision Support Systems, 3(2), pages 45–52.
[12] Kumar, S. S., Kumar, M. A., and Soman, K. P. (2016). Experimental analysis of Malayalam PoS tagger using EPIC framework in Scala. ARPN Journal of Engineering and Applied Sciences, 11(13), pages 8017–8023.
[13] Gambäck, B., Olsson, F., Argaw, A. A., and Asker, L. (2009). Methods for Amharic Part-of-Speech Tagging. In Proceedings of the EACL 2009 Workshop on Language Technologies for African Languages, pages 104–111, ACL. doi:10.3115/1564508.1564527
[14] Saharia, N., Das, D., Sharma, S., and Kalita, J. (2009). Part of Speech Tagger for Assamese Text. In Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, pages 33–36, World Scientific Publishing Co Pte Ltd. doi:10.3115/1667583.1667595
[15] Reddy, S., and Sharoff, S. (2011). Cross Language POS Taggers (and other Tools) for Indian Languages: An Experiment with Kannada using Telugu Resources. In Proceedings of the 5th International Joint Conference on Natural Language Processing, pages 11–19, Asian Federation of Natural Language Processing.
[16] Kann, K. et al. (2018). Character-level supervision for low-resource POS tagging. In Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP, pages 1–11, Association for Computational Linguistics. doi:10.18653/v1/W18-3401
[17] Straka, M., Straková, J., and Hajič, J. (2019). Czech Text Processing with Contextual Embeddings: POS Tagging, Lemmatization, Parsing and NER. In Proceedings of 22nd International Conference “Text, Speech and Dialogue” 2019, pages 137–150, TSD. doi:10.1007/978-3-030-27947-9_12
[18] Hajič, J., and Hladká, B. (1998). Czech language processing, POS tagging. Accessible at: https://ufal.mff.cuni.cz/czech-tagging/HajicHladkaLREC1998.pdf.
[19] Korobov, M. (2015). Morphological Analyzer and Generator for Russian and Ukrainian Languages. Analysis of Images, Social Networks and Texts, pages 320–332. doi:10.1007/978-3-319-26123-2_31
[20] Bird, S., Loper, E., and Klein, E. (2009). Natural Language Processing with Python. O’Reilly Media Inc, 479 p.
[21] Segalovich, I. (2003). A Fast Morphological Algorithm with Unknown Word Guessing Induced by a Dictionary for a Web Search Engine. In Proceedings of the International Conference on Machine Learning; Models, Technologies and Applications, pages 273–280, MLMTA’03, June 23–26, 2003, Las Vegas, Nevada, USA.
[22] Eckhoff, H. M., and Berdicevskis, A. (2015). Linguistics vs. digital editions: The Tromsø Old Russian and OCS Treebank. Scripta & e-Scripta 2015, 14(15), pages 9–25.
[23] Pedrazzini, N., and Eckhoff, H. M. (2021). OldSlavNet: A scalable Early Slavic dependency parser trained on modern language data. Software Impacts, 8. Accessible at: https://www.sciencedirect.com/science/article/pii/S2665963821000117.
[24] TITUS. Accessible at: http://titus.uni-frankfurt.de/indexe.htm.
[25] Manuscript. Accessible at: http://manuscripts.ru/.
[26] Zeman, D., Nivre, J., Abrams, M. et al. (2020). Universal Dependencies 2.7. LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University. Accessible at: http://hdl.handle.net/11234/1-3424.
[27] Haug, D. T. T., and Jøhndal, M. L. (2008). Creating a Parallel Treebank of the Old Indo-European Bible Translations. In Proceedings of the Second Workshop on Language Technology for Cultural Heritage Data (LaTeCH 2008), pages 27–34, ACM, New York, NY.
[28] Straka, M., Hajič, J., and Straková, J. (2016). UDPipe: Trainable Pipeline for Processing CoNLL-U Files Performing Tokenization, Morphological Analysis, POS Tagging and Parsing. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia. Accessible at: https://ufal.mff.cuni.cz/~straka/papers/2016-lrec_udpipe.pdf.
[29] Straka, M., Straková, J., and Hajič, J. (2019). Evaluating Contextualized Embeddings on 54 Languages in POS Tagging, Lemmatization and Dependency Parsing. In Arxiv.org Computing Research Repository, ISSN 2331-8422, 1904.02099.
[30] Kamphuis, J. (2020). Verbal Aspect in Old Church Slavonic: A Corpus-based Approach. Leiden: Brill, 329 p. doi:10.1163/9789004422032
[31] Strobl, C., Malley, J., and Tutz, G. (2009). An Introduction to Recursive Partitioning: Rationale, Application and Characteristics of Classification and Regression Trees, Bagging and Random Forests. Psychological Methods, 14(4), pages 323–348. doi:10.1037/a0016973
[32] Kiev Folia. Accessible at: http://www.schaeken.nl/lu/research/online/editions/kievfol.html.
[33] Afanasev, I. (2020). Korpus staroslavianskogo iazyka: nedostaiushchee zveno v diakhronicheskoi slavistike. In Slavica iuvenum XXI: sbornik trudov mezhdunarodnoi nauchnoi konferentsii Slavica iuvenum 2020, March 31–April 1, 2020, pages 13–21, Ostravskii universitet, Ostrava.
[34] Project GitHub Repository. Accessible at: https://github.com/The-One-Who-Speaks-and-Depicts/hmm-pos-tagger.
[35] Schmid, H. (1994). Probabilistic Part-of-Speech Tagging Using Decision Trees. In Proceedings of International Conference on New Methods in Language Processing, pages 1–9, Manchester, UK. Accessible at: https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/tree-tagger2.pdf.
[36] Simov, K., Osenova, P., and Slavcheva, M. (2004). BTB-TR03: BulTreeBank Morphosyntactic Tagset. Accessible at: http://bultreebank.org/wp-content/uploads/2017/06/BTB-TR03.pdf.
[37] Schmid, H., and Laws, F. (2008). Estimation of Conditional Probabilities with Decision Trees and an Application to Fine-Grained POS Tagging. Accessible at: https://cis.lmu.de/~schmid/papers/Schmid-Laws.pdf.

DOI: https://doi.org/10.2478/jazcas-2021-0051 | Journal eISSN: 1338-4287 | Journal ISSN: 0021-5597

Language: English

Page range: 556 - 567

Published on: Dec 30, 2021

Published by: Slovak Academy of Sciences, Ľudovít Štúr Institute of Linguistics

In partnership with: Paradigm Publishing Services

Publication frequency: 3 issues per year

Keywords: Universal Dependencies

Related subjects: Linguistics and semiotics; Theoretical frameworks and disciplines; Linguistics, other

© 2021 Olga Lyashevskaya, Ilia Afanasev, published by Slovak Academy of Sciences, Ľudovít Štúr Institute of Linguistics
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.
