References
- 1. Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019, June). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (pp. 4171–4186). Minneapolis, Minnesota: Association for Computational Linguistics. Retrieved from https://aclanthology.org/N19-1423 (last accessed: 8 August 2022). DOI: 10.18653/v1/N19-1423
- 2. Faggionato, C., Hill, N., & Meelen, M. (2022, June). NLP Pipeline for Annotating (Endangered) Tibetan and Newar Varieties. In Proceedings of The Workshop on Resources and Technologies for Indigenous, Endangered and Lesser-resourced Languages in Eurasia within the 13th Language Resources and Evaluation Conference (pp. 1–6). Marseille, France: European Language Resources Association.
- 3. Faggionato, C., & Meelen, M. (2019). Developing the Old Tibetan treebank. In N. T. Angelova Mitkov (Ed.), Proceedings of Recent Advances in Natural Language Processing (pp. 304–312). Varna: Incoma. DOI: 10.26615/978-954-452-056-4_035
- 4. Glavaš, G., Franco-Salvador, M., Ponzetto, S. P., & Rosso, P. (2018). A resource-light method for cross-lingual semantic textual similarity. Knowledge-Based Systems, 143, 1–9. DOI: 10.1016/j.knosys.2017.11.041
- 5. Handy, C., & Meelen, M. (2022, June). MRK alignment scoring guidelines. Zenodo. Retrieved from https://doi.org/10.5281/zenodo.6782150 (last accessed: 8 August 2022).
- 6. Inagaki, H. (1978). Index to the Larger Sukhāvatīvyūha-sūtra. A Tibetan Glossary with Sanskrit and Tibetan Equivalents. Tokyo: Nagata Bunshudo.
- 7. Karashima, S. (1998). A Glossary of Dharmarakṣa’s Translation of the Lotus Sutra: Zheng fahua jing ci dian. Tokyo: The International Research Institute for Advanced Buddhology, Soka University.
- 8. Karashima, S. (2001). A Glossary of Kumārajīva’s Translation of the Lotus Sutra: Myōhō Rengekyō shiten. Tokyo: The International Research Institute for Advanced Buddhology, Soka University.
- 9. Karashima, S. (2010). A Glossary of Lokakṣema’s Translation of the Aṣṭasāhasrikā Prajñāpāramitā. Tokyo: The International Research Institute for Advanced Buddhology, Soka University.
- 10. Klein, B. E., Dershowitz, N., Wolf, L., Almogi, O., & Wangchuk, D. (2014). Finding Inexact Quotations Within a Tibetan Buddhist Corpus. In 9th Annual International Conference of the Alliance of Digital Humanities Organizations, DH 2014, Lausanne, Switzerland, 8–12 July 2014, Conference Abstracts.
- 11. Li, Q. (2011). Da zhidu lun cidian 大智度論辭典. Electronic resource. Retrieved from https://www.dropbox.com/s/ocsagb529k3e70v/dzdl.bgl?dl=0 (last accessed: 1 June 2021).
- 12. Meelen, M. (2022). Tibetan language models: from distributional semantics to facilitating Tibetan NLP. Accepted submission to IATS 2022.
- 13. Meelen, M., & Hill, N. (2017). Segmenting and POS tagging Classical Tibetan using a memory-based tagger. Himalayan Linguistics, 16(2). DOI: 10.5070/H916234501
- 14. Meelen, M., & Roux, É. (2020). Meta-dating the parsed corpus of Tibetan (PACTib). In Proceedings of the 19th Workshop on Treebanks and Linguistic Theories (pp. 31–42). DOI: 10.18653/v1/2020.tlt-1.3
- 15. Meelen, M., Roux, É., & Hill, N. (2021). Optimisation of the largest annotated Tibetan corpus combining rule-based, memory-based, and deep-learning methods. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 20(1), 1–11. DOI: 10.1145/3409488
- 16. Mikolov, T., Le, Q. V., & Sutskever, I. (2013). Exploiting similarities among languages for machine translation. CoRR, abs/1309.4168. Retrieved from http://arxiv.org/abs/1309.4168 (last accessed: 8 August 2022).
- 17. Nehrdich, S. (2020). A method for the calculation of parallel passages for Buddhist Chinese sources based on million-scale nearest neighbor search. Journal of the Japanese Association for Digital Humanities, 5(2), 132–153. DOI: 10.17928/jjadh.5.2_132
- 18. Qi, P., Zhang, Y., Zhang, Y., Bolton, J., & Manning, C. D. (2020). Stanza: A Python natural language processing toolkit for many human languages. arXiv preprint arXiv:2003.07082. DOI: 10.18653/v1/2020.acl-demos.14
- 19. Silk, J. A. (2020). Tekisuto sokei no nai kōtei: Bukkyō kyōten to yudayakyō rabi bunken kenkyū ni okeru honbun hihan, soshite ‘Hirakareta bunkengaku’ dejitaru hyūmanitīzu purojekuto テキスト祖型のない校訂: 佛敎經典とユダヤ敎ラビ文獻硏究における本文批評、そして「開かれた文獻學」デジタルヒューマニティーズプロジェクト [Editing without an Ur-text: Buddhist Sūtras, Rabbinic Text Criticism, and the Open Philology Digital Humanities Project]. Tōyō no Shisō to Shūkyō 東洋の思想と宗敎, 37, 22–58.
- 20. Vierthaler, P. (2020). A Simple Dictionary-Based Tokenizer for Classical Chinese Text. Retrieved from https://github.com/vierth/dictionary_parser (last accessed: 8 August 2022).
- 21. Vierthaler, P. (2022, June). Buddhist Chinese Word Embeddings. Zenodo. Retrieved from https://doi.org/10.5281/zenodo.6782932 (last accessed: 8 August 2022).
- 22. Vierthaler, P., & Gelein, M. (2019, March 22). A BLAST-based, language-agnostic text reuse algorithm with a MARKUS implementation and sequence alignment optimized for large Chinese corpora. Journal of Cultural Analytics, 4(2). DOI: 10.22148/16.034
- 23. Wang, Y.-C. (2020). Word segmentation for Classical Chinese Buddhist literature. Journal of the Japanese Association for Digital Humanities, 5(2), 154–172. DOI: 10.17928/jjadh.5.2_154
- 24. Wittern, C. (2016). The Kanseki repository: A new online resource for Chinese textual studies. Digital Scholarship in History and the Humanities.
- 25. Xing, C., Wang, D., Liu, C., & Lin, Y. (2015, May–June). Normalized word embedding and orthogonal transform for bilingual word translation. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 1006–1011). Denver, Colorado: Association for Computational Linguistics. Retrieved from https://aclanthology.org/N15-1104 (last accessed: 8 August 2022).
- 26. Yokoyama, K., & Hirosawa, T. (1996). Index to the Yogācārabhūmi, Chinese-Sanskrit-Tibetan: 漢梵蔵対照瑜伽師地論総索引. Tokyo: Sankibō Busshorin.
