Crosslinguistic Semantic Textual Similarity of Buddhist Chinese and Classical Tibetan

Open Access | Oct 2022

References

  1. Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019, June). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (pp. 4171–4186). Minneapolis, Minnesota: Association for Computational Linguistics. Retrieved from https://aclanthology.org/N19-1423 (last accessed: 8 August 2022). DOI: 10.18653/v1/N19-1423
  2. Faggionato, C., Hill, N., & Meelen, M. (2022, June). NLP Pipeline for Annotating (Endangered) Tibetan and Newar Varieties. In Proceedings of The Workshop on Resources and Technologies for Indigenous, Endangered and Lesser-resourced Languages in Eurasia within the 13th Language Resources and Evaluation Conference (pp. 1–6). Marseille, France: European Language Resources Association.
  3. Faggionato, C., & Meelen, M. (2019). Developing the Old Tibetan treebank. In N. T. Angelova Mitkov (Ed.), Proceedings of Recent Advances in Natural Language Processing (pp. 304–312). Varna: Incoma. DOI: 10.26615/978-954-452-056-4_035
  4. Glavaš, G., Franco-Salvador, M., Ponzetto, S. P., & Rosso, P. (2018). A resource-light method for cross-lingual semantic textual similarity. Knowledge-Based Systems, 143, 1–9. DOI: 10.1016/j.knosys.2017.11.041
  5. Handy, C., & Meelen, M. (2022, June). MRK alignment scoring guidelines. Zenodo. Retrieved from https://doi.org/10.5281/zenodo.6782150 (last accessed: 8 August 2022).
  6. Inagaki, H. (1978). Index to the Larger Sukhāvatīvyūha-sūtra. A Tibetan Glossary with Sanskrit and Tibetan Equivalents. Tokyo: Nagata Bunshudo.
  7. Karashima, S. (1998). A Glossary of Dharmarakṣa’s Translation of the Lotus Sutra: Zheng fahua jing ci dian. Tokyo: The International Research Institute for Advanced Buddhology, Soka University.
  8. Karashima, S. (2001). A Glossary of Kumārajīva’s Translation of the Lotus Sutra: Myōhō Rengekyō shiten. Tokyo: The International Research Institute for Advanced Buddhology, Soka University.
  9. Karashima, S. (2010). A Glossary of Lokakṣema’s Translation of the Aṣṭasāhasrikā Prajñāpāramitā. Tokyo: The International Research Institute for Advanced Buddhology, Soka University.
  10. Klein, B. E., Dershowitz, N., Wolf, L., Almogi, O., & Wangchuk, D. (2014). Finding Inexact Quotations Within a Tibetan Buddhist Corpus. In 9th Annual International Conference of the Alliance of Digital Humanities Organizations, DH 2014, Lausanne, Switzerland, 8–12 July 2014, Conference Abstracts.
  11. Li, Q. (2011). Da zhidu lun cidian 大智度論辭典. Electronic resource. Retrieved from https://www.dropbox.com/s/ocsagb529k3e70v/dzdl.bgl?dl=0 (last accessed: 1 June 2021).
  12. Meelen, M. (2022). Tibetan language models: from distributional semantics to facilitating Tibetan NLP. Accepted submission to IATS 2022.
  13. Meelen, M., & Hill, N. (2017). Segmenting and POS tagging Classical Tibetan using a memory-based tagger. Himalayan Linguistics, 16(2). DOI: 10.5070/H916234501
  14. Meelen, M., & Roux, É. (2020). Meta-dating the parsed corpus of Tibetan (PACTib). In Proceedings of the 19th Workshop on Treebanks and Linguistic Theories (pp. 31–42). DOI: 10.18653/v1/2020.tlt-1.3
  15. Meelen, M., Roux, É., & Hill, N. (2021). Optimisation of the largest annotated Tibetan corpus combining rule-based, memory-based, and deep-learning methods. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 20(1), 1–11. DOI: 10.1145/3409488
  16. Mikolov, T., Le, Q. V., & Sutskever, I. (2013). Exploiting similarities among languages for machine translation. CoRR, abs/1309.4168. Retrieved from http://arxiv.org/abs/1309.4168 (last accessed: 8 August 2022).
  17. Nehrdich, S. (2020). A method for the calculation of parallel passages for Buddhist Chinese sources based on million-scale nearest neighbor search. Journal of the Japanese Association for Digital Humanities, 5(2), 132–153. DOI: 10.17928/jjadh.5.2_132
  18. Qi, P., Zhang, Y., Zhang, Y., Bolton, J., & Manning, C. D. (2020). Stanza: A Python natural language processing toolkit for many human languages. arXiv preprint arXiv:2003.07082. DOI: 10.18653/v1/2020.acl-demos.14
  19. Silk, J. A. (2020). Tekisuto sokei no nai kōtei: Bukkyō kyōten to yudayakyō rabi bunken kenkyū ni okeru honbun hihan, soshite ‘Hirakareta bunkengaku’ dejitaru hyūmanitīzu purojekuto テキスト祖型のない校訂: 佛敎經典とユダヤ敎ラビ文獻硏究における本文批評、そして「開かれた文獻學」デジタルヒューマニティーズプロジェクト [Editing without an Ur-text: Buddhist Sūtras, Rabbinic Text Criticism, and the Open Philology Digital Humanities Project]. Tōyō no Shisō to Shūkyō 東洋の思想と宗敎, 37, 22–58.
  20. Vierthaler, P. (2020). A Simple Dictionary-Based Tokenizer for Classical Chinese Text. Retrieved from https://github.com/vierth/dictionary_parser (last accessed: 8 August 2022).
  21. Vierthaler, P. (2022, June). Buddhist Chinese Word Embeddings. Zenodo. Retrieved from https://doi.org/10.5281/zenodo.6782932 (last accessed: 8 August 2022).
  22. Vierthaler, P., & Gelein, M. (2019, March 22). A BLAST-based, language-agnostic text reuse algorithm with a MARKUS implementation and sequence alignment optimized for large Chinese corpora. Journal of Cultural Analytics, 4(2). DOI: 10.22148/16.034
  23. Wang, Y.-C. (2020). Word segmentation for Classical Chinese Buddhist literature. Journal of the Japanese Association for Digital Humanities, 5(2), 154–172. DOI: 10.17928/jjadh.5.2_154
  24. Wittern, C. (2016). The Kanseki repository: A new online resource for Chinese textual studies. Digital Scholarship in History and the Humanities.
  25. Xing, C., Wang, D., Liu, C., & Lin, Y. (2015, May–June). Normalized word embedding and orthogonal transform for bilingual word translation. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 1006–1011). Denver, Colorado: Association for Computational Linguistics. Retrieved from https://aclanthology.org/N15-1104 (last accessed: 8 August 2022).
  26. Yokoyama, K., & Hirosawa, T. (1996). Index to the Yogācārabhūmi, Chinese-Sanskrit-Tibetan: 漢梵蔵対照瑜伽師地論総索引. Tokyo: Sankibō Busshorin.
DOI: https://doi.org/10.5334/johd.86 | Journal eISSN: 2059-481X
Language: English
Published on: Oct 4, 2022
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2022 Rafal Felbur, Marieke Meelen, Paul Vierthaler, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.