Crosslinguistic Semantic Textual Similarity of Buddhist Chinese and Classical Tibetan

Open Access | Oct 2022

Abstract

In this paper we present the first-ever procedure for identifying highly similar sequences of text in Chinese and Tibetan translations of Buddhist sūtra literature. We initially propose this procedure as an aid to scholars engaged in the philological study of Buddhist documents. Working in a cross-lingual embedding space, we compare averaged sequence vectors by cosine similarity in order to produce unsupervised cross-linguistic parallel alignments at the word, sentence, and even paragraph level. Initial results show that our method lays a solid foundation for the future development of a fully-fledged Information Retrieval tool for these (and potentially other) low-resource historical languages.
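As a rough illustration of the comparison step named in the abstract, the following minimal sketch averages the word vectors of each sequence and scores candidate pairs by cosine similarity. It is not the authors' implementation. The toy vectors, the token names (`zh_a`, `bo_a`, and so on), and the helpers `average_vector`, `cosine_similarity`, and `best_match` are all hypothetical, and the sketch simply assumes that both languages have already been mapped into one shared embedding space.

```python
import math

# Hypothetical toy vectors standing in for a shared cross-lingual
# embedding space. The tokens and values are invented for illustration.
TOY_VECTORS = {
    "zh_a": [1.0, 0.0, 0.0],
    "zh_b": [0.0, 1.0, 0.0],
    "bo_a": [0.9, 0.1, 0.0],
    "bo_b": [0.1, 0.9, 0.0],
    "bo_c": [0.0, 0.0, 1.0],
}


def average_vector(tokens, vectors):
    """Return the element-wise mean of the vectors of all known tokens."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("sequence contains no tokens with a vector")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_match(query_tokens, candidates, vectors):
    """Return (index, score) of the candidate sequence closest to the query."""
    query_vec = average_vector(query_tokens, vectors)
    scored = [
        (cosine_similarity(query_vec, average_vector(cand, vectors)), i)
        for i, cand in enumerate(candidates)
    ]
    score, index = max(scored)
    return index, score


if __name__ == "__main__":
    chinese_sequence = ["zh_a", "zh_b"]
    tibetan_candidates = [["bo_c"], ["bo_a", "bo_b"]]
    print(best_match(chinese_sequence, tibetan_candidates, TOY_VECTORS))
```

Because averaging discards word order, the same scoring applies unchanged to a single word, a sentence, or a paragraph; only the length of the token lists differs.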

DOI: https://doi.org/10.5334/johd.86 | Journal eISSN: 2059-481X
Language: English
Published on: Oct 4, 2022
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2022 Rafal Felbur, Marieke Meelen, Paul Vierthaler, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.