Crosslinguistic Semantic Textual Similarity of Buddhist Chinese and Classical Tibetan

Open Access
Oct 2022

Figures & Tables

Figure 1

Pipeline for the overall procedure of cross-lingual alignment of Buddhist Chinese and Classical Tibetan.

Table 1

Summary of cosine similarity scores for Tibetan-Chinese glossary pairs within the new embedding spaces, by Chinese tokenisation method. For each method, the table shows the score of the highest-scoring pair, the score of the lowest-scoring pair, and descriptive statistics. Higher scores with a lower standard deviation indicate a more accurate embedding space.

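| Chinese embedding type | Most similar | Least similar | Median | Mean | Std |
|---|---|---|---|---|---|
| Character | 0.9 | -0.2 | 0.66 | 0.64 | 0.12 |
| Hybrid 1 | 0.9 | 0.19 | 0.66 | 0.65 | 0.11 |
| Hybrid 2 | 0.91 | 0.22 | 0.66 | 0.64 | 0.11 |
| Word | 0.92 | 0.3 | 0.67 | 0.67 | 0.11 |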
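
As a rough illustration of how a summary like Table 1 could be computed, here is a minimal sketch. It assumes each glossary pair is available as two vectors in the shared embedding space. The function names, the toy vectors, and the use of the population standard deviation are assumptions made for illustration, not details taken from the paper.

```python
import math
import statistics


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def summarise_pairs(pairs):
    """Summarise similarity scores for (tibetan_vector, chinese_vector) pairs.

    Assumption: the standard deviation is the population form (pstdev);
    the source does not say which form was used.
    """
    scores = [cosine_similarity(t, c) for t, c in pairs]
    return {
        "most_similar": max(scores),
        "least_similar": min(scores),
        "median": statistics.median(scores),
        "mean": statistics.mean(scores),
        "std": statistics.pstdev(scores),
    }


# Toy 2-dimensional vectors, invented purely for illustration.
toy_pairs = [
    ([1.0, 0.0], [1.0, 0.0]),  # identical direction
    ([1.0, 0.0], [0.0, 1.0]),  # orthogonal
    ([1.0, 1.0], [1.0, 0.0]),  # 45 degrees apart
]
print(summarise_pairs(toy_pairs))
```

Real embeddings would have hundreds of dimensions, but the summary statistics are computed in the same way.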
Figure 2

A sample of embeddings selected from the cross-lingual Tibetan-Chinese space. This includes a selection of animal, numerical, seasonal, and directional words.

Figure 3

A zoomed-in detail of some of the animal words from the cross-lingual embedding space shown in Figure 2, including English translations.

Figure 4

Sample output for Alignment T2.A1.

Table 2

Results for all texts with four embedding methods for the Chinese input.

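| Text – Chinese embedding type | % Rank 1 | % Rank 5 | % Rank 10 | % Rank 15 | Av. rank | # Zero |
|---|---|---|---|---|---|---|
| Text 1 – Character | 30.95 | 69.05 | 78.57 | 92.86 | 4.33 | 2 |
| Text 1 – Hybrid 1 | 35.71 | 69.05 | 88.1 | 92.86 | 3.56 | 0 |
| Text 1 – Hybrid 2 | 40.48 | 73.81 | 90.48 | 95.24 | 3.4 | 0 |
| Text 1 – Word | 38.1 | 61.9 | 76.19 | 85.71 | 3.92 | 2 |
| Text 2 – Character | 76.19 | 100 | 100 | 100 | 1.24 | 0 |
| Text 2 – Hybrid 1 | 52.38 | 100 | 100 | 100 | 2 | 0 |
| Text 2 – Hybrid 2 | 61.9 | 100 | 100 | 100 | 1.57 | 0 |
| Text 2 – Word | 42.86 | 95.24 | 100 | 100 | 2.48 | 0 |
| Text 3 – Character | 35.29 | 47.06 | 52.94 | 70.59 | 4.58 | 1 |
| Text 3 – Hybrid 1 | 35.29 | 64.71 | 82.35 | 88.23 | 3.53 | 0 |
| Text 3 – Hybrid 2 | 35.29 | 58.82 | 82.35 | 82.35 | 3.36 | 0 |
| Text 3 – Word | 11.76 | 52.94 | 70.59 | 70.59 | 3.92 | 2 |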
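
The column definitions for Table 2 are not spelled out on this page, so the following is only a sketch of one plausible reading. It assumes that each query sentence has a 1-based rank for its correct counterpart, that the "% Rank k" columns are cumulative (correct match within the top k), that "# Zero" counts queries whose correct match was not retrieved at all, and that the average rank is taken over retrieved matches only. The function name `rank_summary` and the toy ranks are hypothetical.

```python
def rank_summary(ranks, cutoffs=(1, 5, 10, 15)):
    """Summarise retrieval ranks under the assumptions stated above.

    ranks: list of 1-based ranks of the correct match for each query,
           with None where the correct match was never retrieved.
    """
    total = len(ranks)
    found = [r for r in ranks if r is not None]
    summary = {}
    for k in cutoffs:
        # Cumulative hit rate: share of all queries solved within the top k.
        hits = sum(1 for r in found if r <= k)
        summary[f"pct_rank{k}"] = round(100 * hits / total, 2)
    # Assumption: average over retrieved matches only.
    summary["av_rank"] = round(sum(found) / len(found), 2) if found else None
    summary["n_zero"] = total - len(found)
    return summary


# Toy ranks invented for illustration; None marks an unretrieved match.
toy_ranks = [1, 1, 3, 7, 12, None, 2, 20]
print(rank_summary(toy_ranks))
```

Under a different reading of the columns (for example, averaging only over the top 15), the same structure applies with the filter on `found` adjusted.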
Figure 5

Top-ranked results for each Chinese embedding method by text.

DOI: https://doi.org/10.5334/johd.86 | Journal eISSN: 2059-481X
Language: English
Published on: Oct 4, 2022
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2022 Rafal Felbur, Marieke Meelen, Paul Vierthaler, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.